  Couchbase C client library libcouchbase / CCBC-779

CCCP subsystem hangs when current source node fails


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.1, 2.7.4
    • Fix Version/s: 2.7.5
    • Component/s: None
    • Labels:
      None
    • Environment:
      Couchnode 2.3.2/libcouchbase 2.7.4
      Couchbase 4.5.1 Cluster
      CentOS 7

      Description

      Before the swap rebalance, the Couchbase cluster consists of nodes 101 and 105.
      After the swap rebalance, it consists of nodes 101 and 102.

      A Couchnode 2.3.2 client is running N1QL queries against Couchbase 4.5.1, a two-node cluster.
      A swap rebalance is done in which one node is removed (105) and another is added (102).
      Right as the rebalance finishes, a query is issued whose index was located on node 105.
      The connection to port 8093 on node 105 fails, which triggers a cluster map refresh.
      A 'Hello' request is sent to both node 101 and node 105 on port 11210.
      Both nodes respond; node 105 responds first, and both go through the SASL auth process.
      At this point no more requests are sent to node 101.
      Node 105, I assume, has shut down or stopped replying on port 11210 by this point.
      Couchnode (or libcouchbase) keeps retrying node 105 over and over, and a new cluster map is never downloaded.

      I have included the tcpdump output from this transaction. The first connection reset to node 105 on port 8093 happens at 14:45:42.557965. From there you can follow the events.
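
      The Couchnode client here sits on top of libcouchbase, so a rough C-level equivalent of the workload is sketched below: bootstrap against both nodes and keep issuing N1QL queries while the swap rebalance runs. This is only a sketch against the libcouchbase 2.7 C API; the host names, bucket, and statement are placeholders, and console_log_level=5 is there purely so the CCCP bootstrap/retry activity shows up in the client log.

{code:c}
#include <stdio.h>
#include <string.h>
#include <libcouchbase/couchbase.h>
#include <libcouchbase/n1ql.h>

/* N1QL callback: only report the final status of each query. */
static void row_callback(lcb_t instance, int type, const lcb_RESPN1QL *resp)
{
    (void)type;
    if (resp->rflags & LCB_RESP_F_FINAL) {
        printf("query done: rc=%s\n", lcb_strerror(instance, resp->rc));
    }
}

int main(void)
{
    struct lcb_create_st crst;
    lcb_t instance;
    lcb_error_t rc;
    int i;

    memset(&crst, 0, sizeof crst);
    crst.version = 3;
    /* console_log_level=5 makes the CCCP bootstrap/retry activity visible */
    crst.v.v3.connstr = "couchbase://node101,node105/default?console_log_level=5";

    if (lcb_create(&instance, &crst) != LCB_SUCCESS) {
        return 1;
    }
    lcb_connect(instance);
    lcb_wait(instance);
    if (lcb_get_bootstrap_status(instance) != LCB_SUCCESS) {
        lcb_destroy(instance);
        return 1;
    }

    /* Keep issuing queries while the swap rebalance runs on the cluster. */
    for (i = 0; i < 1000; ++i) {
        lcb_CMDN1QL cmd;
        lcb_N1QLPARAMS *params = lcb_n1p_new();

        memset(&cmd, 0, sizeof cmd);
        lcb_n1p_setstmtz(params, "SELECT COUNT(*) AS c FROM `default`");
        lcb_n1p_mkcmd(params, &cmd);
        cmd.callback = row_callback;

        rc = lcb_n1ql_query(instance, NULL, &cmd);
        lcb_n1p_free(params);
        if (rc == LCB_SUCCESS) {
            lcb_wait(instance);
        } else {
            printf("could not schedule query: %s\n", lcb_strerror(instance, rc));
        }
    }

    lcb_destroy(instance);
    return 0;
}
{code}

      During the hang described above, that loop would show the Hello/SASL exchange against node 105 repeating indefinitely with no new configuration ever arriving.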

        Attachments


          Activity

          mnunberg Mark Nunberg (Inactive) added a comment -

          Erik, can you see if this patch also fixes the issue?

          erik.manor Erik Manor (Inactive) added a comment -

          Yes, it appears it did; CCCP is behaving more like the HTTP provider now during the failover.

          ingenthr Matt Ingenthron added a comment -

          Mark Nunberg: can you comment on what we might be able to add to the test suite to verify/test for this situation? Thanks!

          mnunberg Mark Nunberg (Inactive) added a comment -

          I believe this should be reproducible with a non-KV workload, where the removed node's memcached process is still active.

          ingenthr Matt Ingenthron added a comment -

          Notes: the root of the issue was that a CCCP request would time out. The bug in the lcb handler was that the state machinery would get stuck in an indefinite fetch state.

          The thought is that QE does an MDS remove/rebalance or a swap rebalance with N1QL on the nodes to verify the behavior; a sketch of the intended retry behavior follows below. However, it's all timing-based, so there's no guarantee we'd trigger the issue. The low node count of two also affected things here; with a higher node count it would have been far less likely.
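
          To make the stuck fetch state concrete, the standalone sketch below illustrates the intended behaviour rather than the actual patch: when the config fetch against the current node times out, the provider should advance to the next candidate node (101 here) instead of re-arming against the same dead node (105), so a fresh cluster map can still be obtained. The structure and field names are illustrative, not libcouchbase internals.

{code:c}
/* Hedged illustration only -- this is not the actual libcouchbase patch.
 * It shows the intended behaviour: when a CCCP fetch against the current
 * node times out, advance to the next candidate node instead of re-arming
 * against the same one forever. */
#include <stdio.h>
#include <stddef.h>

#define NUM_NODES 2

/* Hypothetical provider state; the field names are illustrative. */
struct cccp_state {
    const char *nodes[NUM_NODES];
    size_t cur; /* index of the node currently being polled */
};

/* Simulated fetch: always fails, standing in for the removed node 105
 * that accepts the connection but never returns a config. */
static int fetch_config(const char *node)
{
    printf("requesting config from %s:11210\n", node);
    return -1;
}

/* The essence of the fix: a timeout advances the cursor. */
static void on_timeout(struct cccp_state *st)
{
    st->cur = (st->cur + 1) % NUM_NODES;
}

int main(void)
{
    struct cccp_state st = { { "node105", "node101" }, 0 };
    int attempts;

    for (attempts = 0; attempts < 4; ++attempts) {
        if (fetch_config(st.nodes[st.cur]) == 0) {
            printf("got new cluster map from %s\n", st.nodes[st.cur]);
            break;
        }
        on_timeout(&st);
    }
    return 0;
}
{code}

          Without the cursor advance in on_timeout(), the loop would request a config from node105 forever, which is essentially the behaviour captured in the attached tcpdump.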


            People

            • Assignee: mnunberg Mark Nunberg (Inactive)
            • Reporter: erik.manor Erik Manor (Inactive)
            • Votes: 0
            • Watchers: 5


