Details
Bug
Resolution: Fixed
Critical
4.5.1, 4.6.0
None
Untriaged
No
Description
(4.6.0-3501)
The following scenario leads to DCP stream hang-ups and a consequent rebalance hang:
3-node cluster, 1 bucket (2 replicas)
1. Hard Failover Node 1
Rebalance
2. Add back Node 1
Rebalance
3. Hard Failover Node 2
Rebalance
4. Add back Node 2
Rebalance (hangs)
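The four rounds above could be driven from the command line roughly as follows. This is only a sketch under stated assumptions, not the sequoia test itself: it assumes the cluster_run nodes listen on 127.0.0.1:9000 to 9002, that "Node 1" and "Node 2" map to ports 9000 and 9001, that the credentials are Administrator/password, and that "add back" means a fresh server-add (a hard-failed-over node is ejected by the following rebalance). By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the failover/add-back rounds from the scenario above.
# All addresses and credentials below are assumptions, not from the report.
CLUSTER="${CLUSTER:-127.0.0.1:9002}"   # a node that is never failed over here
CB_USER="${CB_USER:-Administrator}"
CB_PASS="${CB_PASS:-password}"
DRY_RUN="${DRY_RUN:-1}"                # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One round: hard failover, rebalance, add the node back, rebalance.
failover_and_add_back() {
    node="$1"
    run couchbase-cli failover -c "$CLUSTER" -u "$CB_USER" -p "$CB_PASS" \
        --server-failover="$node" --force
    run couchbase-cli rebalance -c "$CLUSTER" -u "$CB_USER" -p "$CB_PASS"
    run couchbase-cli server-add -c "$CLUSTER" -u "$CB_USER" -p "$CB_PASS" \
        --server-add="$node" \
        --server-add-username="$CB_USER" --server-add-password="$CB_PASS"
    run couchbase-cli rebalance -c "$CLUSTER" -u "$CB_USER" -p "$CB_PASS"
}

# Steps 1-2 for Node 1, then steps 3-4 for Node 2.
scenario() {
    failover_and_add_back 127.0.0.1:9000
    failover_and_add_back 127.0.0.1:9001
}

scenario
```

Set DRY_RUN=0 to execute the commands against a running cluster; in the reported scenario the final rebalance is the one expected to hang.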
The failed stream is view-related:
[ns_server:info,2016-11-23T15:58:38.043-05:00,babysitter_of_n_2@127.0.0.1:<0.78.0>:ns_port_server:log:210]memcached<0.78.0>: 2016-11-23T15:58:37.838053-05:00 WARNING (default) DCP (Producer) eq_dcpq:mapreduce_view: default _design/integrity (prod/replica) - (vb 476) Stream request failed because this vbucket is in backfill state
memcached<0.78.0>: 2016-11-23T15:58:37.942504-05:00 WARNING (default) DCP (Producer) eq_dcpq:mapreduce_view: default _design/integrity (prod/replica) - (vb 476) Stream request failed because this vbucket is in backfill state
[couchdb:info,2016-11-23T15:58:38.050-05:00,couchdb_n_2@127.0.0.1:<0.5332.0>:couch_log:info:41]dcp client (<0.329.0>): Temporary failure on stream request on partition 476. Retrying...
The rebalance then hangs. I was able to reproduce this on VMs and with cluster_run.
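To check whether a given vbucket keeps hitting the same warning, the log can be summarized per vbucket. The helper below is a minimal sketch: the function name count_backfill_failures is hypothetical, and it matches only the exact warning text quoted above.

```shell
# Count "vbucket is in backfill state" stream-request failures per vbucket.
# Reads the log file(s) given as arguments, or stdin when none are given.
count_backfill_failures() {
    grep 'Stream request failed because this vbucket is in backfill state' "$@" |
        sed -n 's/.*(vb \([0-9][0-9]*\)).*/\1/p' |   # keep only the vbucket id
        sort -n | uniq -c |                            # occurrences per vbucket
        awk '{ print "vb " $2 ": " $1 }'
}
```

A count that keeps growing for the same vbucket while the rebalance makes no progress would be consistent with the retry loop seen in the couchdb log line.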
To reproduce with cluster_run (requires Docker on the local host):
./cluster_run -n3
go get github.com/couchbaselabs/sequoia
cd $GOPATH/src/github.com/couchbaselabs/sequoia/
go build
./sequoia -provider dev:10.0.0.5 -test tests/integration/test_dcpRollback.yml -scope tests/integration/scope_3Node2replica.yml -scale 3 -log_level 2
**NOTE: 10.0.0.5 stands for your local host's IP address.**