Details
-
Bug
-
Resolution: Fixed
-
Critical
-
5.0.0, 5.0.1, 5.1.0, 5.1.1, 5.1.2, 5.5.0, 5.5.1, 5.5.2
-
None
-
Triaged
-
Unknown
Description
Rebalance fails while performing the following test (steps are in order for reproducing on both debug/rel vulcan builds on local OSX):
- export COUCHBASE_NUM_VBUCKETS=4 && ./cluster_run -n 2
- ./couchbase-cli cluster-init --cluster=localhost:9000 --cluster-username=admin --cluster-password=admin1 --services=data --cluster-ramsize=256
- ./couchbase-cli bucket-create -c localhost:9000 -u admin -p admin1 --bucket=example --bucket-type=couchbase --bucket-ramsize=100 --bucket-replica=1 --bucket-eviction-policy=fullEviction --enable-flush=1 --wait
- cbc-pillowfight --spec="couchbase://127.0.0.1:12000/example" --username=admin --password=admin1 --batch-size=1 --num-items=200000 --num-threads=2 --set-pct=100 --min-size=1024 --max-size=1024 --random-body --populate-only
- ./couchbase-cli cluster-init --cluster=localhost:9001 --cluster-username=admin --cluster-password=admin1 --services=data --cluster-ramsize=256
- ./couchbase-cli server-add --cluster=127.0.0.1:9000 --username=admin --password=admin1 --server-add=127.0.0.1:9001 --server-add-username=admin --server-add-password=admin1
- ./couchbase-cli rebalance -c localhost:9000 -u admin -p admin1
Rebalance fails with:
Rebalancing
|
Bucket: 01/01 (example) 100000 docs remaining
|
[========================================================================================================================== ] 83.33%
|
ERROR: Rebalance failed. See logs for detailed reason. You can try again.
|
We log disconnections in memcached.log at n_0:
2018-09-26T11:59:03.860260Z INFO (No Engine) DCP (Consumer) eq_dcpq:replication:n_1@127.0.0.1->n_0@192.168.4.65:example - Removing connection [ 192.168.4.65:58533 - 192.168.4.65:11999 (<ud>@ns_server</ud>) ]
|
2018-09-26T11:59:03.860292Z WARNING 44: (example) DCP (Consumer) eq_dcpq:replication:n_1@127.0.0.1->n_0@192.168.4.65:example - (vb 2) Setting stream to dead state, last_seqno is 50000, unAckedBytes is 0, status is The stream closed early because the conn was disconnected
|
2018-09-26T11:59:03.860303Z WARNING 44: (example) DCP (Consumer) eq_dcpq:replication:n_1@127.0.0.1->n_0@192.168.4.65:example - (vb 3) Setting stream to dead state, last_seqno is 50000, unAckedBytes is 0, status is The stream closed early because the conn was disconnected
|
2018-09-26T11:59:03.860909Z INFO (No Engine) DCP (Producer) eq_dcpq:replication:n_0@192.168.4.65->n_1@127.0.0.1:example - Removing connection [ 192.168.4.65:58476 - 192.168.4.65:11999 (<ud>@ns_server</ud>) ]
|
2018-09-26T11:59:03.877834Z INFO 42: (example) DCP (Producer) eq_dcpq:replication:n_0@192.168.4.65->n_1@127.0.0.1:example - (vb 0) Stream closing, sent until seqno 34369 remaining items 0, reason: The stream closed early because the conn was disconnected
|
2018-09-26T11:59:03.877847Z INFO 42: (example) DCP (Producer) eq_dcpq:replication:n_0@192.168.4.65->n_1@127.0.0.1:example - (vb 1) Stream closing, sent until seqno 50000 remaining items 0, reason: The stream closed early because the conn was disconnected
|
Here the reason of the disconnection in memcached.log at n_1:
2018-09-26T11:59:03.676598Z CRITICAL *** Fatal error encountered during exception handling ***
|
2018-09-26T11:59:03.676638Z CRITICAL Caught unhandled std::exception-derived exception. what(): Monotonic<x> invariant failed: new value (24871) breaks invariant on current value (24872)
|
2018-09-26T11:59:03.678398Z CRITICAL Call stack: /Users/paoloc/dev/source/install/lib/libplatform_so.0.1.0.dylib(print_backtrace_to_buffer+0x4e) [0x10c63c000+0xc2ee]
|
/Users/paoloc/dev/source/install/lib/libmcd_util.1.0.0.dylib(_ZL13log_backtracev+0x33) [0x10c3e0000+0x55853]
|
/Users/paoloc/dev/source/install/lib/libmcd_util.1.0.0.dylib(_ZL27backtrace_terminate_handlerv+0x58) [0x10c3e0000+0x55638]
|
/usr/lib/libc++abi.dylib(_ZSt11__terminatePFvvE+0x8) [0x7fffbaa84000+0x22d49]
|
/usr/lib/libc++abi.dylib(_ZN10__cxxabiv1L22exception_cleanup_funcE19_Unwind_Reason_CodeP17_Unwind_Exception+0) [0x7fffbaa84000+0x227be]
|
/Users/paoloc/dev/source/install/lib/ep.dylib(_ZN20ThrowExceptionPolicyIxE12nonMonotonicERKxS2_+0xc8c) [0x11b200000+0x66c1c]
|
/Users/paoloc/dev/source/install/lib/ep.dylib(_ZN9MonotonicIx26DefaultOrderReversedPolicyNSt3__17greaterEEaSERKx+0x67) [0x11b200000+0x4f7b7]
|
/Users/paoloc/dev/source/install/lib/ep.dylib(_ZN17CheckpointManager10setBySeqnoEx+0x65) [0x11b200000+0x1a08c5]
|
/Users/paoloc/dev/source/install/lib/ep.dylib(_ZN9EPVBucket17queueBackfillItemER19SingleThreadedRCPtrI4ItemPS1_NSt3__114default_deleteIS1_EEE15GenerateBySeqno+0x190) [0x11b200000+0x1a05d0]
|
/Users/paoloc/dev/source/install/lib/ep.dylib(_ZN7VBucket10queueDirtyER11StoredValue15GenerateBySeqno11GenerateCasbP22PreLinkDocumentContext+0x440) [0x11b200000+0x39bb80]
|
/Users/paoloc/dev/source/install/lib/ep.dylib(_ZN7VBucket10queueDirtyER11StoredValueRK14VBQueueItemCtx+0x90) [0x11b200000+0x3b3130]
|
/Users/paoloc/dev/source/install/lib/ep.dylib(_ZN9EPVBucket17addNewStoredValueERKN9HashTable14HashBucketLockERK4ItemRK14VBQueueItemCtx16GenerateRevSeqno+0x91) [0x11b200000+0x1a3001]
|
/Users/paoloc/dev/source/install/lib/ep.dylib(_ZN7VBucket10processSetERKN9HashTable14HashBucketLockERP11StoredValueR4ItemybbRK14VBQueueItemCtxN2cb13StoreIfStatusEbb+0x15b9) [0x11b200000+0x3a0459]
|
/Users/paoloc/dev/source/install/lib/ep.dylib(_ZN7VBucket15addBackfillItemER4Item15GenerateBySeqno+0x220) [0x11b200000+0x3a1610]
|
/Users/paoloc/dev/source/install/lib/ep.dylib(_ZN8KVBucket15addBackfillItemER4Item15GenerateBySeqnoP16ExtendedMetaData+0x219) [0x11b200000+0x2e9829]
|
/Users/paoloc/dev/source/install/lib/ep.dylib(_ZN13PassiveStream15processMutationEP16MutationResponse+0x8a3) [0x11b200000+0x15c9b3]
|
/Users/paoloc/dev/source/install/lib/ep.dylib(_ZN13PassiveStream23processBufferedMessagesERjm+0x34c) [0x11b200000+0x15f62c]
|
/Users/paoloc/dev/source/install/lib/ep.dylib(_ZN11DcpConsumer25drainStreamsBufferedItemsENSt3__110shared_ptrI13PassiveStreamEEm+0x1bb) [0x11b200000+0xc5b1b]
|
/Users/paoloc/dev/source/install/lib/ep.dylib(_ZN11DcpConsumer20processBufferedItemsEv+0x101) [0x11b200000+0xc7351]
|
/Users/paoloc/dev/source/install/lib/ep.dylib(_ZN15DcpConsumerTask3runEv+0x383) [0x11b200000+0xd7663]
|
/Users/paoloc/dev/source/install/lib/ep.dylib(_ZN14ExecutorThread3runEv+0x1212) [0x11b200000+0x263052]
|
/Users/paoloc/dev/source/install/lib/ep.dylib(_ZL22launch_executor_threadPv+0x1d) [0x11b200000+0x26197d]
|
/Users/paoloc/dev/source/install/lib/libplatform_so.0.1.0.dylib(_ZN15CouchbaseThread3runEv+0x21e) [0x10c63c000+0x474e]
|
/Users/paoloc/dev/source/install/lib/libplatform_so.0.1.0.dylib(_ZL20platform_thread_wrapPv+0x102) [0x10c63c000+0x2c02]
|
/usr/lib/system/libsystem_pthread.dylib(_pthread_body+0xb4) [0x7fffbc0b3000+0x393b]
|
/usr/lib/system/libsystem_pthread.dylib(_pthread_body+0) [0x7fffbc0b3000+0x3887]
|
/usr/lib/system/libsystem_pthread.dylib(thread_start+0xd) [0x7fffbc0b3000+0x308d]
|
Notes:
- The error is easily reproducible with the steps and the setup above
- But, it is not deterministic (e.g., I don't see the issue if I repeat the test with COUCHBASE_NUM_VBUCKETS=1024)
- Symptoms are very similar to the ones in
MB-29501