Details
- Bug
- Resolution: Fixed
- Critical
- 6.5.0
- Untriaged
- Yes
Description
We have a regression test in Analytics that ensures that completely removing the Analytics service from a cluster and re-adding it works as expected. The test uses cluster_run nodes and starts with a single node (n0) running the Data service with the sample bucket beer-sample loaded. We then rebalance in 3 Analytics-only nodes, ingest the beer-sample data using DCP, and verify that Analytics has the expected document count. Next, we rebalance out the Analytics nodes and repeat the cycle (rebalance in the Analytics nodes -> ingest data -> verify expected count -> rebalance out) a few times. Note that these rebalances never involve the Data service. The test has been failing intermittently because the item count we get from the beer-sample bucket is zero. I checked the logs and found that the vBuckets' state is being switched to dead on some topology change, which results in all of their data being deleted, as shown in the logs below:
2019-06-12T13:30:35.819182-07:00 INFO 44: HELO [regular] [ 127.0.0.1:60236 - 127.0.0.1:11999 (<ud>@ns_server</ud>) ]
2019-06-12T13:30:35.856414-07:00 INFO (beer-sample) VBucket::setState: transitioning vb:1023 from:active to:dead
2019-06-12T13:30:35.856475-07:00 INFO (beer-sample) EPVBucket::setupDeferredDeletion(0x0) vb:1023, revision:1
2019-06-12T13:30:35.856515-07:00 INFO (beer-sample) Deletion of vb:1023 was completed.
2019-06-12T13:30:35.856592-07:00 INFO (beer-sample) CouchKVStore::unlinkCouchFile: vb:1023, revision:1, fname:/home/couchbase/jenkins/workspace/cbas-cbcluster-test2/ns_server/data/n_0/datadir/beer-sample/1023.couch.1
2019-06-12T13:30:35.856727-07:00 INFO (beer-sample) ~VBucket(): vb:1023
2019-06-12T13:30:35.857521-07:00 INFO (beer-sample) VBucket::setState: transitioning vb:1022 from:active to:dead
2019-06-12T13:30:35.857562-07:00 INFO (beer-sample) EPVBucket::setupDeferredDeletion(0x0) vb:1022, revision:1
2019-06-12T13:30:35.857579-07:00 INFO (beer-sample) Deletion of vb:1022 was completed.
2019-06-12T13:30:35.857765-07:00 INFO (beer-sample) VBucket::setState: transitioning vb:1021 from:active to:dead
2019-06-12T13:30:35.857782-07:00 INFO (beer-sample) EPVBucket::setupDeferredDeletion(0x0) vb:1021, revision:1
2019-06-12T13:30:35.857791-07:00 INFO (beer-sample) Deletion of vb:1021 was completed.
2019-06-12T13:30:35.857939-07:00 INFO (beer-sample) VBucket::setState: transitioning vb:1020 from:active to:dead
2019-06-12T13:30:35.857956-07:00 INFO (beer-sample) EPVBucket::setupDeferredDeletion(0x0) vb:1020, revision:1
2019-06-12T13:30:35.857972-07:00 INFO (beer-sample) Deletion of vb:1020 was completed.
2019-06-12T13:30:35.858149-07:00 INFO (beer-sample) VBucket::setState: transitioning vb:1019 from:active to:dead
2019-06-12T13:30:35.858164-07:00 INFO (beer-sample) EPVBucket::setupDeferredDeletion(0x0) vb:1019, revision:1
2019-06-12T13:30:35.858172-07:00 INFO (beer-sample) Deletion of vb:1019 was completed.
2019-06-12T13:30:35.858322-07:00 INFO (beer-sample) VBucket::setState: transitioning vb:1018 from:active to:dead
2019-06-12T13:30:35.858337-07:00 INFO (beer-sample) EPVBucket::setupDeferredDeletion(0x0) vb:1018, revision:1
2019-06-12T13:30:35.858345-07:00 INFO (beer-sample) Deletion of vb:1018 was completed.
2019-06-12T13:30:35.858502-07:00 INFO (beer-sample) VBucket::setState: transitioning vb:1017 from:active to:dead
2019-06-12T13:30:35.858517-07:00 INFO (beer-sample) EPVBucket::setupDeferredDeletion(0x0) vb:1017, revision:1
2019-06-12T13:30:35.858524-07:00 INFO (beer-sample) Deletion of vb:1017 was completed.
2019-06-12T13:30:35.858926-07:00 INFO (beer-sample) CouchKVStore::unlinkCouchFile: vb:1022, revision:1, fname:/home/couchbase/jenkins/workspace/cbas-cbcluster-test2/ns_server/data/n_0/datadir/beer-sample/1022.couch.1
2019-06-12T13:30:35.859037-07:00 INFO (beer-sample) ~VBucket(): vb:1022
Unfortunately I don't have a full cbcollect-info but all the logs from the cluster_run nodes are attached. Please let me know if any clarification is needed.
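For anyone triaging the attached logs, the following is a minimal sketch of how the active-to-dead transitions could be pulled out of them. It assumes the log entries follow the same `VBucket::setState: transitioning vb:N from:X to:Y` format as the excerpt in this description; the function name `find_active_to_dead` and the regular expression are illustrative and not part of any Couchbase tooling.

```python
import re
from collections import defaultdict

# Assumed format, taken from the excerpt in this report:
# "<timestamp> INFO (<bucket>) VBucket::setState: transitioning vb:<id> from:<old> to:<new>"
SET_STATE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\S+) \w+ \((?P<bucket>[^)]+)\) "
    r"VBucket::setState: transitioning vb:(?P<vb>\d+) "
    r"from:(?P<old>\w+) to:(?P<new>\w+)"
)


def find_active_to_dead(lines):
    """Return {bucket: [(timestamp, vbucket_id), ...]} for active -> dead transitions.

    finditer is used so that several log entries run together on one
    physical line are still picked up individually.
    """
    hits = defaultdict(list)
    for line in lines:
        for m in SET_STATE.finditer(line):
            if m["old"] == "active" and m["new"] == "dead":
                hits[m["bucket"]].append((m["ts"], int(m["vb"])))
    return dict(hits)


# Two entries copied from the excerpt in this report.
sample = [
    "2019-06-12T13:30:35.856414-07:00 INFO (beer-sample) "
    "VBucket::setState: transitioning vb:1023 from:active to:dead",
    "2019-06-12T13:30:35.856475-07:00 INFO (beer-sample) "
    "EPVBucket::setupDeferredDeletion(0x0) vb:1023, revision:1",
]
for bucket, transitions in find_active_to_dead(sample).items():
    print(bucket, transitions)
```

The function accepts any iterable of strings, so an open log file can be passed in directly. Grouping by bucket makes it easy to see whether every vBucket of beer-sample was affected or only a subset.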
Attachments
Issue Links
- causes
- MB-32485 [CX] Intermittent failure in ClusterRebalanceIT rebalance_MB-28183 (Closed)