  1. Couchbase Server
  2. MB-34598

Bucket intermittently loses all data on topology changes

Details

    • Untriaged
    • Yes

    Description

      We have a regression test in Analytics that ensures that completely removing and re-adding the Analytics service from a cluster works as expected. The test uses cluster_run nodes and starts with a single node (n0) running the Data service with the sample bucket beer-sample loaded. We then rebalance in 3 Analytics-only nodes, ingest the beer-sample data using DCP, and verify that Analytics holds the expected document count. Next, we rebalance out the Analytics nodes and repeat the cycle (rebalance in the Analytics nodes -> ingest data -> verify the expected count -> rebalance out) a few times. Note that these rebalances never involve the Data service. This test has been failing intermittently because the item count we get from the beer-sample bucket is zero. I checked the logs and found that the vBuckets' state is switched to dead on some topology change, which results in all of their data being deleted, as shown in the logs below:
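      A guard of the following shape could make the failure explicit at the point where the bucket is found empty. This is only a minimal sketch, not the actual test code: it assumes the item count is read from the cluster REST endpoint /pools/default/buckets/<bucket> (field basicStats.itemCount), that the cluster_run node n0 listens on port 9000, and that the credentials shown are placeholders. The function names are hypothetical.

```python
import base64
import json
import urllib.request


def fetch_bucket_info(bucket, host="127.0.0.1", port=9000,
                      user="Administrator", password="password"):
    """Fetch bucket details over REST.

    Assumptions: port 9000 (cluster_run n0) and placeholder credentials.
    """
    url = f"http://{host}:{port}/pools/default/buckets/{bucket}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def item_count(bucket_info):
    """Extract the item count from a bucket-details document."""
    return int(bucket_info["basicStats"]["itemCount"])


def check_bucket_not_empty(bucket_info):
    """Raise if the bucket reports zero items (the symptom described above)."""
    count = item_count(bucket_info)
    if count == 0:
        raise AssertionError("bucket reports zero items; check vBucket states")
    return count
```

      In a test loop, check_bucket_not_empty(fetch_bucket_info("beer-sample")) would run after each rebalance, so the first topology change that empties the bucket is the one that fails.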

       

      2019-06-12T13:30:35.819182-07:00 INFO 44: HELO [regular] [ 127.0.0.1:60236 - 127.0.0.1:11999 (<ud>@ns_server</ud>) ]
      2019-06-12T13:30:35.856414-07:00 INFO (beer-sample) VBucket::setState: transitioning vb:1023 from:active to:dead
      2019-06-12T13:30:35.856475-07:00 INFO (beer-sample) EPVBucket::setupDeferredDeletion(0x0) vb:1023, revision:1
      2019-06-12T13:30:35.856515-07:00 INFO (beer-sample) Deletion of vb:1023 was completed.
      2019-06-12T13:30:35.856592-07:00 INFO (beer-sample) CouchKVStore::unlinkCouchFile: vb:1023, revision:1, fname:/home/couchbase/jenkins/workspace/cbas-cbcluster-test2/ns_server/data/n_0/datadir/beer-sample/1023.couch.1
      2019-06-12T13:30:35.856727-07:00 INFO (beer-sample) ~VBucket(): vb:1023
      2019-06-12T13:30:35.857521-07:00 INFO (beer-sample) VBucket::setState: transitioning vb:1022 from:active to:dead
      2019-06-12T13:30:35.857562-07:00 INFO (beer-sample) EPVBucket::setupDeferredDeletion(0x0) vb:1022, revision:1
      2019-06-12T13:30:35.857579-07:00 INFO (beer-sample) Deletion of vb:1022 was completed.
      2019-06-12T13:30:35.857765-07:00 INFO (beer-sample) VBucket::setState: transitioning vb:1021 from:active to:dead
      2019-06-12T13:30:35.857782-07:00 INFO (beer-sample) EPVBucket::setupDeferredDeletion(0x0) vb:1021, revision:1
      2019-06-12T13:30:35.857791-07:00 INFO (beer-sample) Deletion of vb:1021 was completed.
      2019-06-12T13:30:35.857939-07:00 INFO (beer-sample) VBucket::setState: transitioning vb:1020 from:active to:dead
      2019-06-12T13:30:35.857956-07:00 INFO (beer-sample) EPVBucket::setupDeferredDeletion(0x0) vb:1020, revision:1
      2019-06-12T13:30:35.857972-07:00 INFO (beer-sample) Deletion of vb:1020 was completed.
      2019-06-12T13:30:35.858149-07:00 INFO (beer-sample) VBucket::setState: transitioning vb:1019 from:active to:dead
      2019-06-12T13:30:35.858164-07:00 INFO (beer-sample) EPVBucket::setupDeferredDeletion(0x0) vb:1019, revision:1
      2019-06-12T13:30:35.858172-07:00 INFO (beer-sample) Deletion of vb:1019 was completed.
      2019-06-12T13:30:35.858322-07:00 INFO (beer-sample) VBucket::setState: transitioning vb:1018 from:active to:dead
      2019-06-12T13:30:35.858337-07:00 INFO (beer-sample) EPVBucket::setupDeferredDeletion(0x0) vb:1018, revision:1
      2019-06-12T13:30:35.858345-07:00 INFO (beer-sample) Deletion of vb:1018 was completed.
      2019-06-12T13:30:35.858502-07:00 INFO (beer-sample) VBucket::setState: transitioning vb:1017 from:active to:dead
      2019-06-12T13:30:35.858517-07:00 INFO (beer-sample) EPVBucket::setupDeferredDeletion(0x0) vb:1017, revision:1
      2019-06-12T13:30:35.858524-07:00 INFO (beer-sample) Deletion of vb:1017 was completed.
      2019-06-12T13:30:35.858926-07:00 INFO (beer-sample) CouchKVStore::unlinkCouchFile: vb:1022, revision:1, fname:/home/couchbase/jenkins/workspace/cbas-cbcluster-test2/ns_server/data/n_0/datadir/beer-sample/1022.couch.1
      2019-06-12T13:30:35.859037-07:00 INFO (beer-sample) ~VBucket(): vb:1022
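
      To find every affected vBucket in the attached logs, the setState lines can be scanned mechanically. The following is a rough sketch that assumes the message format matches the excerpt above exactly; the function name and the SAMPLE string are illustrative only.

```python
import re
from collections import defaultdict

# Matches entries such as:
#   ... INFO (beer-sample) VBucket::setState: transitioning vb:1023 from:active to:dead
SET_STATE = re.compile(
    r"INFO \((?P<bucket>[^)]+)\) VBucket::setState: "
    r"transitioning vb:(?P<vb>\d+) from:(?P<old>\w+) to:(?P<new>\w+)"
)


def find_active_to_dead(log_text):
    """Return {bucket: sorted vBucket ids} for every active -> dead transition."""
    hits = defaultdict(set)
    # finditer works whether entries are one per line or run together.
    for m in SET_STATE.finditer(log_text):
        if m.group("old") == "active" and m.group("new") == "dead":
            hits[m.group("bucket")].add(int(m.group("vb")))
    return {bucket: sorted(vbs) for bucket, vbs in hits.items()}


# Three entries copied from the excerpt above.
SAMPLE = (
    "2019-06-12T13:30:35.856414-07:00 INFO (beer-sample) VBucket::setState: "
    "transitioning vb:1023 from:active to:dead\n"
    "2019-06-12T13:30:35.856515-07:00 INFO (beer-sample) Deletion of vb:1023 was completed.\n"
    "2019-06-12T13:30:35.857521-07:00 INFO (beer-sample) VBucket::setState: "
    "transitioning vb:1022 from:active to:dead\n"
)

if __name__ == "__main__":
    print(find_active_to_dead(SAMPLE))
```

      Running the same function over the full memcached log of n0 would show whether all vBuckets of the bucket were marked dead or only a subset.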
      

       

      Unfortunately, I don't have a full cbcollect-info, but all the logs from the cluster_run nodes are attached. Please let me know if any clarification is needed.

            People

              Aliaksey Artamonau (Inactive)
              Murtadha Hubail (murtadha.hubail)