  1. Couchbase Server
  2. MB-51606

SyncWrites requiring persistence may get stuck during rebalance if persisted while vBucket is dead

Details

    Description

      1. Create a 3-node cluster and a default bucket.
      2. Load 100k items into a magma bucket with durability=MAJORITY_AND_PERSIST_TO_ACTIVE.
      3. Start an asynchronous durable upsert load.
      4. Rebalance in 1 node while the upserts are running. Retry the documents that failed with DurabilityAmbiguousException.
      5. Start an asynchronous durable upsert load.
      6. Rebalance out 1 node while the upserts are running. Retry the documents that failed with DurabilityAmbiguousException.
      7. The retry fails:

      >>> 
      >>> r = cb_coll.upsert("test_docs-4991", "")
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/Users/ritesh.agarwal/Library/Python/3.8/lib/python/site-packages/couchbase/collection.py", line 293, in wrapped
          return func(self, *args, **kwargs)
        File "/Users/ritesh.agarwal/Library/Python/3.8/lib/python/site-packages/couchbase/result.py", line 682, in mutated
          result = func(*args, **kwargs)
        File "/Users/ritesh.agarwal/Library/Python/3.8/lib/python/site-packages/couchbase/collection.py", line 878, in upsert
          return ResultPrecursor(CoreClient.upsert(
      couchbase.exceptions.CouchbaseException: <Key='test_docs-4991', RC=0x136[LCB_ERR_DURABLE_WRITE_IN_PROGRESS (310)], Operational Error, Results=1, C Source=(src/multiresult.c,332), Context={'status_code': 4, 'opaque': 3, 'cas': 0, 'key': 'test_docs-4991', 'bucket': 'default', 'collection': '_default', 'scope': '_default', 'context': '', 'ref': '', 'endpoint': '172.23.121.116:11210', 'type': 'KVErrorContext'}, Tracing Output={"test_docs-4991": {"debug_info": {"FILE": "src/callbacks.c", "FUNC": "dur_chain2", "LINE": 751}}}>
      >>> 
      >>> 
      >>> r = cb_coll.get("test_docs-4991")
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/Users/ritesh.agarwal/Library/Python/3.8/lib/python/site-packages/couchbase/collection.py", line 293, in wrapped
          return func(self, *args, **kwargs)
        File "/Users/ritesh.agarwal/Library/Python/3.8/lib/python/site-packages/couchbase/result.py", line 544, in wrapped
          x, options = func(*args, **kwargs)
        File "/Users/ritesh.agarwal/Library/Python/3.8/lib/python/site-packages/couchbase/collection.py", line 521, in get
          return self._get_generic(key, kwargs, options)
        File "/Users/ritesh.agarwal/Library/Python/3.8/lib/python/site-packages/couchbase/collection.py", line 484, in _get_generic
          x = CoreClient.get(self.bucket, key, **opts)
        File "/Users/ritesh.agarwal/Library/Python/3.8/lib/python/site-packages/couchbase_core/client.py", line 409, in get
          return super(Client, self).get(*args, **kwargs)
      couchbase.exceptions.CouchbaseException: <Key='test_docs-4991', RC=0x137[LCB_ERR_DURABLE_WRITE_RE_COMMIT_IN_PROGRESS (311)], Operational Error, Results=1, C Source=(src/multiresult.c,332), Context={'status_code': 164, 'opaque': 7, 'cas': 0, 'key': 'test_docs-4991', 'bucket': 'default', 'collection': '_default', 'scope': '_default', 'context': '', 'ref': '', 'endpoint': '172.23.121.116:11210', 'type': 'KVErrorContext'}, Tracing Output={"test_docs-4991": {"debug_info": {"FILE": "src/callbacks.c", "FUNC": "value_callback", "LINE": 856}}}>
      >>> 
      

      8. The document remains stuck in this state indefinitely.
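
The retry in steps 4 and 6 can be sketched as a bounded loop, which also makes the stuck state in step 8 observable: a healthy SyncWrite eventually completes, while a stuck one exhausts every attempt. This is a minimal sketch that runs without a cluster. The exception classes, `FakeCollection`, and `upsert_with_retry` are hypothetical stand-ins introduced for illustration; they are not the Couchbase SDK's API and not the test framework's actual retry code.

```python
import time


# Hypothetical stand-ins for the SDK errors seen in this report
# (ambiguous durability result, durable write still in progress).
class DurabilityAmbiguousError(Exception):
    pass


class DurableWriteInProgressError(Exception):
    pass


def upsert_with_retry(upsert, key, value, max_attempts=5, backoff_s=0.0):
    """Call upsert(key, value), retrying on ambiguous/in-progress errors.

    Returns the number of attempts used. Re-raises the last error once
    max_attempts is exhausted, which is what a stuck SyncWrite looks like.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            upsert(key, value)
            return attempt
        except (DurabilityAmbiguousError, DurableWriteInProgressError):
            if attempt == max_attempts:
                raise
            # Linear backoff before the next attempt.
            time.sleep(backoff_s * attempt)


class FakeCollection:
    """Test double: fails the first `failures` upserts, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.docs = {}

    def upsert(self, key, value):
        if self.failures > 0:
            self.failures -= 1
            raise DurableWriteInProgressError(key)
        self.docs[key] = value
```

With a real collection, `upsert` would be a callable that wraps the SDK's durable upsert and maps its errors onto the two classes above. A key that still raises after a generous `max_attempts` and backoff is a candidate for the stuck state described in this issue.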

      Note: I am not sure whether this is a regression, since the test passes occasionally. Trying on the 7.0 GA build.

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/magma_temp_job4.ini -p GROUP=P0;durability,durability=MAJORITY_AND_PERSIST_TO_ACTIVE,bucket_storage=magma,rerun=False,enable_dp=false,autoCompactionDefined=true,infra_log_level=debug,log_level=debug,bucket_storage=magma,upgrade_version=7.1.0-2506 -t rebalance_new.rebalance_in_out.RebalanceInOutTests.test_incremental_rebalance_out_in_with_mutation,upgrade_version=7.1.0-2506,rerun=False,enable_dp=false,GROUP=P0;durability,get-cbcollect-info=False,replicas=1,durability=MAJORITY_AND_PERSIST_TO_ACTIVE,bucket_storage=magma,log_level=debug,nodes_init=3,autoCompactionDefined=true,infra_log_level=debug'
      

      Failing on couchstore as well:
      http://cb-logs-qe.s3-website-us-west-2.amazonaws.com/7.1.0-2284/jenkins_logs/test_suite_executor-TAF/169859/consoleText.txt
      http://cb-logs-qe.s3-website-us-west-2.amazonaws.com/7.1.0-1601/jenkins_logs/test_suite_executor-TAF/152135/consoleText.txt

          Activity

            build-team Couchbase Build Team added a comment:
            Build couchbase-server-7.2.0-1474 contains kv_engine commit 1a6fb5d with commit message:
            MB-51606: Notify PDM of last consistent point on transition

            build-team Couchbase Build Team added a comment:
            Build couchbase-server-8.0.0-1038 contains kv_engine commit f284ab2 with commit message:
            MB-51606: Don't set receivedSnapshotEnd in PDM ctor if seqno is 0

            build-team Couchbase Build Team added a comment:
            Build couchbase-server-8.0.0-1038 contains kv_engine commit 96b4c30 with commit message:
            MB-51606: Don't set lastSeqno to 1000 in VBucketTest

            build-team Couchbase Build Team added a comment:
            Build couchbase-server-8.0.0-1038 contains kv_engine commit 1a6fb5d with commit message:
            MB-51606: Notify PDM of last consistent point on transition

            ritesh.agarwal Ritesh Agarwal added a comment (edited):

            I tried reproducing the issue on 7.1.2-3317 (test_logs) multiple times, but because it is intermittent I was unable to reproduce it.
            The 7.1.2 job for component:rebalance_in_out_persist_active_6.5_P0 is passing and the 3358 build also looks fine. Additionally, I re-ran the test on the latest build 5 times without observing any issue, so I think we can close this defect.

            I will keep an eye on the 7.2.0 magma runs of the same tests for any other side effects.

            Thanks

            People

              ritesh.agarwal Ritesh Agarwal
              ritesh.agarwal Ritesh Agarwal