Couchbase Server / MB-50947

[BP MB-49720 to 6.6.x] - Rollback handling - Retry with previous uuid on rollback message

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • None
    • 6.6.6
    • eventing
    • Untriaged
    • 1
    • Unknown

    Description

      On KV node failure, the DCP consumer may not be able to resume from where it left off.
      The DCP producer asks the DCP consumer to roll back to a certain sequence number and history, because the requested vbucket history is not available on the failed KV node. In this scenario, the DCP consumer rolls back to 0 and Eventing restarts the stream from 0.
      In certain situations, restarting the stream from 0 can be avoided by retrying the stream request with the previous vbucket history (the previous vbuuid).
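
The retry idea described above can be sketched as follows. This is a minimal illustration only, not the Eventing implementation: the names `FailoverEntry`, `StreamRequest`, and `retry_after_rollback` are hypothetical, the failover log is assumed to be ordered newest entry first, and the choice of start sequence number for the retry is an assumed policy.

```python
from typing import NamedTuple, Sequence


class FailoverEntry(NamedTuple):
    vbuuid: int  # identifier of one branch of the vbucket's history
    seqno: int   # sequence number at which that branch began


class StreamRequest(NamedTuple):
    vbuuid: int
    start_seqno: int


def retry_after_rollback(
    failover_log: Sequence[FailoverEntry], rejected: StreamRequest
) -> StreamRequest:
    """Pick the next stream request after the producer answers ROLLBACK.

    failover_log is assumed to be ordered newest entry first.
    """
    for i, entry in enumerate(failover_log):
        if entry.vbuuid != rejected.vbuuid:
            continue
        if i + 1 < len(failover_log):
            previous = failover_log[i + 1]
            # Assumed policy: retry on the previous branch, starting no later
            # than the point where the rejected branch began.
            start = min(rejected.start_seqno, entry.seqno)
            return StreamRequest(previous.vbuuid, start)
        break
    # No older entry to try (or vbuuid unknown): restart the stream from 0.
    return StreamRequest(rejected.vbuuid, 0)


if __name__ == "__main__":
    # Illustrative values only, not taken from the ticket.
    flog = [FailoverEntry(300, 1500), FailoverEntry(200, 1000), FailoverEntry(100, 0)]
    print(retry_after_rollback(flog, StreamRequest(300, 1800)))
```

A caller would apply this each time a retry is itself rejected, so the request walks back through the stored history one entry at a time and only falls back to 0 when no older entry remains.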

          Activity

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.6.5-10086 contains eventing commit 2049505 with commit message:
            MB-50947: Store failover log in checkpoint blob
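
The commit message says the failover log is stored in the checkpoint blob. The ticket does not show the blob format, so the sketch below is purely illustrative: the JSON layout, the field names `last_processed_seqno` and `failover_log`, and both function names are assumptions. It only shows why persisting the log helps: after a restart, the consumer still has the older vbuuids available for a retry.

```python
import json
from typing import List, Tuple

# Each entry is (vbuuid, seqno), newest first. Hypothetical layout.
FailoverLog = List[Tuple[int, int]]


def encode_checkpoint(last_seqno: int, failover_log: FailoverLog) -> str:
    """Serialize a checkpoint that carries the failover log with it."""
    return json.dumps(
        {
            "last_processed_seqno": last_seqno,
            "failover_log": [list(entry) for entry in failover_log],
        }
    )


def decode_checkpoint(blob: str) -> Tuple[int, FailoverLog]:
    """Read a checkpoint back; a blob without a failover log yields []."""
    doc = json.loads(blob)
    flog = [(int(vbuuid), int(seqno)) for vbuuid, seqno in doc.get("failover_log", [])]
    return int(doc["last_processed_seqno"]), flog


if __name__ == "__main__":
    blob = encode_checkpoint(1800, [(300, 1500), (200, 1000)])
    print(decode_checkpoint(blob))
```

Treating a missing `failover_log` field as an empty list is a design choice in this sketch: a checkpoint written without the field would simply have no previous vbuuid to retry with.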
            sujay.gad Sujay Gad added a comment -

            Verified the fix using Enterprise Edition 6.6.5 build 10086.

            STEPS TO REPRODUCE

            1. Cluster consists of 2 KV nodes and 1 eventing node.
            2. Create 4 buckets, namely src_bucket, metadata, dst_bucket, dst_bucket1.
            3. Create 2 handlers, bucket_op and timers, using the first 2 buckets as the source and metadata buckets and the remaining 2 buckets as bucket aliases, 1 for each handler.
            4. Deploy both eventing handlers.
            5. Stop persistence on the active kv node.
            6. Load 50,000 documents into source bucket.
            7. Once the consumer receives all mutations, kill memcached on the active node to trigger the rollback scenario.

            CASE A
            Reproduced original issue on 6.6.5-10080.
            Eventing rolls back to the sequence number that DCP asks it to.

            2022-02-16T21:32:40.650-08:00 [Info] Consumer::dcpRequestStreamHandle [worker_timers_0:/tmp/127.0.0.1:8091_0_3400968880.sock:26360] vb: 1 DCP stream start vbKvAddr: 172.23.106.64:11210 vbuuid: 222891922269008 startSeq: 1 snapshotStart: 1 snapshotEnd: 1
            2022-02-16T21:32:40.651-08:00 [Info] Consumer::processDCPEvents [worker_timers_0:/tmp/127.0.0.1:8091_0_3400968880.sock:26360] vb: 1 got STREAMREQ status: ROLLBACK
            2022-02-16T21:32:41.087-08:00 [Info] Consumer::handleFailoverLog [worker_timers_0:/tmp/127.0.0.1:8091_0_3400968880.sock:26360] vb: 1 rollback requested by DCP. Retrying DCP stream start vbuuid: 222891922269008 startSeq: 0 flog startSeqNo: 0
            

            CASE B
            Verified the fix on 6.6.5-10086.
            Eventing avoids restarting the DCP stream from 0 in certain scenarios by using the previous vbuuid available in the failover log.
            For example:

            2022-02-22T02:49:10.452-08:00 [Warn] DCPT[eventing:zWOibW56-8240:eventing:zWOibW56-8239:worker_timers_0_0_172.23.120.107:11210_172.23.104.67:8096/0] ##18e STREAMREQ(398) with rollback 0
            2022-02-22T02:49:16.452-08:00 [Info] Consumer::processDCPEvents [worker_timers_0_0:/tmp/127.0.0.1:8091_0_3689154071.sock:26618] vb: 398 got STREAMREQ status: ROLLBACK
            2022-02-22T02:49:16.452-08:00 [Info] Consumer::handleFailoverLog [worker_timers_0_0:/tmp/127.0.0.1:8091_0_3689154071.sock:26618] vb: 398 rollback requested by DCP. New vbuuid: 134910398526963 startSeq: 0 flog startSeqNo: 1306003
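
When comparing runs such as CASE A and CASE B, it can help to pull the fields out of the `Consumer::handleFailoverLog` lines programmatically. The sketch below is an assumption-laden helper, not part of Eventing: the regular expression is derived only from the two sample lines in this comment, and reading the `New vbuuid:` wording as "a different vbuuid was chosen" is an interpretation of those samples.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the two handleFailoverLog samples in this ticket only.
ROLLBACK_RE = re.compile(
    r"vb: (?P<vb>\d+) rollback requested by DCP\. "
    r"(?P<action>Retrying DCP stream start|New) vbuuid: (?P<vbuuid>\d+) "
    r"startSeq: (?P<start_seq>\d+) flog startSeqNo: (?P<flog_seq>\d+)"
)


class RollbackEvent(NamedTuple):
    vb: int
    vbuuid: int
    start_seq: int
    flog_start_seq: int
    new_vbuuid: bool  # True when the line uses the "New vbuuid:" wording


def parse_rollback_line(line: str) -> Optional[RollbackEvent]:
    """Return the rollback fields from one log line, or None if it does not match."""
    m = ROLLBACK_RE.search(line)
    if m is None:
        return None
    return RollbackEvent(
        vb=int(m["vb"]),
        vbuuid=int(m["vbuuid"]),
        start_seq=int(m["start_seq"]),
        flog_start_seq=int(m["flog_seq"]),
        new_vbuuid=m["action"] == "New",
    )


if __name__ == "__main__":
    sample = (
        "vb: 398 rollback requested by DCP. New vbuuid: 134910398526963 "
        "startSeq: 0 flog startSeqNo: 1306003"
    )
    print(parse_rollback_line(sample))
```

Lines that do not match, such as the `STREAMREQ status: ROLLBACK` lines, return `None`, so the helper can be applied to a whole log file and the results filtered.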
            


            People

              sujay.gad Sujay Gad
              ankit.prabhu Ankit Prabhu