Couchbase Server / MB-31989

[BP 6.0.1] Improve dcp rollback handling

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 6.0.0
    • Fix Version/s: 6.0.1
    • Component/s: secondary-index

    Description

      The indexer's handling of DCP rollback can be improved:

      1. If DCP instructs the indexer to roll back to 0, the indexer ignores its disk snapshots and blindly rolls back to 0. In some cases a stream request using a disk snapshot may still succeed (e.g. when only the vbuuid has changed). It is better to exhaust DCP stream requests with all the disk snapshots before rolling back to 0.

      See http://review.couchbase.org/#/c/74784/ for reference.
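
The behaviour proposed in item 1 can be sketched as follows. This is a minimal Go illustration, not the indexer's actual code: the `Snapshot` type, the `StreamRequester` hook, and `pickRestartPoint` are hypothetical names, and a real DCP stream request carries more fields than the seqno and vbuuid shown here.

```go
package main

import "fmt"

// Snapshot is a hypothetical stand-in for a persisted index snapshot:
// for a single vbucket it records the seqno and vbuuid it was taken at.
type Snapshot struct {
	Seqno  uint64
	Vbuuid uint64
}

// StreamRequester is a hypothetical hook that issues a DCP stream request
// from the given restart point and reports whether DCP accepted it (true)
// or answered with a rollback (false).
type StreamRequester func(s Snapshot) bool

// pickRestartPoint tries every disk snapshot in the order given (assumed
// newest first) and returns the first one DCP accepts. Only when all of
// them are rejected does it report that a rollback to 0 is needed.
func pickRestartPoint(snapshots []Snapshot, request StreamRequester) (Snapshot, bool) {
	for _, s := range snapshots {
		if request(s) {
			return s, true
		}
	}
	return Snapshot{}, false
}

func main() {
	snapshots := []Snapshot{
		{Seqno: 300, Vbuuid: 0xA},
		{Seqno: 200, Vbuuid: 0xA},
		{Seqno: 100, Vbuuid: 0xB},
	}
	// Toy DCP for illustration: accepts only restart points at or below seqno 200.
	accept := func(s Snapshot) bool { return s.Seqno <= 200 }

	if s, ok := pickRestartPoint(snapshots, accept); ok {
		fmt.Printf("resume from disk snapshot at seqno %d\n", s.Seqno)
	} else {
		fmt.Println("no snapshot accepted; roll back to 0")
	}
}
```

The point of the loop is that a rollback to 0 becomes the last resort rather than the first response: a full index rebuild happens only if no persisted snapshot is usable.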

          Activity

            deepkaran.salooja Deepkaran Salooja created issue -
            deepkaran.salooja Deepkaran Salooja made changes -
            Field Original Value New Value
            Link This issue relates to CBSE-5997 [ CBSE-5997 ]
            jeelan.poola Jeelan Poola made changes -
            Fix Version/s 6.0.1 [ 15522 ]
            jeelan.poola Jeelan Poola made changes -
            Priority: Major [ 3 ] → Critical [ 2 ]
            jeelan.poola Jeelan Poola made changes -
            Assignee: Jeelan Poola [ jeelan.poola ] → Deepkaran Salooja [ deepkaran.salooja ]
            deepkaran.salooja Deepkaran Salooja made changes -
            Fix Version/s Mad-Hatter [ 15037 ]
            deepkaran.salooja Deepkaran Salooja made changes -
            Summary: Improve dcp rollback handling → [BP 6.0.1] Improve dcp rollback handling
            deepkaran.salooja Deepkaran Salooja made changes -
            Link This issue relates to MB-32227 [ MB-32227 ]
            deepkaran.salooja Deepkaran Salooja made changes -
            Description
            Original Value:
            The indexer's handling of DCP rollback can be improved:

            1. If DCP instructs the indexer to roll back to 0, the indexer ignores its disk snapshots and blindly rolls back to 0. In some cases a stream request using a disk snapshot may still succeed (e.g. when only the vbuuid has changed). It is better to exhaust DCP stream requests with all the disk snapshots before rolling back to 0.

            See http://review.couchbase.org/#/c/74784/ for reference.

            2. The indexer currently keeps only the latest vbuuid in the timestamp. It could store 2-3 vbuuids in memory if the seqno has not changed between them (and possibly persist them in the disk snapshot as well). If DCP asks to roll back to 0, the indexer can first retry the stream request with the stored vbuuids before going to the disk snapshots.

            New Value:
            Identical to the original value with item 2 removed.
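
Item 2 of the original description value (keeping a short vbuuid history) could look roughly like this. It is a minimal Go sketch under stated assumptions: `VbuuidHistory`, its `Record` and `Candidates` methods, the cap of 3 entries, and the newest-first retry order are all hypothetical, and nothing here is taken from the indexer source.

```go
package main

import "fmt"

// maxVbuuids caps the history; the issue suggests keeping 2-3 entries.
const maxVbuuids = 3

// VbuuidHistory is a hypothetical per-vbucket record of the vbuuids seen
// at the current seqno, oldest first.
type VbuuidHistory struct {
	Seqno   uint64
	Vbuuids []uint64
}

// Record notes a (seqno, vbuuid) pair. If the seqno moved, the older
// vbuuids no longer describe the same position, so the history restarts.
func (h *VbuuidHistory) Record(seqno, vbuuid uint64) {
	if seqno != h.Seqno {
		h.Seqno = seqno
		h.Vbuuids = h.Vbuuids[:0]
	}
	if n := len(h.Vbuuids); n > 0 && h.Vbuuids[n-1] == vbuuid {
		return // unchanged vbuuid, nothing to add
	}
	h.Vbuuids = append(h.Vbuuids, vbuuid)
	if len(h.Vbuuids) > maxVbuuids {
		h.Vbuuids = h.Vbuuids[len(h.Vbuuids)-maxVbuuids:]
	}
}

// Candidates returns the stored vbuuids newest first, the order in which
// this sketch would retry them before turning to disk snapshots.
func (h *VbuuidHistory) Candidates() []uint64 {
	out := make([]uint64, 0, len(h.Vbuuids))
	for i := len(h.Vbuuids) - 1; i >= 0; i-- {
		out = append(out, h.Vbuuids[i])
	}
	return out
}

func main() {
	var h VbuuidHistory
	h.Record(500, 0xA)
	h.Record(500, 0xB) // the vbuuid changed while the seqno stayed the same
	fmt.Println(h.Candidates())
}
```

Resetting the history whenever the seqno changes mirrors the condition in the issue text: older vbuuids are only worth retrying if they refer to the same seqno.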

            deepkaran.salooja Deepkaran Salooja added a comment -

            QE verification steps:

            1. Create a bucket with 2 replicas.
            2. Load data and achieve a 20% resident ratio in the bucket.
            3. Create a few indexes. Reduce the disk snapshot interval to 1 minute:

            curl -X POST -u Administrator:asdasd http://localhost:9102/settings --data '{"indexer.settings.persisted_snapshot.moi.interval":60000}'

            5. Perform more mutations for a few minutes to create a couple of disk snapshots.
            6. While data is still being loaded into memcached, kill memcached on one node and fail over that node in quick succession.
            7. The failover should not cause the indexes to roll back to 0 and then rebuild.

            Also, it would be good to run all rollback-related tests to make sure there is no regression.
            deepkaran.salooja Deepkaran Salooja made changes -
            Resolution Fixed [ 1 ]
            Status: Open [ 1 ] → Resolved [ 5 ]
            deepkaran.salooja Deepkaran Salooja made changes -
            Actual End 2018-12-07 10:02 (issue has been resolved)

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.0.1-1993 contains indexing commit cc88bd7 with commit message:
            MB-31989 Retry stream request with all disk snapshots...
            mihir.kamdar Mihir Kamdar (Inactive) made changes -
            Assignee: Deepkaran Salooja [ deepkaran.salooja ] → Ajay Bhullar [ ajay.bhullar ]
            ajay.bhullar Ajay Bhullar made changes -
            ajay.bhullar Ajay Bhullar added a comment -

            Verified in 6.0.1-2024. Tried the verification steps under various network conditions and did not see the issue reported in the CBSE.
            ajay.bhullar Ajay Bhullar made changes -
            Status: Resolved [ 5 ] → Closed [ 6 ]
            deepkaran.salooja Deepkaran Salooja made changes -
            Link This issue relates to MB-32640 [ MB-32640 ]

            People

              Assignee: Ajay Bhullar
              Reporter: Deepkaran Salooja
              Votes: 0
              Watchers: 2
