Couchbase Server: MB-32227

Improve DCP rollback handling

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 6.0.0
    • Fix Version/s: 6.5.0
    • Component/s: secondary-index

    Description

      The indexer's handling of DCP rollback can be improved:

      1. If DCP instructs the indexer to roll back to 0, the indexer ignores its disk snapshots and blindly rolls back to 0. In some cases a stream request that uses a disk snapshot may still succeed (e.g. when only the vbuuid has changed). It is better to always exhaust DCP stream requests with all the disk snapshots before rolling back to 0.

      See http://review.couchbase.org/#/c/74784/ for reference.
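
      The retry order proposed above can be sketched as follows. This is only an illustration under stated assumptions, not the indexer's actual code: `Snapshot`, `choose_restart_point`, and the `try_stream_request` callback are hypothetical names, the callback stands in for a DCP stream request that either succeeds or asks for a rollback, and trying the newest snapshot first is an assumed ordering.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for a disk snapshot's restart timestamp."""
    seqno: int
    vbuuid: int


def choose_restart_point(
    disk_snapshots: Iterable[Snapshot],
    try_stream_request: Callable[[Snapshot], bool],
) -> Optional[Snapshot]:
    """Return the newest disk snapshot DCP accepts, or None to roll back to 0.

    try_stream_request is an assumed callback: True means the stream
    request succeeded, False means DCP asked for a rollback.
    """
    # Assumption: try the newest snapshot first so less data is re-indexed.
    for snap in sorted(disk_snapshots, key=lambda s: s.seqno, reverse=True):
        if try_stream_request(snap):
            return snap
    # Every disk snapshot was rejected: only now fall back to 0.
    return None
```

      Returning `None` only after the loop finishes is the point of the sketch: rollback to 0 becomes the last resort rather than the first response.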

          Activity

            deepkaran.salooja Deepkaran Salooja created issue -
            deepkaran.salooja Deepkaran Salooja made changes -
            Field Original Value New Value
            Link This issue relates to CBSE-5997 [ CBSE-5997 ]
            deepkaran.salooja Deepkaran Salooja made changes -
            Fix Version/s 6.0.1 [ 15522 ]
            deepkaran.salooja Deepkaran Salooja made changes -
            Link This issue relates to MB-31989 [ MB-31989 ]
            deepkaran.salooja Deepkaran Salooja made changes -
            Description (original value): the description above, followed by a second item:

            2. The indexer currently keeps only the latest vbuuid in the timestamp. It could store 2-3 vbuuids in memory if the seqno has not changed between them (and possibly persist them in the disk snapshot as well). If DCP asks to roll back to 0, the indexer can first retry the stream request with the stored vbuuids before going to disk snapshots.

            Description (new value): the description above, with item 2 removed.
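
            The vbuuid-retention idea in item 2 of the original description could look roughly like the sketch below. It is a hypothetical illustration only: `VbuuidHistory`, its methods, and the `try_stream_request` callback are invented names, and the limit of three entries is taken from the "2-3 vbuuids" wording.

```python
from collections import deque
from typing import Callable, Deque, Optional


class VbuuidHistory:
    """Hypothetical per-vbucket record of recent vbuuids at one seqno."""

    def __init__(self, max_entries: int = 3) -> None:
        self.seqno: Optional[int] = None
        self.vbuuids: Deque[int] = deque(maxlen=max_entries)

    def record(self, seqno: int, vbuuid: int) -> None:
        # The history only applies while the seqno is unchanged,
        # so a new seqno starts a fresh history.
        if seqno != self.seqno:
            self.seqno = seqno
            self.vbuuids.clear()
        if vbuuid not in self.vbuuids:
            self.vbuuids.append(vbuuid)

    def retry(self, try_stream_request: Callable[[int, int], bool]) -> Optional[int]:
        """Return the first stored vbuuid DCP accepts (newest first), else None."""
        if self.seqno is None:
            return None
        for vbuuid in reversed(self.vbuuids):
            if try_stream_request(self.seqno, vbuuid):
                return vbuuid
        # No stored vbuuid worked; the caller would move on to disk snapshots.
        return None
```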

            deepkaran.salooja Deepkaran Salooja made changes -
            Link This issue relates to MB-32102 [ MB-32102 ]

            deepkaran.salooja Deepkaran Salooja added a comment:

            QE verification steps:

            1. Create a bucket with 2 replicas.
            2. Load data until the bucket reaches a 20% resident ratio.
            3. Create a few indexes. Reduce the disk snapshot interval to 1 min:

               curl -X POST -u Administrator:asdasd http://localhost:9102/settings --data '{"indexer.settings.persisted_snapshot.moi.interval":60000}'

            4. Run more mutations for a few minutes to create a couple of disk snapshots.
            5. While data is still being loaded into memcached, kill memcached on one node and fail over that node in quick succession.
            6. The failover should not cause the indexes to roll back to 0 and then rebuild.

            Also, it would be good to run all rollback-related tests to make sure there is no regression.
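
            For scripted test runs, the settings change from the curl command above could be built in Python along these lines. This is a sketch, not an official client: `build_snapshot_interval_request` is a hypothetical helper, the URL, credentials, and setting name are copied from the curl command, and the value is assumed to be in milliseconds because 60000 is described as 1 min.

```python
import base64
import json
import urllib.request


def build_snapshot_interval_request(
    interval_ms: int,
    base_url: str = "http://localhost:9102",
    user: str = "Administrator",
    password: str = "asdasd",
) -> urllib.request.Request:
    """Build (but do not send) the POST that sets the MOI snapshot interval."""
    body = json.dumps(
        {"indexer.settings.persisted_snapshot.moi.interval": interval_ms}
    ).encode("utf-8")
    # Basic auth header, equivalent to curl's -u user:password.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base_url}/settings",
        data=body,
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )
```

            Passing the returned request to `urllib.request.urlopen` would send it; that step needs a running indexer on the target node, so it is left out of the sketch.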

            build-team Couchbase Build Team added a comment:
            Build couchbase-server-6.5.0-1818 contains indexing commit 0f1b4ee with commit message:
            MB-32227 Retry stream request with all disk snapshots...
            deepkaran.salooja Deepkaran Salooja made changes -
            Resolution Fixed [ 1 ]
            Status Open [ 1 ] Resolved [ 5 ]
            deepkaran.salooja Deepkaran Salooja made changes -
            Actual End 2018-12-07 10:02 (issue has been resolved)

            build-team Couchbase Build Team added a comment:
            Build couchbase-server-6.0.1-1993 contains indexing commit cc88bd7 with commit message:
            MB-31989 Retry stream request with all disk snapshots...
            wayne Wayne Siu made changes -
            Link This issue backports to MB-32640 [ MB-32640 ]

            build-team Couchbase Build Team added a comment:
            Build couchbase-server-5.5.4-4313 contains indexing commit c964d93 with commit message:
            MB-32640 Retry stream request with all disk snapshots...
            ritam.sharma Ritam Sharma made changes -
            Assignee Deepkaran Salooja [ deepkaran.salooja ] Mihir Kamdar [ mihir.kamdar ]

            girish.benakappa Girish Benakappa added a comment:
            Closing based on verification of the above tests with 6.5.0-4789. The verified tests are recorded in: https://docs.google.com/spreadsheets/d/11vNDoQFV75YD71zcYnZ7ExxXaFvV9itfJIENUMhIpDI/edit#gid=0
            girish.benakappa Girish Benakappa made changes -
            Status Resolved [ 5 ] Closed [ 6 ]
            simon.dew Simon Dew made changes -
            Link This issue relates to DOC-6178 [ DOC-6178 ]

            People

              Assignee: mihir.kamdar Mihir Kamdar (Inactive)
              Reporter: deepkaran.salooja Deepkaran Salooja