  1. Couchbase Server
  2. MB-27604

Improve DCP rollback due to purge

Details

    • Improvement
    • Resolution: Fixed
    • Major
    • 5.5.0
    • 4.6.4, 5.0.1
    • couchbase-bucket
    • None

    Description

      Background:
      Currently the DCP producer asks a client to roll back if "snap_start_seqno < purge_seqno" rather than continuing from the point where the client left off.
      The reason for this is to not miss any permanently deleted items.
      For example: say a client is receiving a snapshot from seqno 0 to 5, and the snapshot looks as below:
      Seqno     | 1  | 2  | 3  | 4  | 5
      ----------------------------------
      Mutations | K1 | K2 | K3 | D2 | K4
                       ^
                       |
       
      Say the stream drops after the client reads seqno == 2, and the client reconnects with the request {start_seqno, snap_start_seqno, snap_end_seqno} as {2, 0, 5}.
      If by then the purger has run and permanently purged D2, the vbucket contents would look like:
      Seqno     | 1  | 3  | 5
      ------------------------
      Mutations | K1 | K3 | K4
       
      In this case, resuming from seqno 3 would be an error, as the client would never learn that K2 was deleted.
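The example above can be replayed as a small simulation. This is only an illustrative sketch: the (seqno, key, is_delete) tuple layout and the helper names are hypothetical and are not taken from the Couchbase code base.

```python
# Each item is (seqno, key, is_delete); this layout is a hypothetical model.
SNAPSHOT = [(1, "K1", False), (2, "K2", False), (3, "K3", False),
            (4, "K2", True), (5, "K4", False)]            # D2 deletes K2
AFTER_PURGE = [(1, "K1", False), (3, "K3", False), (5, "K4", False)]


def apply_items(view, items):
    """Apply mutations and deletions to a key -> seqno view."""
    for seqno, key, is_delete in items:
        if is_delete:
            view.pop(key, None)
        else:
            view[key] = seqno
    return view


def stale_keys_after_naive_resume(start_seqno):
    """Keys the client still holds although the vbucket no longer has them."""
    # What the client received before the stream dropped.
    client = apply_items({}, [i for i in SNAPSHOT if i[0] <= start_seqno])
    # Resume without rollback from the purged vbucket.
    apply_items(client, [i for i in AFTER_PURGE if i[0] > start_seqno])
    server = apply_items({}, AFTER_PURGE)
    return sorted(set(client) - set(server))


print(stale_keys_after_naive_resume(2))  # ['K2']
```

With start_seqno == 2 the client never sees D2, so K2 stays in its view even though the vbucket no longer contains it.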
       
       
      Problem/Inefficiency in CBSE-4662:
      A DCP client may first connect when a vbucket already holds a lot of items (say 1 million) and after the purger has run a few times, so that purge_seqno > 0 (say 100,000).
      If the stream fails during the first snapshot it receives (0 to 1,000,000), after most of the items have been received (say up to seqno 900,000), the client reconnects with {start_seqno, snap_start_seqno, snap_end_seqno} as {900,000, 0, 1,000,000}.
      The producer then asks the client to roll back and start from 0 again, because "snap_start_seqno < purge_seqno".
       
      (In CBSE-4662 the problem is aggravated because the large snapshot 0 to 1,000,000 consistently fails to complete.)
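To put numbers on the inefficiency, the sketch below evaluates the current check for the reconnect request described above. The function name is hypothetical, and treating a seqno difference as "progress discarded" is a simplifying assumption.

```python
def needs_rollback_current(snap_start_seqno, purge_seqno):
    """Current producer check as quoted in this ticket (name is hypothetical)."""
    return snap_start_seqno < purge_seqno


# Reconnect request from the scenario above.
start_seqno, snap_start_seqno, snap_end_seqno = 900_000, 0, 1_000_000
purge_seqno = 100_000

rollback = needs_rollback_current(snap_start_seqno, purge_seqno)
# On rollback the client restarts from 0 and discards its progress so far.
discarded_seqnos = start_seqno if rollback else 0

print(rollback, discarded_seqnos)  # True 900000
```

If the snapshot keeps failing before it completes, every retry restarts from 0 in the same way.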
       
       
      Proposed Solution:
      I would argue that "snap_start_seqno < purge_seqno" is stricter than necessary. The check "start_seqno < purge_seqno" is sufficient (unless someone can prove that this is erroneous).
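A minimal sketch comparing the two checks on both scenarios from this ticket. The function names are hypothetical, and purge_seqno == 4 for the small example is an assumption (it takes the seqno of the purged D2 as the purge seqno). This only illustrates the proposal and does not prove it is safe in every case.

```python
def rollback_current(start_seqno, snap_start_seqno, purge_seqno):
    """Check currently used by the producer, as quoted in this ticket."""
    return snap_start_seqno < purge_seqno


def rollback_proposed(start_seqno, snap_start_seqno, purge_seqno):
    """Check proposed in this ticket."""
    return start_seqno < purge_seqno


# (label, start_seqno, snap_start_seqno, purge_seqno)
scenarios = [
    ("small example, D2 purged", 2, 0, 4),             # purge_seqno assumed
    ("large first snapshot", 900_000, 0, 100_000),
]

results = {}
for label, start, snap_start, purge in scenarios:
    results[label] = (rollback_current(start, snap_start, purge),
                      rollback_proposed(start, snap_start, purge))
    print(label, results[label])
```

Both checks ask for a rollback in the small example, where the client would otherwise miss the deletion of K2. Only the current check asks for one in the large-snapshot case, where the client is already past the purge point.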

       

            People

              manu Manu Dhundi (Inactive)
              manu Manu Dhundi (Inactive)