  1. Couchbase Server
  2. MB-27800

Do not indicate rollback if purger has not run between successive DCP stream requests from a particular client


Details

    • Type: Improvement
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 5.0.1, 5.5.0
    • Fix Version/s: Morpheus
    • Component/s: couchbase-bucket
    • Labels: None

    Description

      Problem:

      Say a DCP client first connects long after the server has started (for example, somebody decides to build a secondary index after a few days). At that time we could have numbers like high_seqno = 1,000,000 and purge_seqno = 600,000.

      After the server has sent items up to seqno 500,000, the client disconnects and immediately reconnects, asking the server to resume from 500,001.

      Today, the server would ask the client to roll back to 0 in both cases, because start_seqno < purge_seqno (500,001 < 600,000).

      (Additionally, as in CBSE-4662, the problem can be aggravated by repeated failures to complete the large snapshot 0 to 1,000,000.)
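The check described above can be sketched as follows. This is a simplified model for illustration only: the function name `needs_rollback_today` is hypothetical, the special case for `start_seqno == 0` is an assumption, and the real producer applies further checks that are not modelled here.

```python
def needs_rollback_today(start_seqno: int, purge_seqno: int) -> bool:
    """Simplified model of the current check: roll back whenever the
    requested start is below the purge seqno."""
    if start_seqno == 0:
        # Assumption: a stream that starts from scratch has nothing to roll back.
        return False
    return start_seqno < purge_seqno


# Numbers from the example above.
purge_seqno = 600_000
resume_from = 500_001

# The client is told to roll back even though it only just disconnected.
print(needs_rollback_today(resume_from, purge_seqno))  # True
```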

       

      Proposed Solution:

      Theoretically, if the purger does not run between the two requests of a DCP client with the same snapshot numbers {snap_start_seqno, snap_end_seqno}, we should not ask the client to roll back. This is hard to implement, because it is not easy to maintain state at the producer about when, and with what snapshot numbers, a client previously opened the stream.

      However, we might be able to avoid the unnecessary rollback with the following addition to the DCP protocol.

      1. Stream request: the client sends (among other parameters) the quartet {snap_start_seqno_x, snap_end_seqno_x, start_seqno, purge_seqno_x}.
      2. Stream request response: the DCP producer sends its current values {snap_start_seqno_y, snap_end_seqno_y, purge_seqno_y}, followed by a few mutations.
      3. Now say the stream or connection drops, and the client reconnects and prefers to resume from the point where it was disconnected. The client sends a new stream request with {snap_start_seqno_y, snap_end_seqno_y, purge_seqno_y}.
      4. The producer does not ask for a rollback if no further purge has happened; it uses purge_seqno_y to find that out.
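The four steps above could look roughly like this on the producer side. It is a minimal sketch under stated assumptions, not the actual implementation: the `StreamRequest` and `ProducerState` types and the `needs_rollback` function are hypothetical, using `None` for "no purge_seqno known yet" on a first connect is an assumption, and only the purge check is modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamRequest:
    snap_start_seqno: int
    snap_end_seqno: int
    start_seqno: int
    # purge_seqno the client last received from the producer.
    # Assumption: None means the client has not seen one yet.
    purge_seqno: Optional[int] = None


@dataclass
class ProducerState:
    snap_start_seqno: int
    snap_end_seqno: int
    purge_seqno: int


def needs_rollback(req: StreamRequest, producer: ProducerState) -> bool:
    if req.start_seqno == 0:
        # Assumption: starting from scratch never needs a rollback.
        return False
    if req.purge_seqno is not None and req.purge_seqno == producer.purge_seqno:
        # Step 4: no purge has happened since the client last connected.
        return False
    # Otherwise fall back to the existing check.
    return req.start_seqno < producer.purge_seqno


producer = ProducerState(snap_start_seqno=0, snap_end_seqno=1_000_000,
                         purge_seqno=600_000)

# Step 3: the client resumes from 500,001 and echoes purge_seqno_y.
resume = StreamRequest(0, 1_000_000, 500_001, purge_seqno=600_000)
print(needs_rollback(resume, producer))  # False

# If the purger has run in the meantime, the rollback is still requested.
producer.purge_seqno = 700_000
print(needs_rollback(resume, producer))  # True
```

A client that does not send a purge_seqno gets the existing behaviour in this sketch, so the extra field only ever removes rollbacks and never adds them.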

       A detailed explanation of the problem at hand and the proposed solution can be found at 
      https://docs.google.com/document/d/18qujJsuP1vue3oWffVEgp2QxZRBva6jVzcegDZGxjGY/edit?usp=sharing


          Activity

            shivani.gupta Shivani Gupta added a comment - Dave Rigby we will have to see how the situation with the customer develops. Do you have rough estimates for how much work this is?
            drigby Dave Rigby added a comment - Linking to MB-37681 which is an alternative way to deal with the same problem. When that is resolved we can consider closing this MB.

            People

              Assignee: jwalker Jim Walker
              Reporter: manu Manu Dhundi (Inactive)
              Votes: 0
              Watchers: 17
