  1. Couchbase Server
  2. MB-31695

Collections: Update consumers with vbucket high seqno in bounded time

Details

    • Improvement
    • Resolution: Fixed
    • Major
    • None
    • master
    • couchbase-bucket
    • None

    Description

      Several key functional aspects of indexing depend on the vbucket high seqno (as explained below). In the currently proposed design, DCP introduces an Anti-Lag mechanism that keeps DCP consumers subscribed to a subset of collections updated with the latest high seqno of the bucket. Whether an Anti-Lag message is sent depends on how far behind a client is, and that is measured only by the number of mutations (time is not a factor).
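To make the count-only behaviour concrete, here is a minimal sketch. It is not the DCP implementation: the class name, the `lag_threshold` parameter, and the value 50 are assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class CountBasedAntiLag:
    """Hypothetical model of a lag trigger that looks only at mutation counts."""

    lag_threshold: int  # assumed: how many seqnos a client may fall behind

    def should_notify(self, bucket_high_seqno: int, last_sent_seqno: int) -> bool:
        # Lag is measured purely in seqnos; elapsed time is never consulted.
        lag = bucket_high_seqno - last_sent_seqno
        return lag >= self.lag_threshold


policy = CountBasedAntiLag(lag_threshold=50)

# Client is 10 seqnos behind: no message, however long it has been waiting.
stalled = policy.should_notify(bucket_high_seqno=100, last_sent_seqno=90)

# Only further mutations elsewhere in the bucket push the lag over the threshold.
caught_up = policy.should_notify(bucket_high_seqno=140, last_sent_seqno=90)
```

In this model a client that is behind by less than the threshold is never updated unless the bucket keeps receiving mutations.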

      This has the following implications:

      1. Initial Index Build - The initial index build will be based on streaming the mutations per collection. Because there is no way to find out the high seqno per collection, the Indexer will use the high seqno of the bucket as the final seqno to build up to before the initial build is considered done. If the bucket high seqno is 100 and collection A has its last mutation at 90, DCP may not send its Anti-Lag message and the Indexer could keep waiting.

      2. Stream Merge - The same scenario occurs when an index from INIT_STREAM gets merged into MAINT_STREAM. As MAINT_STREAM is a bucket-level stream, it will always have the latest seqno. But INIT_STREAM may or may not get the latest seqno, or there could be an arbitrary delay in receiving the message and merging into MAINT_STREAM.

      3. Consistency scans - Consistent scans cannot be served from a DCP stream with collection filtering enabled under the current KV design. Consistency scans are based on the bucket's seqno, so without the latest seqno at the consumer, the consumer has no way to know whether it is going to receive further mutations or only an Anti-Lag message from KV.
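All three cases share one consumer-side pattern: wait until the filtered stream has reported a seqno at or beyond a target taken from the bucket. The sketch below models that wait. The function name, the event tuples, and the `"anti_lag"` event kind are hypothetical stand-ins, not the Indexer's or DCP's actual types.

```python
from typing import Iterable, Tuple

Event = Tuple[str, int]  # (kind, seqno); kind is "mutation" or "anti_lag" (assumed names)


def reaches_target(target_seqno: int, events: Iterable[Event]) -> bool:
    """Return True once the stream has reported a seqno >= target_seqno."""
    observed = 0
    for _kind, seqno in events:
        # Both mutations and Anti-Lag messages advance what the consumer has seen.
        observed = max(observed, seqno)
        if observed >= target_seqno:
            return True
    # Stream went quiet below the target: the consumer would keep waiting.
    return False


# Bucket high seqno is 100, collection A's last mutation is at 90.
only_mutations = [("mutation", 80), ("mutation", 90)]
with_anti_lag = only_mutations + [("anti_lag", 100)]

waiting = reaches_target(100, only_mutations)
done = reaches_target(100, with_anti_lag)
```

With the example from item 1, the wait ends only when the Anti-Lag message carrying seqno 100 arrives.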

      It is important that DCP consider enhancing the Anti-Lag mechanism to include "time" as a factor, so that consumers learn the vbucket high seqno in bounded time.
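One possible shape for such an enhancement is sketched below, under stated assumptions: the class, the `max_delay_s` parameter, and the chosen values are hypothetical and do not describe how the fix was implemented. The idea is to send the message either when the count threshold is crossed or when a lagging client has gone too long without an update.

```python
from dataclasses import dataclass


@dataclass
class TimeBoundedAntiLag:
    """Hypothetical trigger combining a mutation-count bound with a time bound."""

    lag_threshold: int  # assumed: seqno lag that forces a message
    max_delay_s: float  # assumed: longest a lagging client may go without an update

    def should_notify(
        self,
        bucket_high_seqno: int,
        last_sent_seqno: int,
        last_sent_at: float,
        now: float,
    ) -> bool:
        lag = bucket_high_seqno - last_sent_seqno
        if lag <= 0:
            # Client already knows the bucket high seqno; nothing to send.
            return False
        too_far_behind = lag >= self.lag_threshold
        waited_too_long = (now - last_sent_at) >= self.max_delay_s
        return too_far_behind or waited_too_long


policy = TimeBoundedAntiLag(lag_threshold=50, max_delay_s=2.0)

# Lag of 10 is under the count threshold, so only elapsed time can trigger a message.
early = policy.should_notify(100, 90, last_sent_at=0.0, now=0.5)
late = policy.should_notify(100, 90, last_sent_at=0.0, now=2.5)
```

Timestamps are passed in rather than read from a clock so the behaviour is easy to test; a real implementation would likely use a monotonic clock.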

      Without this, the only option for indexing is to always open a bucket-level stream for everything, which is going to be very expensive.

          People

            jwalker Jim Walker
            deepkaran.salooja Deepkaran Salooja