Couchbase Server / MB-26905

DCP: Differentiate Between Create and Update Operations


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 5.0.0
    • Fix Version/s: None
    • Component/s: DCP

    Description

      As of now, it is not possible to differentiate between Create and Update operations: the same OpCode is sent for both.

      The Eventing project requires a scheme by which the OpCodes differ, so that DCP captures this information and adjacent systems can write business logic based on these two very different forms of operation.


          Activity

            drigby Dave Rigby added a comment -

            Adding my comments from email thread here:

            This is essentially asking to disable de-duplication on a per-DCP consumer basis.

            So the thing to bear in mind is that ep-engine was architected to support millions of operations per second. To achieve that, it needs to minimise the amount of information which is written to disk and replicated over the network. As such, de-duplication isn’t a specific “feature” which was enabled at some point; it’s a core part of how mutations are managed.

            In terms of how mutations are processed, every mutation is queued into the CheckpointManager, and held in a per-vBucket queue of pending items to write to disk (and send to DCP consumers). This batching is done to:

            (a) Amortise fixed costs in writing to disk (we attempt to write many mutations to disk at once where possible).
            (b) Allow de-duplication of mutations / deletions to the same key - older mutations to the same key are simply discarded, as they have been superseded.
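As a loose illustration (this is not ep-engine code; all names here are invented), the per-vBucket de-duplication described above can be sketched as follows - note how the consumer never sees the earlier mutation, so a create followed by an update is indistinguishable from a lone update:

```python
# Hypothetical sketch of checkpoint de-duplication: mutations queued for
# the same key are collapsed so only the most recent survives.
def dedupe_checkpoint(mutations):
    """mutations: list of (key, seqno, op) tuples in arrival order."""
    latest = {}
    for key, seqno, op in mutations:
        latest[key] = (seqno, op)  # any older queued mutation to this key is discarded
    # Emit the surviving items in seqno order, as a DCP stream would.
    return sorted(((k, s, o) for k, (s, o) in latest.items()), key=lambda t: t[1])

queue = [("hot", 1, "create"), ("cold", 2, "create"), ("hot", 3, "update")]
print(dedupe_checkpoint(queue))  # the create of "hot" never reaches the consumer
```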

            In our customers’ use-cases it’s common to have a small number of very hot keys (which are mutated very frequently), and by having this batching / de-duplication we significantly reduce the cost (and hence allow greater throughput).

            Additionally, on disk we only keep the most recent version of a key (or a tombstone marker if deleted). As such, DCP backfill from disk simply cannot give you every mutation a key has had - we don’t store that data (and storage would grow many-fold if we did attempt to keep every mutation).
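A minimal sketch of the point above (hypothetical code, not the actual on-disk format): the store keeps one row per key - the latest value, or a tombstone on delete - so a reconnecting consumer's backfill can report at most the final state of each key, never its history:

```python
# Hypothetical one-row-per-key store, as described above.
store = {}

def apply(key, op, value=None):
    if op == "delete":
        store[key] = ("tombstone", None)   # deletion leaves only a marker
    else:
        store[key] = ("value", value)      # create and update look identical

# Key "k" is deleted and then re-created since the consumer last connected...
apply("k", "delete")
apply("k", "set", "v2")
# ...and backfill can only report its final state:
print(store["k"])  # ('value', 'v2')
```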

            Furthermore, Ephemeral buckets (new in Spock) don’t use disk at all; all data is kept in memory. They perform similar de-duplication of updates to the same key, for the same reason: to minimise memory usage.

            To summarise, ep-engine has been architected around the requirement that we minimise at all costs the amount of data we write to disk (and stream over the network) - that’s how it achieves such high throughput. Any change to this (for example, moving de-duplication later in the system so it could be disabled for a single DCP consumer) would require significant restructuring of ep-engine.

            Question on your use-case: How do you plan to handle create vs. update vs. delete for disk backfill? For example, if you (re)connect a DCP stream and are sent a single batch of changes since you last connected; and since you last connected a key was deleted and then re-created; what would you expect to see for that?

            Follow-up question: What about the case where there's been both 1x create and 1x update on the same key since you last connected?

            drigby Dave Rigby added a comment -

            Assigning to Venkat to address the two questions I had (in my last comment).

            talaviss Tal added a comment - edited

            Any progress on this issue regarding:

            1. A flag to disable de-duplication so that all updates are captured
            2. Differentiating between Create and Update operations

            This is important because without it we do not get an event stream when the same document is updated multiple times within a short time window.


            People

              shivani.gupta Shivani Gupta
              venkatraman.subramanian Venkatraman Subramanian (Inactive)
              Votes: 0
              Watchers: 18

              Dates

                Created:
                Updated:

                Gerrit Reviews

                  There are no open Gerrit changes
