Couchbase Server / MB-37983

Basic collection routing framework


Details

    • Type: Task
    • Resolution: Fixed
    • Priority: Major
    • Version/s: 7.0.0, Cheshire-Cat
    • Component/s: XDCR
    • Labels: None

    Description

      Tracks the ability for XDCR to receive collection-based events (i.e. creation/drop) and mutations, and forward them to the corresponding collection-enabled target bucket.

       

      Scenario 1 Happy Path – let’s start out with the happy path so you can follow the collections routing principle.

      Prerequisites:

      1. Both Source and Target are collection aware
      2. One replication created between the source bucket (B1) and the target bucket (B2); both buckets have the same scope and collection layout, so all implicit mappings are valid
      3. Collections Manifest Service is functional
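Prerequisite 2 (identical layouts, all implicit mappings valid) can be sketched as a simple check. This is a minimal Go sketch with a hypothetical `Manifest` shape (scope name to set of collection names); the real collections manifest also carries UID fields and is not this type.

```go
package main

import "fmt"

// Manifest is a simplified collections manifest: scope name -> set of
// collection names. (Hypothetical shape for illustration only.)
type Manifest map[string]map[string]bool

// implicitMappingValid reports whether every source scope/collection has a
// same-named counterpart on the target, i.e. prerequisite 2 holds.
func implicitMappingValid(source, target Manifest) bool {
	for scope, cols := range source {
		tcols, ok := target[scope]
		if !ok {
			return false
		}
		for col := range cols {
			if !tcols[col] {
				return false
			}
		}
	}
	return true
}

func main() {
	src := Manifest{"S1": {"C1": true, "C2": true}}
	tgt := Manifest{"S1": {"C1": true, "C2": true}}
	fmt.Println(implicitMappingValid(src, tgt)) // true: layouts match
	delete(tgt["S1"], "C1")
	fmt.Println(implicitMappingValid(src, tgt)) // false: target "C1" missing
}
```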

       

      Pipeline Creation:

      1. The Pipeline Manager tells the XDCR Factory to create a new pipeline.
      2. The XDCR Factory constructs the routers. Each router is attached directly to one DCP nozzle and knows which XMEM nozzles that DCP nozzle should be mapped to. These are called downstreamParts.
      3. *New* For each XMEM nozzle (downstream part), a “CollectionsRouter” is created in front of it. The CollectionsRouter’s job is to map each incoming DCP packet to the right collection ID on the target (and handle simple errors).
      4. The whole pipeline is created and ready to start.
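The factory step above (one CollectionsRouter wrapped in front of every XMEM downstream part) can be sketched as follows. All type and field names here are illustrative stand-ins, not the actual goxdcr types.

```go
package main

import "fmt"

// XmemNozzle is a stand-in for an outbound XMEM nozzle.
type XmemNozzle struct{ id string }

// CollectionsRouter sits in front of one XMEM nozzle and maps packets
// to target collection IDs.
type CollectionsRouter struct{ downstream *XmemNozzle }

// Router is attached to one DCP nozzle; downstreamParts holds one
// CollectionsRouter per XMEM nozzle the DCP nozzle feeds.
type Router struct {
	dcpNozzle       string
	downstreamParts map[string]*CollectionsRouter
}

// newRouter mirrors the factory step: create a CollectionsRouter in
// front of each XMEM downstream part.
func newRouter(dcp string, xmems []*XmemNozzle) *Router {
	r := &Router{dcpNozzle: dcp, downstreamParts: map[string]*CollectionsRouter{}}
	for _, x := range xmems {
		r.downstreamParts[x.id] = &CollectionsRouter{downstream: x}
	}
	return r
}

func main() {
	xmems := []*XmemNozzle{{id: "xmem_0"}, {id: "xmem_1"}}
	r := newRouter("dcp_0", xmems)
	fmt.Println(len(r.downstreamParts)) // one CollectionsRouter per XMEM
}
```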

       

      Pipeline Start and Data flows:

      1. The pipeline Start()’s, and DCP opens up UprStream with collections enabled.
      2. In this example, say the user creates a collection named C1 with ID 1, then writes a mutation into [CID 1].
      3. Manifest ID 1 is associated with this operation. [MID 1]
         a. Before sending a mutation with a collection ID embedded inside, DCP sends down a system event that says “KV knows about [MID 1] and future mutations are based on this knowledge”.
         b. This system event is sent through all 1024 VBs. Some VBs get it earlier and some later, depending on how fast the DCP nozzle is operating.
      4. The DCP nozzle receives the system event per VB.
         a. On a per-VB basis, it keeps track of “the highest MID received”.
         b. It marks the system-event “mutation” as processed so stats and the through-seqno are correct.
      5. After sending down the system event, KV sends down the mutation the user wrote in step 2.
      6. DCP decodes the mutation and determines its CID is 1.
      7. Based on the “latest manifest” it received (4a), it queries the Collections Manifest Service for that manifest.
      8. It looks up CID 1 and finds that it is associated with the name “C1”.
      9. The router wraps the mutation, routes it, and passes it to the *new* CollectionsRouter sitting in front of each XMEM.
      10. The CollectionsRouter’s RouteCollection() then does the following:
         a. Look up the latest target manifest available to XDCR’s Collections Manifest Service.
         b. Find the CID of the target collection “C1” and write it into the key.
      11. Send it out via XMEM.
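The data-flow steps above (track the highest MID per VB from system events, then translate source CID → collection name → target CID) can be sketched as below. The manifest shapes and function names are assumptions for illustration, not the goxdcr API.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified manifests keyed by manifest ID (MID): the source maps
// collection ID -> name, the target maps name -> collection ID.
var sourceManifests = map[uint64]map[uint32]string{
	1: {1: "C1"}, // MID 1: CID 1 is named "C1"
}
var targetManifests = map[uint64]map[string]uint32{
	1: {"C1": 7}, // target knows "C1" under a different CID, 7
}

// highestMID tracks, per VB, the highest manifest ID seen via DCP
// system events (steps 3-4 above).
var highestMID = map[uint16]uint64{}

func onSystemEvent(vb uint16, mid uint64) {
	if mid > highestMID[vb] {
		highestMID[vb] = mid
	}
}

// routeCollection performs the CID -> name -> target-CID translation of
// steps 6-10 for one mutation.
func routeCollection(vb uint16, srcCID uint32, targetMID uint64) (uint32, error) {
	name, ok := sourceManifests[highestMID[vb]][srcCID]
	if !ok {
		return 0, errors.New("unknown source CID")
	}
	tcid, ok := targetManifests[targetMID][name]
	if !ok {
		return 0, errors.New("no matching target collection")
	}
	return tcid, nil
}

func main() {
	onSystemEvent(0, 1)                   // DCP system event: VB 0 now knows MID 1
	tcid, err := routeCollection(0, 1, 1) // mutation with CID 1 arrives
	fmt.Println(tcid, err)                // 7 <nil>: rewrite the key with target CID 7
}
```

Note the key point of the design: the source and target CIDs for the same collection name are independent, which is why the router must re-map every packet rather than forward the source CID verbatim.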

       

      Potential Error Scenarios

      1. XDCR’s pull of the target manifest lags behind DCP.
      2. Intentionally “broken” implicit mapping, where the user did not create a correspondingly named collection “C1” on the target.
      3. The target node’s ns_server has the manifest but the target node’s KV does not (TODO: MB-38023).


      Current Error Handling Algorithm

      Since XDCR sits on the source side, it cannot distinguish situation 1 from situation 2 above.

      So the error-handling algorithm proposed here tackles both situations the same way; once we can tell 1 from 2, the appropriate recovery action is performed.

       

      Prerequisite:

      1. Given a source collection name, “C1”, pull the *latest* manifest that XDCR knows of from the target (i.e. MID 5).
      2. Each collections router has an internal map called the “broken mapping”. Right now, it is empty.
         a. An entry in the broken mapping says “the mapping from source C1 to target C1 is broken”.
         b. Right now this works for implicit mapping; for explicit mapping it can be enhanced later.
      3. The collections router checks whether “C1” is in the broken mapping. It is not.
      4. It checks whether the target’s MID 5 contains a collection named “C1”. At this time, assume it does not. Thus begins the error-handling procedure.
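The prerequisite checks (steps 3-4: consult the broken mapping first, then look the name up in the latest known target manifest) amount to a small decision function. This sketch uses hypothetical names; the real router state is richer than a string-keyed map.

```go
package main

import "fmt"

// brokenMapping is the collections router's internal map of source
// collection names whose implicit mapping is known-broken.
var brokenMapping = map[string]bool{}

// targetMID5 stands in for the latest target manifest XDCR knows of
// (MID 5); in this walkthrough it does not contain "C1".
var targetMID5 = map[string]bool{"C2": true}

// classify reproduces steps 3-4: known-broken mappings are skipped
// outright; mappable names route normally; the rest enter the
// error-handling procedure.
func classify(srcCol string) (mappable, broken bool) {
	if brokenMapping[srcCol] {
		return false, true // already declared broken: ignore, no retry
	}
	return targetMID5[srcCol], false
}

func main() {
	m, b := classify("C1")
	fmt.Println(m, b) // false false: not broken yet, but unmappable -> error handling
	m, b = classify("C2")
	fmt.Println(m, b) // true false: route normally
}
```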


      Error Handling Procedure:

      1. Each collections router has a “retryQueue”. It is essentially a FIFO buffer with a size limit.
         a. Each collections router launches a “runRetry()” goroutine that periodically tries to re-map the failed mutations against the latest target manifest.
         b. If a mutation fails to be mapped after a certain number of attempts, that mutation’s mapping is considered “broken”.
      2. If the retryQueue buffer is full and a mutation needs to be put into it, the oldest item in the buffer is “bumped out” and force-retried before the periodic goroutine in 1a gets to it.
         a. If the bumped-out mutation fails to be mapped, then, because the retryQueue is full, it is immediately marked “broken”, bypassing 1b.
         b. This handles a burst of traffic coming from DCP and prevents it from overwhelming the retry system.
      3. When the mapping fails in Prereq 4, the mutation is put into the retryQueue.
      4. If the retry succeeds, all is well.
      5. If the retry fails after Max attempts, the mapping is declared broken.
      6. Future mutations with source collection “C1” are automatically ignored, bypassing the retry mechanism altogether.
      7. Note – explicit mapping is more specific and will be addressed later.
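The retryQueue mechanics above (bounded FIFO, periodic retry, bump-out under pressure) can be sketched as follows. The sizes, names, and single-threaded shape are illustrative; the real implementation runs runRetry() as a goroutine with proper locking.

```go
package main

import "fmt"

const (
	queueLimit = 3 // retryQueue size limit (illustrative value)
	maxRetries = 2 // attempts before a mapping is declared broken
)

type pendingMut struct {
	srcCol  string
	retries int
}

var (
	retryQueue []pendingMut        // FIFO buffer of unmappable mutations
	brokenMap  = map[string]bool{} // mappings declared broken
)

// enqueue implements step 2: if the buffer is full, the oldest item is
// bumped out and force-retried once; if that also fails, it is marked
// broken immediately, bypassing the Max-retries path.
func enqueue(m pendingMut, mapsOK func(string) bool) {
	if len(retryQueue) == queueLimit {
		oldest := retryQueue[0]
		retryQueue = retryQueue[1:]
		if !mapsOK(oldest.srcCol) {
			brokenMap[oldest.srcCol] = true
		}
	}
	retryQueue = append(retryQueue, m)
}

// runRetryOnce is one pass of the periodic runRetry() goroutine: retry
// every queued mutation, declaring broken after maxRetries failures.
func runRetryOnce(mapsOK func(string) bool) {
	var remaining []pendingMut
	for _, m := range retryQueue {
		if mapsOK(m.srcCol) {
			continue // mapped successfully, all is well
		}
		m.retries++
		if m.retries >= maxRetries {
			brokenMap[m.srcCol] = true
			continue
		}
		remaining = append(remaining, m)
	}
	retryQueue = remaining
}

func main() {
	never := func(string) bool { return false } // target never gains "C1"
	for i := 0; i < 4; i++ {
		enqueue(pendingMut{srcCol: "C1"}, never) // 4th enqueue bumps the oldest
	}
	fmt.Println(brokenMap["C1"], len(retryQueue)) // true 3
	runRetryOnce(never)
	runRetryOnce(never)
	fmt.Println(len(retryQueue)) // 0: all declared broken after max retries
}
```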

       

      This only tells half the story. The other half is *how* the recovery mechanism works and how the system gets out of this situation.

      There are two recovery mechanisms: Manifest-Service based recovery and I/O based recovery.

       

      Manifest-Service Based Recovery: (NOTE this is not done yet. This is part of the Backfill Manager phase 2)

      1. When there is no I/O going on, manifest-service based recovery takes place.
      2. For example, suppose all mutations have been retried and marked broken based on a target MID of 5, and then the target collection is created, producing MID 6.
         a. The Collections Manifest Service periodically pulls from the target and sees a new MID of 6, which is newer than MID 5.
         b. It finds the diff between MID 5 and MID 6, sees that target “C1” now exists, and creates a backfill request.
         c. This could be racy between this path and the I/O recovery path; it will need more thought when implemented. For example, wait a while for I/O-based recovery to kick in first before actually creating backfill replications.
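Step 2b (diff the old and new target manifests, then raise a backfill request for each broken mapping whose collection now exists) can be sketched as below. The manifest shape and the printed "request" are hypothetical; the real Backfill Manager phase-2 behavior is not implemented yet, as noted above.

```go
package main

import "fmt"

// diffManifests returns the collection names present in the newer
// target manifest but not in the older one.
func diffManifests(older, newer map[string]bool) []string {
	var added []string
	for name := range newer {
		if !older[name] {
			added = append(added, name)
		}
	}
	return added
}

func main() {
	mid5 := map[string]bool{"C2": true}
	mid6 := map[string]bool{"C2": true, "C1": true}
	broken := map[string]bool{"C1": true}
	// Each newly added collection that was marked broken becomes a
	// backfill request (stand-in for the real request object).
	for _, name := range diffManifests(mid5, mid6) {
		if broken[name] {
			fmt.Printf("backfill request for %s\n", name)
		}
	}
}
```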

       

      I/O Based Recovery:

      1. The situation now: a set of mutations was marked broken based on MID 5, because target MID 5 does not contain “C1”. Currently, any mutation from source C1 is ignored.
      2. The XDCR Collections Manifest Service pulls the latest target manifest in the background and gets MID 6.
      3. Right now, the collections routers’ broken maps contain “C1” and nothing else. In this example, assume “C2” already existed in MID 5 and “C2” mutations have been mapping correctly all this time.
      4. When the next “C2” mutation comes through the pipeline, the collections router detects that the latest target manifest is now MID 6, not 5. Even though “C2” is not affected, the router takes this opportunity to check whether the entries in its broken map are still broken.
      5. It does a diff between MID 5 and MID 6.
         a. It sees that “C1” now exists in MID 6, and that “C1” was marked broken.
         b. It declares that “C1” is no longer broken. Future “C1” mutations coming down can now pass through and be mapped.

         c. TODO – need to create a backfill replication for all previously missed C1 mutations. (Part of the Backfill Milestone)
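The I/O-based recovery path above can be sketched as a hook that fires on any passing mutation: when a newer target MID is observed, the router re-checks its broken map against the new manifest. Names and shapes are illustrative, not the actual goxdcr API.

```go
package main

import "fmt"

var (
	brokenMap    = map[string]bool{"C1": true} // "C1" was declared broken on MID 5
	lastKnownMID = uint64(5)
)

// onMutation sketches step 4: when any mutation (here a healthy "C2")
// observes a newer target manifest, the router un-breaks every entry
// whose target collection now exists.
func onMutation(srcCol string, targetMID uint64, manifest map[string]bool) {
	if targetMID > lastKnownMID {
		for name := range brokenMap {
			if manifest[name] {
				delete(brokenMap, name) // e.g. "C1" is no longer broken
			}
		}
		lastKnownMID = targetMID
	}
}

func main() {
	mid6 := map[string]bool{"C1": true, "C2": true}
	onMutation("C2", 6, mid6)    // a healthy C2 mutation triggers the re-check
	fmt.Println(brokenMap["C1"]) // false: future C1 mutations pass through
}
```

This illustrates why the mechanism needs ongoing I/O: with no mutations flowing, the hook never fires, which is exactly the gap the manifest-service based recovery is meant to cover.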

       

      Application:

      Now, knowing the algorithm and recovery path, let’s see how it applies to our original error conditions.

       

      Scenario 1 Not-Happy Path – XDCR has not pulled the latest Target Manifest

      1. If XDCR pulls the latest target manifest during the retry interval (i.e. no broken mapping has been declared), the data is retried and written successfully to the target.
      2. If a broken mapping has been declared, then future “C1” writes will:
         a. depend upon other collections to rescue the broken mapping if I/O is still ongoing, OR
         b. wait for the Collections Manifest Service to pull the latest target manifest that contains “C1” and create backfill requests.
      3. If a gigantic flood of data comes in from DCP and overwhelms the retry buffer:
         a. the retry mechanism won’t run and any mutation with “C1” is immediately declared broken-mapped; it then depends upon 2a/2b to be rescued.

       

      Scenario 2 Intentionally “broken” implicit mapping, where the user did not create a correspondingly named collection “C1” on the target

      1. The retry mechanism or the overflow mechanism will mark the collection “C1” mapping broken, saving wasted resources on future “C1” mutations.
      2. The same recovery applies if the user then creates “C1” on the target.

       

People

    Assignee: Neil Huang
    Reporter: Neil Huang
