Couchbase Gateway / CBG-394

Upgrade to shared_bucket_access with GSI can cause missed mutations


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Hydrogen
    • Component/s: SyncGateway
    • Security Level: Public
    • Labels:
      None
    • Story Points:
      13

      Description

      When upgrading or switching over to shared_bucket_access, views can simply be reused, whereas GSI needs a new version of each index to pick up the xattr (e.g. sg_access_1 vs sg_access_x1). This has the side effect that only migrated documents are available to index-based requests (e.g. the changes feed). Provided that there's an import node, this eventually stabilises once all docs have been migrated and become available; requests made in the meantime may just receive slightly stale data.

      There's an issue here, though: the migration order is not guaranteed. Take the scenario where there are 1000 active docs with sequences 1..1000. If these are imported in the order 1,2,700,3,4... then any changes feed client connecting after those first five imports will get a response along the lines of:

      {
        "results":[
          {"seq":1,"id":"doc_1",...},
          {"seq":2,"id":"doc_2",...},
          {"seq":3,"id":"doc_3",...},
          {"seq":4,"id":"doc_4",...},
          {"seq":700,"id":"doc_700",...}],
        "last_seq": 700
      }
      

      If this client then starts asking for changes with since=700, it can potentially miss documents that weren't migrated at the time of its first request (in this example, anything in sequences 5..699).
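
The scenario above can be reproduced with a small model. This is only a sketch under stated assumptions: `changes_feed` is a hypothetical stand-in that answers purely from the set of already-migrated sequences (no cache, no real Sync Gateway call), and the migration is assumed to finish between the client's two requests.

```python
def changes_feed(index, since=0):
    """Hypothetical changes feed: answers only from sequences present in the index."""
    seqs = sorted(s for s in index if s > since)
    results = [{"seq": s, "id": f"doc_{s}"} for s in seqs]
    last_seq = seqs[-1] if seqs else since
    return {"results": results, "last_seq": last_seq}


all_seqs = range(1, 1001)          # 1000 active docs, sequences 1..1000
import_order = [1, 2, 700, 3, 4]   # first five imports, as in the description

# Index contents at the time of the client's first request.
xattr_index = set(import_order)
first = changes_feed(xattr_index)

# Assumption: the remaining docs are migrated before the second request.
xattr_index.update(all_seqs)
second = changes_feed(xattr_index, since=first["last_seq"])

seen = {r["seq"] for r in first["results"]} | {r["seq"] for r in second["results"]}
missed = sorted(set(all_seqs) - seen)

print(first["last_seq"], len(missed), missed[0], missed[-1])
```

In this model the client ends up with sequences 1..4 and 700..1000 and never sees 5..699, because its since value had already moved past them.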

      It's possible to work around this by using views or by taking a maintenance window to allow the migration to complete, but both of these options have downsides.


          Activity

          Adam Fraser (adamf) added a comment (edited)

          The recommended upgrade approach to enable shared bucket access is:

          • Remove a single node from the load balancer
          • Switch that node to enable shared bucket access with import_docs:true, and wait for import to catch up
          • Add the node back to the load balancer
          • Perform a rolling upgrade of the rest of the nodes to enable shared bucket access

          I feel like this was included in the docs at some point, but I can't find it now.  We'll want to file a docs ticket to improve the upgrade docs to cover this.
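
As a rough illustration of the per-node settings in the steps above, the sketch below builds the two database config fragments. The key names `enable_shared_bucket_access` and `import_docs` are assumed from the Sync Gateway 2.x database config and should be checked against the version in use; the database name `db` and the helper `db_config` are hypothetical.

```python
import json


def db_config(import_docs):
    """Build a hypothetical database config fragment for one node."""
    cfg = {"enable_shared_bucket_access": True}
    if import_docs:
        # Only the first node (removed from the load balancer) runs the import.
        cfg["import_docs"] = True
    return {"databases": {"db": cfg}}


first_node = db_config(import_docs=True)     # step 2: import node
other_nodes = db_config(import_docs=False)   # step 4: rolling upgrade

print(json.dumps(first_node, indent=2))
print(json.dumps(other_nodes, indent=2))
```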

          James Flather added a comment

          I think you hit a similar problem with that approach when using GSI, though: as the new node starts migrating docs, those docs vanish from the indexes/queries that the remaining (load balanced) nodes are using.

          Realistically, you've got some cache coverage there so the chances are reduced, but a new client could still end up with an incomplete dataset (and a last_seq/since value that is higher than it ought to be).
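
A minimal model of this effect, assuming each doc is visible in exactly one of the two GSI indexes at any time and ignoring the cache entirely. The names `legacy_index`, `xattr_index` and `migrate` are hypothetical and only illustrate the point:

```python
def migrate(seq, legacy_index, xattr_index):
    """Move one doc from the legacy (non-xattr) index to the xattr index."""
    legacy_index.discard(seq)
    xattr_index.add(seq)


all_seqs = set(range(1, 11))
legacy_index = set(all_seqs)   # queried by the nodes still behind the load balancer
xattr_index = set()            # queried by the node doing the import

# The import node migrates a few docs, in no particular order.
for seq in (1, 2, 7):
    migrate(seq, legacy_index, xattr_index)

# What a new client sees when it hits a not-yet-upgraded node.
legacy_visible = sorted(legacy_index)
legacy_last_seq = legacy_visible[-1]
missing = sorted(all_seqs - legacy_index)

print(legacy_visible, legacy_last_seq, missing)
```

Here the client receives seven of the ten docs with a last_seq of 10, so a later request with since=10 would not pick up docs 1, 2 and 7.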

          Adam Fraser (adamf) added a comment

          Ah, that's a good point. That scenario is covered when using views (since a single view indexes both migrated and unmigrated docs), but not when the system is already using GSI prior to enabling shared_bucket_access.

          I expect you're correct that downtime is required for that type of migration.


            People

            • Assignee: The One
            • Reporter: James Flather
            • Votes: 0
            • Watchers: 2
