  1. Couchbase Server
  2. MB-61465

vbucket deletion may not notify clients with pending sync writes


Details

    • Untriaged
    • 0
    • Unknown

    Description

      The following text describes an issue reproduced on master; the 7.x branch differs subtly, as explained after the repro steps.

      This bug affects vbucket moves or, from the KV perspective, any case where a vbucket is deleted, or set to dead and then deleted.

      Note that the bug exists whether ns_server does an explicit delete of the active vbucket (which internally sets it to dead) or first does a set-state (to dead) and then deletes the vbucket. Call these Case1 and Case2 respectively.

      As master is where I initially reproduced the issue, let's first follow through the master code.

      In both cases the active vbucket is set to dead, and both execute setVBucketState_UNLOCKED to make that change.

      When changing a vbucket away from active, setVBucketState_UNLOCKED sets up a NonIO task that should unblock all waiting cookies (responding with sync_write_ambiguous). This task is called RespondAmbiguousNotification.

      Next, both Case1 and Case2 delete the VBucket. There are a few steps involved in this.

      In summary: the VBucket is managed via a std::shared_ptr, and when the last reference to that std::shared_ptr is removed, a delete function runs and schedules a new task to actually delete the vbucket. This is an AuxIO task for persistent buckets and a NonIO task for ephemeral buckets. The task is called VBucketMemoryDeletionTask.
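
      The deferred-deletion pattern described above can be sketched as follows. This is a minimal illustration, not the kv_engine code: the VBucket struct, the TaskQueue type, and the makeVBucket and dropLastReference helpers are all hypothetical names, and a plain std::deque stands in for the AuxIO/NonIO task pools.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>

// Hypothetical stand-in for the real VBucket; it only records its destruction.
struct VBucket {
    explicit VBucket(bool& destroyedFlag) : destroyed(destroyedFlag) {}
    ~VBucket() { destroyed = true; }
    bool& destroyed;
};

// Toy single-threaded queue standing in for the AuxIO/NonIO task pools.
using TaskQueue = std::deque<std::function<void()>>;

// Build a shared_ptr whose deleter does not free the object itself. It
// queues a task (the analogue of VBucketMemoryDeletionTask) to do so later.
std::shared_ptr<VBucket> makeVBucket(bool& destroyedFlag, TaskQueue& queue) {
    return std::shared_ptr<VBucket>(
            new VBucket(destroyedFlag),
            [&queue](VBucket* vb) { queue.push_back([vb] { delete vb; }); });
}

struct Result {
    std::size_t tasksQueued;  // tasks queued by dropping the last reference
    bool freedBeforeTaskRan;  // was the object destroyed before the task ran?
    bool freedAfterTaskRan;   // was it destroyed once the task had run?
};

Result dropLastReference() {
    bool destroyed = false;
    TaskQueue queue;
    auto vb = makeVBucket(destroyed, queue);

    vb.reset(); // last reference removed: the deleter runs and queues a task
    Result result{queue.size(), destroyed, false};

    for (auto& task : queue) {
        task(); // the deferred task performs the actual delete
    }
    result.freedAfterTaskRan = destroyed;
    return result;
}
```

      In this sketch, dropping the last reference only queues work; the object is destroyed when the queued task later runs, so the moment of destruction depends on task scheduling.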

      The bug becomes more obvious when we look at the run loop of RespondAmbiguousNotification and ask why it could fail to notify any cookies: there is an early return.

      The run function tries to obtain a std::shared_ptr<VBucket> by upgrading the std::weak_ptr. That upgrade fails if the VBucket has already been deleted, which happens if the two tasks run in the following order.

      1. VBucketMemoryDeletionTask -> deletes the VBucket
      2. RespondAmbiguousNotification -> fails to get the VBucket, so skips the notify
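
      The ordering problem above can be modelled in a few lines. This is a simplified sketch under stated assumptions: Cookie, VBucket, respondAmbiguous and cookieNotified are hypothetical names, and the "deletion task" is reduced to the step that drops the last std::shared_ptr reference, which is what makes the std::weak_ptr upgrade fail.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a client connection blocked on a sync write.
struct Cookie {
    bool notified = false;
};

// Hypothetical stand-in for the real VBucket.
struct VBucket {
    std::vector<Cookie*> pendingSyncWrites;
};

// Analogue of the run function of RespondAmbiguousNotification: upgrade the
// weak_ptr, and return early if the VBucket is already gone.
bool respondAmbiguous(const std::weak_ptr<VBucket>& weakVb) {
    auto vb = weakVb.lock();
    if (!vb) {
        return false; // early return: no cookie is notified
    }
    for (auto* cookie : vb->pendingSyncWrites) {
        cookie->notified = true; // would respond with sync_write_ambiguous
    }
    return true;
}

// Run the two tasks in the requested order and report whether the waiting
// cookie was notified.
bool cookieNotified(bool deletionRunsFirst) {
    Cookie cookie;
    auto vb = std::make_shared<VBucket>();
    vb->pendingSyncWrites.push_back(&cookie);
    std::weak_ptr<VBucket> weakVb = vb;

    // Simplification: the deletion task drops the last reference.
    auto deletionTask = [&vb] { vb.reset(); };
    auto notifyTask = [&weakVb] { respondAmbiguous(weakVb); };

    if (deletionRunsFirst) {
        deletionTask();
        notifyTask();
    } else {
        notifyTask();
        deletionTask();
    }
    return cookie.notified;
}
```

      In this model, cookieNotified(false) leaves the cookie notified, while cookieNotified(true) leaves it blocked, mirroring the ordering listed above.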

      If the bucket is then shut down, we will be stuck waiting for cookies to be notified.

      This case has been reproduced using a single-threaded unit test, where we can force the task run ordering, and indeed we are left with blocked cookies.

      Now what about 7.x? There the problem is more obvious. If only Case1 occurs (just a delete of the vbucket), the 7.x code does not call setVBucketState_UNLOCKED, so RespondAmbiguousNotification is never scheduled on that path!

      The call to setVBucketState_UNLOCKED was added later, in 7.6, via MB-54976.


            People

              jwalker Jim Walker