Couchbase Server / MB-62403

Retry LCB instance bootstrap till timeout

Details

    • Untriaged
    • 0
    • Unknown

    Description

      Before this patch, an Eventing function could be deployed or resumed while:

      • the function had one or more destination bucket bindings whose LCB instances needed to be bootstrapped,
        AND
      • the KV service was experiencing issues, e.g., a network partition or a firewall blocking the KV port.

      In that case, we left the affected LCB instance in an unhealthy state and carried on with the deployment.

      Unfortunately, this LCB handle was never repaired, even after the KV issues were resolved. Every subsequent operation scheduled on the instance therefore failed without returning control, which left termination_lock_ held; from the customer's perspective, the Eventing function was stuck.

      With this patch, we track the status of each LCB instance and "lazily" retry bootstrapping any unhealthy instance until the operation timeout, which is derived from the script timeout. The retry is "lazy" because it happens only when the customer's JavaScript code uses the corresponding bucket binding(s).
      As a result, the customer's JavaScript code no longer gets stuck; instead, processing of the mutation times out.
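      The lazy-retry approach can be sketched as follows. This is a minimal, single-threaded C++ illustration under stated assumptions, not the Eventing implementation: `BindingRegistry`, `BindingHandle`, `BootstrapStatus`, `Register`, and `EnsureReady` are hypothetical names, and the `try_bootstrap` callback stands in for the real libcouchbase bootstrap (for example, creating the instance and calling `lcb_connect`). The issue does not say how the operation timeout is derived from the script timeout, so the sketch simply takes it as the `op_timeout` parameter.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>

// Hypothetical per-binding bootstrap status.
enum class BootstrapStatus { kHealthy, kUnhealthy };

// Hypothetical stand-in for an LCB instance. `try_bootstrap` abstracts the
// real libcouchbase bootstrap and returns true on success.
struct BindingHandle {
  BootstrapStatus status = BootstrapStatus::kUnhealthy;
  std::function<bool()> try_bootstrap;
};

class BindingRegistry {
 public:
  using Clock = std::chrono::steady_clock;
  using Millis = std::chrono::milliseconds;

  // Deploy/resume time: a single bootstrap attempt. A failure only records
  // the binding as unhealthy; it does not block the deployment.
  void Register(const std::string& name, std::function<bool()> try_bootstrap) {
    BindingHandle handle;
    handle.try_bootstrap = std::move(try_bootstrap);
    handle.status = handle.try_bootstrap() ? BootstrapStatus::kHealthy
                                           : BootstrapStatus::kUnhealthy;
    bindings_[name] = std::move(handle);
  }

  // Called lazily, when JavaScript code uses the binding. Healthy bindings
  // return immediately; unhealthy ones are re-bootstrapped until `op_timeout`
  // elapses. Returns false on timeout so the caller can time out the
  // mutation instead of hanging.
  bool EnsureReady(const std::string& name, Millis op_timeout) {
    auto it = bindings_.find(name);
    if (it == bindings_.end()) return false;
    BindingHandle& h = it->second;
    if (h.status == BootstrapStatus::kHealthy) return true;

    const auto deadline = Clock::now() + op_timeout;
    Millis backoff(1);
    while (Clock::now() < deadline) {
      if (h.try_bootstrap()) {
        h.status = BootstrapStatus::kHealthy;
        return true;
      }
      // Exponential backoff capped at 100 ms, never deliberately sleeping
      // longer than the time left before the deadline.
      const auto remaining =
          std::chrono::duration_cast<Millis>(deadline - Clock::now());
      std::this_thread::sleep_for(std::min<Millis>(backoff, remaining));
      backoff = std::min<Millis>(backoff * 2, Millis(100));
    }
    return false;
  }

  BootstrapStatus StatusOf(const std::string& name) const {
    return bindings_.at(name).status;
  }

 private:
  std::unordered_map<std::string, BindingHandle> bindings_;
};
```

      In this sketch, a bootstrap failure at deploy time is only recorded, and the retry cost is paid by the first handler invocation that touches the binding, bounded by `op_timeout`. A real implementation would also need to synchronize access to the status map across worker threads and recreate the LCB instance before each attempt; both are left out here.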

            People

              Rishit Chaudhary (rishit.chaudhary)
              Abhishek Jindal (abhishek.jindal)
              Votes: 0
              Watchers: 2
