Details
- Type: Bug
- Resolution: Fixed
- Priority: Major
- Fix Version: 7.6.2
- Triage: Untriaged
Description
Before this patch, when an eventing function was being deployed or resumed, if:
- the function had any destination bucket bindings whose LCB instances needed to be bootstrapped, AND
- there were ongoing KV service issues (e.g., a network partition, a firewall on the KV port, etc.),

we left the LCB instance in an unhealthy state and continued with deployment anyway.
Unfortunately, this LCB handle would never get repaired, even after the KV issues were resolved. As a result, any subsequent operation scheduled on this LCB instance would fail without returning control, leaving termination_lock_ held; from the customer's perspective, the eventing function appeared stuck.
With this patch, we track the status of each LCB instance and "lazily" retry bootstrapping any unhealthy LCB instance(s) until the operation timeout, which is derived from the script timeout. The process is "lazy" because the LCB bootstrap is retried only when the customer's JavaScript code actually uses the corresponding bucket binding(s).
With this change in approach, the customer's JavaScript code no longer gets stuck; instead, the mutation's processing times out.
Issue Links
- backports to MB-63014 [Trinity]: Retry LCB instance bootstrap till timeout (Resolved)