Details

Type: Improvement
Resolution: Unresolved
Priority: Critical
Fix Version/s: Morpheus, 7.6.2
Affects Version/s: Morpheus
Component/s: eventing
Labels:
- need-doc

Story Points: 0

Description

One-Line Summary
According to the Eventing design doc (MB-50944), every "SBM" write stores the Eventing function ID inside the document. In an active-active deployment where both clusters run their own Eventing functions, each with its own "fiid", this metadata causes XDCR to ping-pong and never stop replicating.

Eventing Doc

Eventing/SGW Support Doc: https://docs.google.com/document/d/1KfgW6SqETp_vviCjRAklmYKx6mBoi2LqWBo92L9w-Hg/edit#heading=h.lcogjmc7wx92

Issue

Eventing's SBM field contains a "fiid" field and an Eventing.PCAS field. Together they identify the UUID of the Eventing "actor" and the version of the document that the actor processed.
Eventing then writes the decorated document back to the source bucket, and this write comes back down from DCP as a new mutation.
The fiid and Eventing.PCAS pair is meant to act as a check: Eventing determines that the new mutation (which it wrote back itself in the first pass) is essentially a no-op and does not decorate the document again.

See: https://docs.google.com/document/d/1KfgW6SqETp_vviCjRAklmYKx6mBoi2LqWBo92L9w-Hg/edit#heading=h.4i8zi04h8fwb
The statement:

if (xattr.hasOwnProperty('_eventing') && xattr['_eventing'].fiid == current_fiid) {
    // give priority to matching cas if it exists
    if (xattr['_eventing'].hasOwnProperty('cas')) {
        return xattr['_eventing']['cas'] == meta.cas;
    } else {
        // otherwise fall back to comparing the sequence number
        return xattr['_eventing']['seq'] == meta.seq;
    }
}

However, this check does not take into account that XDCR now needs to replicate the change to the other buckets/clusters in the topology.
If each target cluster bucket also has Eventing running and, by design, the Eventing actors do not share the same fiid, the result is an infinite ping-pong.

The root cause is that the Eventing xattr is one-dimensional: it cannot record causality between multiple Eventing actors.
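
To make the fiid dependency concrete, here is a minimal sketch that wraps the quoted check in a function and evaluates it from the point of view of two clusters. The function name isOwnMutation, the plain-object shapes for xattr and meta, the integer CAS values, and the final "return false" fall-through are all assumptions made for illustration; the design doc excerpt above does not show them.

```javascript
// Hypothetical wrapper around the check quoted above (names are illustrative).
function isOwnMutation(xattr, meta, currentFiid) {
    if (xattr.hasOwnProperty('_eventing') && xattr['_eventing'].fiid == currentFiid) {
        // give priority to matching cas if it exists
        if (xattr['_eventing'].hasOwnProperty('cas')) {
            return xattr['_eventing']['cas'] == meta.cas;
        } else {
            return xattr['_eventing']['seq'] == meta.seq;
        }
    }
    // Assumption: a missing xattr or a different fiid means "not handled by me".
    return false;
}

// A document that one Eventing function (fiid "ec2") has just written back.
const stampedXattr = { _eventing: { fiid: 'ec2', cas: 150 } };
const stampedMeta = { cas: 150 };

// The function that wrote it recognises its own write.
console.log(isOwnMutation(stampedXattr, stampedMeta, 'ec2'));
// A function with a different fiid does not, so it would process the doc again.
console.log(isOwnMutation(stampedXattr, stampedMeta, 'ec1'));
```

The same document is a no-op for the function that stamped it and fresh work for any function with a different fiid, which is the situation XDCR creates between two clusters.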

See the example below:

C1 has Eventing running, with eventing function ID “ec1”
C2 has Eventing running, with eventing function ID “ec2”

SDK writes Doc A

C1:
    CAS: 100

XDCR replicates from C1 to C2

C2:
    CAS: 100
    CvCAS: 100

Eventing fires on C2

C2:
    CAS: 150
    CvCAS: 100
    Eventing.CAS: 150
    Eventing.PCAS: 100
    Eventing.fiid: “ec2”

Meanwhile, Eventing on C1 sees that the document has not been handled yet and fires as well.

C1:
    CAS: 120
    CvCAS: 100
    Eventing.CAS: 120
    Eventing.PCAS: 100
    Eventing.fiid: “ec1”

XDCR from C1 to C2 loses conflict resolution (CAS 120 < CAS 150).

XDCR from C2 to C1 wins (CAS 150 > CAS 120).
C2 composes the HLV and sends the document.

Doc received on C1 from C2:

C1:
    CAS: 150
    CvCAS: 150
    Eventing.CAS: 150
    Eventing.PCAS: 100
    Eventing.fiid: “ec2”

The fiid of C1 is “ec1”, so Eventing re-runs because of the fiid mismatch and sets the fiid to “ec1”:

C1:
    CAS: 170
    CvCAS: 150
    Eventing.CAS: 170
    Eventing.PCAS: 150
    Eventing.fiid: “ec1”

XDCR from C1 wins over C2 (CAS 170 > CAS 150).

C1 sends the doc over to C2:

C2:
    CAS: 170
    CvCAS: 170
    Eventing.CAS: 170
    Eventing.PCAS: 150
    Eventing.fiid: “ec1”

The fiid no longer matches C2's fiid, so Eventing on C2 fires again and XDCR replicates from C2 to C1.
The cycle then repeats indefinitely.
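
The trace above can be reproduced with a small simulation. This is a simplified sketch, not Eventing or XDCR code: it assumes a single document, integer CAS values taken from the trace, conflict resolution reduced to "higher CAS wins", and a logical clock that advances by 20 on every Eventing write. The names simulate, replicate, and fireEventing are hypothetical, and CvCAS, PCAS, and HLV are left out because they do not affect whether Eventing fires.

```javascript
// Simplified model of the trace: one document, two clusters, integer CAS values.
// All names are illustrative and not part of Eventing or XDCR.
function simulate(rounds, fiid1, fiid2) {
    let clock = 150; // highest CAS issued so far in the trace
    // Starting point: both Eventing functions have already fired once.
    const c1 = { fiid: fiid1, doc: { cas: 120, eventing: { fiid: fiid1, cas: 120 } } };
    const c2 = { fiid: fiid2, doc: { cas: 150, eventing: { fiid: fiid2, cas: 150 } } };

    // XDCR conflict resolution reduced to "higher CAS wins".
    function replicate(source, target) {
        if (source.doc.cas > target.doc.cas) {
            target.doc = { cas: source.doc.cas, eventing: { ...source.doc.eventing } };
        }
    }

    // Eventing skips the document only if it carries its own fiid and a matching CAS.
    function fireEventing(cluster) {
        const e = cluster.doc.eventing;
        if (e.fiid === cluster.fiid && e.cas === cluster.doc.cas) {
            return 0; // recognised as its own write-back: no-op
        }
        clock += 20; // assumption: each Eventing write gets a new, higher CAS
        cluster.doc = { cas: clock, eventing: { fiid: cluster.fiid, cas: clock } };
        return 1;
    }

    let rewrites = 0;
    for (let i = 0; i < rounds; i++) {
        replicate(c1, c2);
        replicate(c2, c1);
        rewrites += fireEventing(c1) + fireEventing(c2);
    }
    return { rewrites, c1Cas: c1.doc.cas, c2Cas: c2.doc.cas };
}

console.log(simulate(5, 'ec1', 'ec2'));
console.log(simulate(5, 'ec', 'ec'));
```

In this model, distinct fiids produce one Eventing rewrite per round for as long as the loop runs, and the two clusters never hold the same CAS. The second call uses a shared fiid only as a contrast to show that the loop is driven by the fiid mismatch; it is not a proposed fix.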

Issue Links

is parent task of

DOC-12127 Doc for Eventing/SGW co-existence design incompatible with bi-directional XDCR

Open

People

Assignee: Abhishek Jindal

Reporter: Neil Huang

Votes: 0

Watchers: 8

Dates

Created: 07/Mar/24 8:38 PM

Updated: 19/Apr/24 11:42 AM
