Details

Type: Bug
Resolution: Fixed
Priority: Major
Fix Version/s: Morpheus
Affects Version/s: 6.0.0, 6.0.1, 6.0.2, 6.0.3, 6.0.4, 6.0.5, 6.5.1, 6.6.0, 6.6.1, 6.6.2, 6.5.2, 6.5.0, 7.6.0, 6.6.3, 6.6.4, 6.6.5, 6.6.6, 7.0.0-Beta1, 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.1.4, 7.0.5, 7.1.0, 7.1.1, 7.1.2, 7.2.0, 7.1.3, 7.2.1, 7.1.5, 7.2.4, 7.0.6, 7.1.7, 7.2.2, 7.1.6, 7.2.3, 7.2.5, 7.6.2, 7.2.6, 7.6.1, 7.6.4
Component/s: couchbase-bucket
Labels:
None

Triage:
Untriaged
Story Points:
0
Is this a Regression?:
Unknown

Description

This needs more investigation, but the following situation could result in an active/replica mismatch. It all relates to a poisoned max_cas and use of the "force_max_cas" fix.

A poisoned CAS first arises when a mutation is made on a node whose system clock is incorrectly set to the far future. Every vbucket that receives a mutation while the system clock is "bad" will have its max_cas set to that system time. If the system clock is then fixed (set back to "now"), max_cas remains in the future, and every new mutation against that vbucket gets a logical-clock CAS based on max_cas+=1. Those mutations replicate, and the replica max_cas will match the active. XDCR of the data will also affect the vbuckets of linked clusters.
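As a rough illustration of the poisoning described above, here is a minimal toy model. The `VBucket` class and `next_cas` method are hypothetical names for this sketch, not the kv-engine API, and the clock rule is simplified to "use the wall clock if it is ahead of max_cas, otherwise tick max_cas by one".

```python
from dataclasses import dataclass


@dataclass
class VBucket:
    """Hypothetical toy model of one vbucket's CAS state (not the real API)."""
    max_cas: int = 0

    def next_cas(self, wall_clock_ns: int) -> int:
        # Assumption: use the wall clock while it is ahead of max_cas,
        # otherwise fall back to a logical tick of max_cas + 1.
        cas = wall_clock_ns if wall_clock_ns > self.max_cas else self.max_cas + 1
        self.max_cas = cas
        return cas


NOW = 1_700_000_000_000_000_000                 # a "correct" clock reading (ns)
FUTURE = NOW + 10 * 365 * 24 * 3600 * 10**9     # clock wrongly set ~10 years ahead

vb = VBucket()
poisoned = vb.next_cas(FUTURE)   # mutation made while the clock is "bad"
after_fix = vb.next_cas(NOW)     # clock repaired, but max_cas is still in the future
```

In this model the second mutation gets `FUTURE + 1` rather than `NOW`, which is the "remains in the future" behaviour described above.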

Fixing this situation is like whack-a-mole; it requires steps roughly as follows:

1. Fix the system clock.
2. Force max_cas of the affected vbuckets to 1 (active + replica).
3. Touch all affected documents. Each touch generates a new mutation that sets the document's CAS to "now"; on the first such mutation max_cas switches from 1 to "now".
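The steps above can be sketched on a toy single-vbucket model. Everything here (`VBucket`, `mutate`, `remediate`) is a hypothetical stand-in for illustration; the real fix is applied to both active and replica vbuckets and uses the "force_max_cas" mechanism, whose interface is not shown in this issue.

```python
class VBucket:
    """Hypothetical toy vbucket: tracks max_cas and a CAS per document."""

    def __init__(self):
        self.max_cas = 0
        self.docs = {}  # key -> CAS

    def mutate(self, key, wall_clock_ns):
        # Assumption: CAS is the wall clock if it is ahead, else max_cas + 1.
        cas = max(wall_clock_ns, self.max_cas + 1)
        self.max_cas = cas
        self.docs[key] = cas
        return cas


def remediate(vb, now_ns):
    # Step 1 (fix the system clock) is represented by passing a sane now_ns.
    vb.max_cas = 1                 # step 2: force max_cas to 1
    for key in list(vb.docs):      # step 3: touch every affected document
        vb.mutate(key, now_ns)


NOW = 1_700_000_000_000_000_000
FUTURE = NOW + 10**18

vb = VBucket()
vb.mutate("a", FUTURE)   # both documents written while the clock was bad
vb.mutate("b", FUTURE)
remediate(vb, NOW)
```

After `remediate`, every document CAS and max_cas in the model are close to `NOW` again. A document that step 3 misses would keep its poisoned CAS.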

However, tombstones could exist with a future/poisoned CAS, and we don't yet have a solution akin to "touch" for them; ideally they will be purged after the tombstone purge interval. Additionally, if step 3 didn't touch all documents (mishandling, an error, or a race with new writes), we again have documents somewhere in the seqno-index with a CAS exceeding max_cas.
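To make the leftover state concrete, the following sketch scans a toy seqno-ordered index for items whose CAS is ahead of max_cas. The `Item` tuple and `items_ahead_of_max_cas` function are hypothetical; no such tool is described in this issue.

```python
from typing import NamedTuple


class Item(NamedTuple):
    seqno: int
    key: str
    cas: int
    deleted: bool  # True for a tombstone


def items_ahead_of_max_cas(seqno_index, max_cas):
    """Return items whose CAS exceeds the vbucket's current max_cas."""
    return [item for item in seqno_index if item.cas > max_cas]


NOW = 1_700_000_000_000_000_000
FUTURE = NOW + 10**18

seqno_index = [
    Item(seqno=3, key="old-delete", cas=FUTURE, deleted=True),   # poisoned tombstone
    Item(seqno=4, key="missed", cas=FUTURE + 1, deleted=False),  # not touched in step 3
    Item(seqno=7, key="touched", cas=NOW, deleted=False),        # repaired by touch
]

leftovers = items_ahead_of_max_cas(seqno_index, max_cas=NOW)
```

Here `leftovers` contains the tombstone and the missed document, the two cases called out above.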

If some vbucket moves then occur, e.g. a new replica is created from the active (backfill from 0), any of those tombstones will be replicated. As soon as the replica processes a tombstone with the future CAS it will set max_cas to that value, undoing any previous fix.

The replica should always set max_cas equal to the CAS of the high_seqno item (i.e. for each incoming DCP mutation/deletion just assign max_cas=cas). The goal is that at the end of the transfer of data the replica max_cas equals the active max_cas and is not influenced by any tombstones or historical data with a poisoned CAS.
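The difference between the two replica behaviours can be sketched as follows. Both functions are hypothetical; the first models the problematic behaviour as "keep the highest CAS seen" (an assumption based on the description above), and the second models the proposed "just assign max_cas=cas".

```python
def replica_max_cas_keep_highest(stream):
    """Assumed problematic behaviour: max_cas only ever moves forward."""
    max_cas = 0
    for _seqno, cas in stream:
        max_cas = max(max_cas, cas)
    return max_cas


def replica_max_cas_copy_incoming(stream):
    """Proposed behaviour: max_cas is the CAS of the latest item received."""
    max_cas = 0
    for _seqno, cas in stream:
        max_cas = cas
    return max_cas


NOW = 1_700_000_000_000_000_000
FUTURE = NOW + 10**18

# Backfill from 0 in seqno order: a poisoned tombstone, then touched documents.
backfill = [
    (5, FUTURE),     # tombstone with a future CAS
    (10, NOW),       # document touched after the fix
    (11, NOW + 1),   # high_seqno item; the active max_cas is NOW + 1
]

poisoned_again = replica_max_cas_keep_highest(backfill)
matches_active = replica_max_cas_copy_incoming(backfill)
```

With the copy-incoming rule the replica ends the transfer with the CAS of the high_seqno item, so the old tombstone cannot re-poison max_cas in this model.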

People

Assignee: Jim Walker

Reporter: Jim Walker

Votes: 0

Watchers: 4

Dates

Created: 07/Jun/24 1:15 AM

Updated: 29/Jul/24 5:46 AM

Resolved: 29/Jul/24 5:46 AM

Gerrit Reviews

There are no open Gerrit changes

There is 1 closed Gerrit change

MB-62223: Replica must forceMaxCas: Gerrit Review:

Replica max_cas tracking should just copy incoming CAS
