Details
Improvement
Resolution: Unresolved
Major
6.6.2, 6.6.5, 7.0.4, 7.1.1
None
1
Description
Summary
We should investigate using the last complete in-memory snapshot for calculating the failover table branch point, instead of the last on-disk persisted snapshot.
Context
During failover, when a replica vBucket is promoted to active, KV-Engine creates a failover table entry to identify the new history. The branch point is created at the most recent consistent point which has been persisted to disk - see KVBucket::setVBucketState_UNLOCKED:
KVBucket::setVBucketState_UNLOCKED
    if (to == vbucket_state_active && oldstate != vbucket_state_active &&
        transfer == TransferVB::No) {
        // Changed state to active and this isn't a transfer (i.e.
        // takeover), which means this is a new fork in the vBucket history
        // - create a new failover table entry.
        const snapshot_range_t range = vb->getPersistedSnapshot();
        auto highSeqno = range.getEnd() == vb->getPersistenceSeqno()
                                 ? range.getEnd()
                                 : range.getStart();
        vb->createFailoverEntry(highSeqno);
Note how highSeqno is computed: if a complete snapshot has been persisted, we place the failover branch point at the end of that snapshot; if not, we place the branch point at the start of that snapshot - i.e. the previous consistent point.
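To make the rule concrete, here is a minimal standalone sketch of the same selection logic. SnapshotRange and persistedBranchPoint are hypothetical stand-ins introduced for illustration; they are not kv_engine types or functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the range returned by
// vb->getPersistedSnapshot(); not the real kv_engine type.
struct SnapshotRange {
    uint64_t start;
    uint64_t end;
};

// Branch point under the current (on-disk) rule. If persistence has
// reached the end of the snapshot, the snapshot is complete on disk and
// its end is a consistent point; otherwise fall back to the snapshot
// start, i.e. the previous consistent point.
inline uint64_t persistedBranchPoint(SnapshotRange persisted,
                                     uint64_t persistenceSeqno) {
    assert(persisted.start <= persisted.end);
    return persisted.end == persistenceSeqno ? persisted.end
                                             : persisted.start;
}
```

For example, with a persisted snapshot of [100, 200], a persistence seqno of 200 gives a branch point of 200, whereas a persistence seqno of 150 (snapshot only partially on disk) gives 100.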
Historically this made sense as a vBucket state change was persisted to disk asynchronously with respect to the sequence of mutations - i.e. we would persist the setVBState immediately, "in the middle" of the outstanding mutations of the current snapshot, and hence we could only consider the on-disk state when determining the failover branch point.
Proposal
Since MB-35331 (https://review.couchbase.org/c/kv_engine/+/113904), included in v6.5.0, the vBucket state change is recorded in a meta-item and enqueued (in-order) in the CheckpointManager. As such, if we happen to have a complete snapshot in memory (which is not yet persisted to disk), then we should be able to set the failover table branch point at the end of that complete in-memory snapshot.
This allows us to move the failover table branch point to a higher sequence number than we currently do, but still have a valid, consistent branch point, which in turn should reduce the amount of rollback a DCP consumer may need to perform when re-connecting to this newly-promoted active VB.
(See MB-53172 for an example scenario where this was significant.)
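A rough sketch of how the proposed rule could compare with the current one is below. Everything here is an assumption for discussion: VBucketView, its fields, and both functions are hypothetical, and falling back to the current on-disk rule when the in-memory snapshot is incomplete is one possible choice, not something already decided.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a snapshot range; not the kv_engine type.
struct SnapshotRange {
    uint64_t start;
    uint64_t end;
};

// Hypothetical summary of a replica vBucket at the moment of promotion.
struct VBucketView {
    SnapshotRange persistedSnapshot; // last snapshot range seen on disk
    uint64_t persistenceSeqno;       // highest seqno persisted to disk
    SnapshotRange memorySnapshot;    // last snapshot range held in memory
    uint64_t memoryHighSeqno;        // highest seqno received in memory
};

// Current rule: only the on-disk state is considered.
inline uint64_t currentBranchPoint(const VBucketView& vb) {
    assert(vb.persistedSnapshot.start <= vb.persistedSnapshot.end);
    return vb.persistedSnapshot.end == vb.persistenceSeqno
                   ? vb.persistedSnapshot.end
                   : vb.persistedSnapshot.start;
}

// Proposed rule (assumed shape): if the in-memory snapshot is complete,
// branch at its end; otherwise keep the current on-disk behaviour.
inline uint64_t proposedBranchPoint(const VBucketView& vb) {
    assert(vb.memorySnapshot.start <= vb.memorySnapshot.end);
    if (vb.memorySnapshot.end == vb.memoryHighSeqno) {
        return vb.memorySnapshot.end;
    }
    return currentBranchPoint(vb);
}
```

With snapshot [100, 200] fully received in memory but persisted only up to seqno 150, currentBranchPoint yields 100 while proposedBranchPoint yields 200, so a reconnecting DCP consumer at seqno 200 would not need to roll back.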