MB-37063: Replica may fail when receiving multiple consecutive Disk Checkpoints


Details

    • Triage: Triaged
    • Is this a Regression?: No
    • Sprint: KV-Engine Mad-Hatter GA, KV Sprint 2019-12

    Description

      The issue is in PassiveDurabilityMonitor::completeSyncWrite:

      296  void PassiveDurabilityMonitor::completeSyncWrite(
      297          const StoredDocKey& key,
      298          Resolution res,
      299          boost::optional<uint64_t> prepareSeqno) {
      300      auto s = state.wlock();
      301  
      302      // If we are receiving a disk snapshot, we need to relax a few checks
      303      // to account for deduplication. E.g., commits may appear to be out
      304      // of order
      305      bool enforceOrderedCompletion = !vb.isReceivingDiskSnapshot();
      ..
      321      // If we can complete out of order, we have to check from the start of
      322      // tracked writes as the HCS may have advanced past a prepare we have not
      323      // seen a completion for
      324      auto next = enforceOrderedCompletion
      325                          ? s->getIteratorNext(s->highCompletedSeqno.it)
      326                          : s->trackedWrites.begin();
      327  
      328      if (!enforceOrderedCompletion) {
      329          // Advance the iterator to the right item, it might not be the first
      330          while (next != s->trackedWrites.end() && next->getKey() != key) {
      331              next = s->getIteratorNext(next);
      332          }
      333      }
      ..  
      358      if (prepareSeqno && next->getBySeqno() != static_cast<int64_t>(*prepareSeqno)) {
      359          std::stringstream ss;
      360          ss << "Pending resolution for '" << *next
      361             << "', but received unexpected " + to_string(res) + " for key "
      362             << cb::tagUserData(key.to_string())
      363             << " different prepare seqno: " << *prepareSeqno;
      364          throwException<std::logic_error>(__func__, "" + ss.str());
      365      }
      ..
      397  
      398      // HCS may have moved, which could make some Prepare eligible for removal.
      399      s->checkForAndRemovePrepares();
      ..
      410  }
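
      To make the failure mode concrete, below is a minimal, self-contained
      sketch of the scan at lines 328-333. This is a simplified model for
      illustration only: TrackedWrite and the bare std::list are hypothetical
      stand-ins, not the real PDM types.

      #include <cstdint>
      #include <list>
      #include <string>

      // Simplified stand-in for an entry in PDM::State::trackedWrites
      // (hypothetical model, not the real SyncWrite type).
      struct TrackedWrite {
          std::string key;
          int64_t bySeqno;
          bool completed;
      };

      int main() {
          // State from the scenario below: {PRE:1(completed), PRE:3(in-flight)},
          // both prepares for the same key.
          std::list<TrackedWrite> trackedWrites{{"key", 1, true},
                                                {"key", 3, false}};

          const int64_t prepareSeqno = 3; // M:4 is the commit for PRE:3

          // Mirrors the loop at lines 328-333: advance while the key differs.
          // Nothing skips entries that are already completed, so for two
          // prepares on the same key the scan stops at the older, completed one.
          auto next = trackedWrites.begin();
          while (next != trackedWrites.end() && next->key != "key") {
              ++next;
          }

          // next points to PRE:1 (bySeqno == 1), so the check at lines 358-365
          // fails and completeSyncWrite throws.
          return (next->bySeqno != prepareSeqno) ? 1 : 0;
      }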
      

      Scenario example

      Replica receives the following for the same <key>:

      • PRE:1 and M:2 (logically CMT:2) in a Disk Snapshot(1, 2)
      • <The flusher has not persisted anything yet>
      • PRE:3 and M:4 (logically CMT:4) in a second Disk Snapshot(3, 4)

       

      Important Note: when we process M:2 we do not remove PRE:1 from PDM::State::trackedWrites at line 399.
      The reason is that we remove only locally-satisfied prepares, and PRE:1 is not locally-satisfied because the flusher has not yet persisted the entire Disk Snapshot(1, 2); the removal rule is sketched below.
      See comments in PassiveDurabilityMonitor::State::updateHighPreparedSeqno for details.
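
      A hedged sketch of that removal rule, reusing the simplified TrackedWrite
      model from the sketch above (the real logic lives in
      PassiveDurabilityMonitor::State::checkForAndRemovePrepares and is more
      involved): a completed prepare is removed only once it is also
      locally-satisfied, i.e. once its seqno is covered by the
      high-prepared-seqno, and for a Disk Snapshot the HPS can only advance
      once the flusher has persisted the whole snapshot.

      // Sketch only: removes the leading tracked writes that are both
      // completed and locally-satisfied (bySeqno <= highPreparedSeqno).
      void checkForAndRemovePrepares(std::list<TrackedWrite>& trackedWrites,
                                     int64_t highPreparedSeqno) {
          while (!trackedWrites.empty() &&
                 trackedWrites.front().completed &&
                 trackedWrites.front().bySeqno <= highPreparedSeqno) {
              trackedWrites.pop_front();
          }
      }

      // In the scenario: nothing is persisted yet, so highPreparedSeqno is
      // still 0 and PRE:1 (completed, bySeqno 1) is NOT removed.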

       

      Now focus on when we process M:4:

      • prepareSeqno = 3 (as M:4 is the commit for PRE:3)
      • PDM::State::trackedWrites contains {PRE:1(completed), PRE:3(in-flight)}
      • We execute the block at lines 328-333. After the block, next points to PRE:1(completed).                     <— This is the root cause of the issue
      • Given that next->getBySeqno() (1) != prepareSeqno (3), we enter the block at lines 358-365 and throw. A possible fix direction is sketched after this list.
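
      One possible direction for a fix, as a sketch only (reusing the
      simplified TrackedWrite model from the first sketch; the helper name
      findNextForCompletion is hypothetical, and this is not necessarily the
      change that was actually merged for this ticket): also skip
      already-completed entries while advancing the iterator, so that the scan
      lands on PRE:3 rather than stopping at PRE:1.

      // Variant of the scan at lines 328-333 that cannot stop at a
      // completed prepare such as PRE:1.
      std::list<TrackedWrite>::iterator findNextForCompletion(
              std::list<TrackedWrite>& trackedWrites, const std::string& key) {
          auto next = trackedWrites.begin();
          while (next != trackedWrites.end() &&
                 (next->completed || next->key != key)) {
              ++next;
          }
          return next; // PRE:3(in-flight) in the scenario above
      }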

       

      So in general,

      If:

      • More than one Disk Snapshot is received by a replica node, and
      • Each Disk Snapshot contains a completed SyncWrite (Committed or Aborted) for the same key, and
      • The flusher has not completed flushing the first Disk Snapshot before the Commit/Abort in the second Disk Snapshot is received,

      Then:

      • The replica will incorrectly reject the DCP_COMMIT/ABORT in the second snapshot
      • As a result, an exception is thrown

      That will cause:

      • The DCP connection to be closed, if the DCP_COMMIT/ABORT is processed in a front-end thread (the common case)
      • Or a memcached crash, if the DCP_COMMIT/ABORT is processed in a bg-thread (e.g., a buffered message processed in the DcpConsumerTask)

      In both cases, if a rebalance is in progress then it will fail.
      If the vBucket is in steady-state, then the connection should be re-established by ns_server (after the node is restarted, if it had crashed) and the nodes will retry.
      Once the flusher completes flushing the first Disk Snapshot, the problem should no longer occur.

    People

      Assignee: Paolo Cocchi (paolo.cocchi)
      Reporter: Paolo Cocchi (paolo.cocchi)
