Problem:
Since the recent change to notify the ADM of persistence when the vBucket is dead (MB-51606), we attempt to update the ADM to account for the newly persisted seqno. This can cause a crash if there are ActiveSyncWrite objects in trackedWrites which do not have any chain information - we throw std::invalid_argument here:
DurabilityMonitor::ActiveSyncWrite::ack(const std::string &) durability_monitor_impl.cc:133
ActiveDurabilityMonitor::State::advanceAndAckForPosition(DurabilityMonitor::Position<…> &, const std::string &, bool) active_durability_monitor.cc:772
ActiveDurabilityMonitor::State::advanceNodePosition(const std::string &) active_durability_monitor.cc:739
ActiveDurabilityMonitor::State::updateHighPreparedSeqno(ActiveDurabilityMonitor::ResolvedQueue &) active_durability_monitor.cc:1667
ActiveDurabilityMonitor::checkForCommit() active_durability_monitor.cc:1721
ActiveDurabilityMonitor::notifyLocalPersistence(folly::SharedMutexImpl<…>::ReadHolder &) active_durability_monitor.cc:430
VBucket::notifyPersistenceToDurabilityMonitor() vbucket.cc:4125
EPBucket::flushVBucket_UNLOCKED(LockedVBucketPtr) ep_bucket.cc:872
EPBucket::flushVBucket(Vbid) ep_bucket.cc:378
|
The ActiveSyncWrite does not have a valid chain because it was previously completed (via removeForCommitIfSatisfied->removeSyncWrite), and we reset an ActiveSyncWrite's chains to null when it is removed from trackedWrites.
(This is necessary because once an ActiveSyncWrite is moved from trackedWrites to the resolvedQueue, it can be accessed by a background NonIO thread, and hence it would not be safe to access ReplicationChain objects owned by ADM from another thread without taking a lock).
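For context, the shape of the failure is roughly as follows. This is a minimal, self-contained sketch rather than the real kv_engine code; the toy ReplicationChain type and the member/method layout here are illustrative only:

#include <stdexcept>
#include <string>

// Toy stand-ins for ReplicationChain / ActiveSyncWrite; names and members
// are illustrative, not the actual kv_engine declarations.
struct ReplicationChain {};

struct SyncWriteSketch {
    // Called via removeSyncWrite() when the write is moved from trackedWrites
    // to the resolvedQueue, so a later-unresolved write has null chains.
    void resetChains() {
        firstChain = nullptr;
        secondChain = nullptr;
    }

    // The flusher path (notifyLocalPersistence -> updateHighPreparedSeqno ->
    // advanceNodePosition -> ack) trips over this check once the chains have
    // been reset.
    void ack(const std::string& node) {
        if (!firstChain) {
            throw std::invalid_argument(
                    "ack: no chain found for node '" + node + "'");
        }
        // ... normal ack book-keeping against firstChain / secondChain ...
    }

    const ReplicationChain* firstChain = nullptr;
    const ReplicationChain* secondChain = nullptr;
};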
Possible Solutions:
One way or another we need to avoid calling ActiveSyncWrite::ack() when the syncWrite has a null chain. That boils down to two options:
1. Don't call ack() if there's no chain.
2. Ensure there is a chain when we attempt to call ack.
For (1), we could skip calling ack() if there's no chain on the ActiveSyncWrite. However, looking into this we see that ack() is called from advanceNodePosition(), which itself advances the active node's position in the ADM. It feels incorrect to be moving any iterators on the ADM when the vBucket is in the dead state, so we probably shouldn't be advancing iterators in this situation either. However, at that point in the code we don't really "know" we are in the dead state - we haven't yet examined the individual ActiveSyncWrites.
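For illustration, option (1) amounts to a guard of roughly this shape (a sketch only; hasChains() is a hypothetical accessor, and whether the surrounding iterator advance should also be skipped is exactly the open question above):

#include <string>

// Toy stand-in for DurabilityMonitor::ActiveSyncWrite.
struct SyncWriteStub {
    bool hasChains() const { return chainsSet; }  // hypothetical accessor
    void ack(const std::string& /*node*/) { /* normal ack accounting */ }
    bool chainsSet = false;
};

// Sketch of option (1): don't ack a SyncWrite whose chains have been reset
// (i.e. one that was previously completed and then unresolved back into
// trackedWrites).
void maybeAck(SyncWriteStub& sw, const std::string& node) {
    if (!sw.hasChains()) {
        return;
    }
    sw.ack(node);
}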
We could also reset the chains inside the ActiveDM itself (and not just on the individual ActiveSyncWrites) when we transition to dead - e.g. in prepareTransitionAwayFromActive():
@@ -1112,6 +1120,15 @@ ActiveDurabilityMonitor::prepareTransitionAwayFromActive() {
     // trackedWrites to deal with that
     unresolveCompletedSyncWriteQueue();
 
+    {
+        auto s = state.wlock();
+        for (auto& sw : s->trackedWrites) {
+            sw.resetChains();
+        }
+        s->firstChain.reset();
+        s->secondChain.reset();
+    }
+
Then we could, say, check for a null ADM::firstChain and handle it differently - not advancing any iterators etc. In fact, we already have some special-case code for a null firstChain, which is designed to handle the ADM -> PDM case:
if (!firstChain) {
    // An ActiveDM _may_ legitimately have no topology information, if
    // for example it has just been created from a PassiveDM during takeover
    // and ns_server has not yet updated the VBucket's topology.
    // In this case, it may be possible to update the HPS and we should do
    // so to ensure that any subsequent state change back to
    // replica/PassiveDM acks correctly if we never got a topology. We can
    // update the highPreparedSeqno for anything that the PDM completed
    // (we should have nothing in trackedWrites not completed as we have no
    // topology) by using the stored value instead of the iterator. Given
    // we only keep these completed SyncWrites in trackedWrites to correctly
    // set the HPS when we DO get a topology, we can remove them once we
    // have advanced past them.
    auto itr = trackedWrites.begin();
    while (itr != trackedWrites.end()) {
        if (!itr->isCompleted()) {
            return;
        }

        // Don't advance past anything not persisted.
        auto level = itr->getDurabilityReqs().getLevel();
        if ((level == cb::durability::Level::PersistToMajority ||
             level == cb::durability::Level::MajorityAndPersistOnMaster) &&
            static_cast<uint64_t>(itr->getBySeqno()) >
                    adm.vb.getPersistenceSeqno()) {
            return;
        }

        highPreparedSeqno = itr->getBySeqno();

        auto next = std::next(itr);
        trackedWrites.erase(itr);
        itr = next;
    }
    return;
}
However we cannot blindly reuse this code, as the items in trackedWrites may not be complete - consider the original scenario from the test for MB-51606 (PersistenceForDeadVBucketNotifiesDurabilityMonitor), where there is still an uncompleted SyncWrite in trackedWrites at the point the vBucket transitions to dead.
We would therefore have to modify the above code to handle this slightly different case when the vBucket is dead - for example by checking the vBucket's state - but this is starting to feel not very clean: we have an ADM whose state we are advancing when the vBucket is not active...
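To make that concrete, the decision inside the !firstChain loop would need to become something along these lines. This is a standalone sketch of just the rule, with hypothetical names, assuming the ADM can observe the vBucket's state; allowing a dead vBucket to advance past incomplete writes is one arbitrary choice, which illustrates why this starts to feel unclean:

#include <cstdint>

// Hypothetical, simplified inputs to the "can we advance the HPS past this
// SyncWrite?" decision from the !firstChain loop above.
enum class VbState { Active, Dead };

struct WriteInfo {
    bool completed;
    bool requiresPersistence;  // PersistToMajority / MajorityAndPersistOnMaster
    uint64_t seqno;
};

// Sketch of one possible rule: with no firstChain we currently only advance
// past completed writes, but a dead vBucket may legitimately still hold
// incomplete ones (the MB-51606 test scenario). Here we (arbitrarily) let a
// dead vBucket advance past incomplete writes - exactly the kind of
// special-case this option would force us to reason about.
bool canAdvancePast(const WriteInfo& w, VbState vbState, uint64_t persistedSeqno) {
    if (!w.completed && vbState != VbState::Dead) {
        return false;
    }
    // Don't advance past anything not yet persisted.
    if (w.requiresPersistence && w.seqno > persistedSeqno) {
        return false;
    }
    return true;
}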
Returning to option (2), we could instead reinstate the chains when we unresolve items from the resolvedQueue back into trackedWrites during the state transition to dead - see unresolveCompletedSyncWriteQueue(). That would make the restored ActiveSyncWrites similar to the ones which were already in trackedWrites (and incomplete):
@@ -613,6 +613,14 @@ void ActiveDurabilityMonitor::unresolveCompletedSyncWriteQueue() {
     // Second, whack them back into trackedWrites. The container should be in
     // seqno order so we will just put them at the front of trackedWrites.
     auto s = state.wlock();
+
+    // SyncWrites in resolvedQueue have had their chain information cleared
+    // (via resetChains in removeSyncWrite). Restore the current chain
+    // information so we can correctly process any acks (e.g. local ack from
+    // persistence) now they are back in trackedWrites.
+    for (auto& sw : writesToTrack) {
+        sw.resetTopology(*s->firstChain, s->secondChain.get());
+    }
This makes the ActiveSyncWrites consistent with respect to the ADM - all ActiveSyncWrites in trackedWrites have the chains from the ADM. This would allow us to "safely" ack ActiveSyncWrites. However, we are now changing even more state of the ADM when the vBucket is not active, including generating Commit / Abort messages (unless we special-case)...
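For completeness, the kind of special-casing hinted at above might look like the following. The names are made up rather than the real ADM interface; the idea is that a persistence notification on a non-active vBucket would be limited to HPS book-keeping, with Commit / Abort generation only happening while still active:

// Toy stand-ins; method names are illustrative, not the real ADM interface.
enum class VbState { Active, Dead };

struct AdmStub {
    void updateHighPreparedSeqno() { /* account for the persisted seqno */ }
    void processResolvedQueue() { /* would generate Commit / Abort */ }
};

// Sketch: restrict what a persistence notification is allowed to change on a
// vBucket that is no longer active.
void onLocalPersistence(AdmStub& adm, VbState state) {
    adm.updateHighPreparedSeqno();
    if (state == VbState::Active) {
        adm.processResolvedQueue();
    }
}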
Overall it's not clear to me what the best way to proceed is. Thinking back to the changes for the original MB-51606, in hindsight I'm not sure we should be allowing persistence notifications to be performed on the ADM - at least not in the current form, where all sorts of state is changed.
As such, I want to wait until Ben is available to discuss this further.
CC Paolo Cocchi.
Thanks Ashwin Govindarajulu.
In the 7.1.1-3066 repro, the affected vBucket is vb:320 - we see the same exception being thrown:
#6 0x00007f9eaadb0bf4 in __cxxabiv1::__cxa_throw (obj=obj@entry=0x7f9e74000940, tinfo=0x10866b0 <typeinfo for std::logic_error>, dest=0x445660 <_ZNSt11logic_errorD1Ev@plt>) at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/eh_throw.cc:95
#7 0x00000000004c89f4 in DurabilityMonitor::ActiveSyncWrite::ack (this=0x7f9e37890890, node=...) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/durability/durability_monitor_impl.cc:133
Just prior to the crash we see the vBucket change from active to dead, and DCP streams for vb:320 close:
2022-05-17T22:00:07.344232-07:00 INFO (bucket-0) VBucket::setState: transitioning vb:320 with high seqno:648 from:active to:active meta:{"topology":[["ns_1@172.23.105.155","ns_1@172.23.105.211"],["ns_1@172.23.105.211","ns_1@172.23.105.212"]]}
2022-05-17T22:00:07.413925-07:00 INFO 88: (bucket-0) DCP (Producer) eq_dcpq:replication:ns_1@172.23.105.155->ns_1@172.23.105.211:bucket-0 - (vb:320) ActiveStream::transitionState: Transitioning from takeover-wait to takeover-send
2022-05-17T22:00:07.414100-07:00 INFO (bucket-0) VBucket::setState: transitioning vb:320 with high seqno:649 from:active to:dead
2022-05-17T22:00:07.414175-07:00 INFO 88: (bucket-0) DCP (Producer) eq_dcpq:replication:ns_1@172.23.105.155->ns_1@172.23.105.211:bucket-0 - (vb:320) Vbucket marked as dead, last sent seqno: 649, high seqno: 649
2022-05-17T22:00:07.421490-07:00 INFO 88: (bucket-0) DCP (Producer) eq_dcpq:replication:ns_1@172.23.105.155->ns_1@172.23.105.211:bucket-0 - (vb:320) ActiveStream::transitionState: Transitioning from takeover-send to takeover-wait
2022-05-17T22:00:07.427911-07:00 INFO 88: (bucket-0) DCP (Producer) eq_dcpq:replication:ns_1@172.23.105.155->ns_1@172.23.105.211:bucket-0 - (vb:320) Receive ack for set vbucket state to active message
2022-05-17T22:00:07.427948-07:00 INFO 88: (bucket-0) DCP (Producer) eq_dcpq:replication:ns_1@172.23.105.155->ns_1@172.23.105.211:bucket-0 - (vb:320) Stream closing, sent until seqno 649 remaining items 0, reason: The stream ended due to all items being streamed
2022-05-17T22:00:07.447209-07:00 CRITICAL *** Fatal error encountered during exception handling ***