MB-52169

Memcached crashed during rebalance_in + compaction in parallel


Details

    • Untriaged
    • Centos 64-bit
    • 1
    • Yes

    Description

      Build: 7.1.1-3056-enterprise

      Scenario:

      • 2 node KV cluster
      • Couchbase bucket with 20K items as initial load
      • Add a new node (.251) and start rebalance_in with compaction running in parallel

      Observation:

      Rebalance failed due to a memcached crash on the new node (.251)

      memcached's backtrace -> memcached_bt.log

      TAF test:

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/testexec.9407.ini durability=MAJORITY,threads_to_use=10,sdk_timeout=60,GROUP=P1,upgrade_version=7.1.1-3056 -t bucket_param_tests.autocompaction.AutoCompactionTests.rebalance_in_with_auto_DB_compaction,key=test_collections,num_items=20000,autocompaction_value=70,nodes_init=2,nodes_in=1,replicas=1,GROUP=P1'
      

      Attachments

        Issue Links


          Activity

            drigby Dave Rigby added a comment -

            Thanks Ashwin Govindarajulu.

            In the 7.1.1-3066 repro, the affected vbucket is vb:320 - we see the same exception being thrown:

             #6  0x00007f9eaadb0bf4 in __cxxabiv1::__cxa_throw (obj=obj@entry=0x7f9e74000940, tinfo=0x10866b0 <typeinfo for std::logic_error>, dest=0x445660 <_ZNSt11logic_errorD1Ev@plt>) at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/eh_throw.cc:95
             #7  0x00000000004c89f4 in DurabilityMonitor::ActiveSyncWrite::ack (this=0x7f9e37890890, node=...) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/durability/durability_monitor_impl.cc:133
            

             131 void DurabilityMonitor::ActiveSyncWrite::ack(const std::string& node) {
             132     if (!firstChain) {
             133         throw std::logic_error(
             134                 "SyncWrite::ack: Acking without a ReplicationChain");
             135     }
            

            Just prior to the crash we see the vBucket change from active to dead, and DCP streams for vb:320 close:

            2022-05-17T22:00:07.344232-07:00 INFO (bucket-0) VBucket::setState: transitioning vb:320 with high seqno:648 from:active to:active meta:{"topology":[["ns_1@172.23.105.155","ns_1@172.23.105.211"],["ns_1@172.23.105.211","ns_1@172.23.105.212"]]}
            2022-05-17T22:00:07.413925-07:00 INFO 88: (bucket-0) DCP (Producer) eq_dcpq:replication:ns_1@172.23.105.155->ns_1@172.23.105.211:bucket-0 - (vb:320) ActiveStream::transitionState: Transitioning from takeover-wait to takeover-send
            2022-05-17T22:00:07.414100-07:00 INFO (bucket-0) VBucket::setState: transitioning vb:320 with high seqno:649 from:active to:dead
            2022-05-17T22:00:07.414175-07:00 INFO 88: (bucket-0) DCP (Producer) eq_dcpq:replication:ns_1@172.23.105.155->ns_1@172.23.105.211:bucket-0 - (vb:320) Vbucket marked as dead, last sent seqno: 649, high seqno: 649
            2022-05-17T22:00:07.421490-07:00 INFO 88: (bucket-0) DCP (Producer) eq_dcpq:replication:ns_1@172.23.105.155->ns_1@172.23.105.211:bucket-0 - (vb:320) ActiveStream::transitionState: Transitioning from takeover-send to takeover-wait
            2022-05-17T22:00:07.427911-07:00 INFO 88: (bucket-0) DCP (Producer) eq_dcpq:replication:ns_1@172.23.105.155->ns_1@172.23.105.211:bucket-0 - (vb:320) Receive ack for set vbucket state to active message
            2022-05-17T22:00:07.427948-07:00 INFO 88: (bucket-0) DCP (Producer) eq_dcpq:replication:ns_1@172.23.105.155->ns_1@172.23.105.211:bucket-0 - (vb:320) Stream closing, sent until seqno 649 remaining items 0, reason: The stream ended due to all items being streamed
            2022-05-17T22:00:07.447209-07:00 CRITICAL *** Fatal error encountered during exception handling ***
            

            drigby Dave Rigby added a comment -

            Reproduced the issue locally in a unit test - appears to be a case not covered in the patches for MB-51606.

            drigby Dave Rigby added a comment - edited

            Problem:

            Since the recent changes to notify the ADM of persistence when the vBucket is dead (MB-51606), we attempt to update the ADM to account for the new persisted seqno. This can cause a crash if there are ActiveSyncWrite objects in trackedWrites which do not have any chain information - we throw std::logic_error here:

            DurabilityMonitor::ActiveSyncWrite::ack(const std::string &) durability_monitor_impl.cc:133
            ActiveDurabilityMonitor::State::advanceAndAckForPosition(DurabilityMonitor::Position<…> &, const std::string &, bool) active_durability_monitor.cc:772
            ActiveDurabilityMonitor::State::advanceNodePosition(const std::string &) active_durability_monitor.cc:739
            ActiveDurabilityMonitor::State::updateHighPreparedSeqno(ActiveDurabilityMonitor::ResolvedQueue &) active_durability_monitor.cc:1667
            ActiveDurabilityMonitor::checkForCommit() active_durability_monitor.cc:1721
            ActiveDurabilityMonitor::notifyLocalPersistence(folly::SharedMutexImpl<…>::ReadHolder &) active_durability_monitor.cc:430
            VBucket::notifyPersistenceToDurabilityMonitor() vbucket.cc:4125
            EPBucket::flushVBucket_UNLOCKED(LockedVBucketPtr) ep_bucket.cc:872
            EPBucket::flushVBucket(Vbid) ep_bucket.cc:378
            

            The ActiveSyncWrite does not have a valid chain because it was previously completed (via removeForCommitIfSatisfied->removeSyncWrite), and we reset an ActiveSyncWrite's chains to null when it is removed from trackedWrites.

            (This is necessary because once an ActiveSyncWrite is moved from trackedWrites to the resolvedQueue, it can be accessed by a background NonIO thread, and hence it would not be safe to access ReplicationChain objects owned by ADM from another thread without taking a lock).
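
             To make the failure mode concrete, here is a small standalone sketch of the pattern described above. The ToyChain / ToySyncWrite types and the main() driver are simplified stand-ins invented purely for illustration - they are not the real kv_engine classes - and only the throw mirrors the actual check at durability_monitor_impl.cc:133:

             #include <iostream>
             #include <memory>
             #include <stdexcept>
             #include <string>

             // Simplified stand-in for a ReplicationChain owned by the ADM.
             struct ToyChain {
                 void ack(const std::string& node) {
                     std::cout << "ack recorded for " << node << "\n";
                 }
             };

             // Simplified stand-in for DurabilityMonitor::ActiveSyncWrite.
             struct ToySyncWrite {
                 std::shared_ptr<ToyChain> firstChain = std::make_shared<ToyChain>();

                 // Mirrors the check at durability_monitor_impl.cc:133.
                 void ack(const std::string& node) {
                     if (!firstChain) {
                         throw std::logic_error(
                                 "SyncWrite::ack: Acking without a ReplicationChain");
                     }
                     firstChain->ack(node);
                 }

                 // Analogous to resetChains(): called when the write moves from
                 // trackedWrites to the resolvedQueue, so a background NonIO thread
                 // never dereferences ADM-owned chain objects.
                 void resetChains() { firstChain.reset(); }
             };

             int main() {
                 ToySyncWrite sw;
                 sw.ack("replica"); // OK: chain present.

                 // The write is completed and moved to the resolvedQueue...
                 sw.resetChains();

                 // ...the vBucket then goes dead, the write is "unresolved" back into
                 // trackedWrites, and a persistence notification tries to ack it again:
                 try {
                     sw.ack("active");
                 } catch (const std::logic_error& e) {
                     // In memcached the exception is not caught and the process aborts.
                     std::cout << "fatal in the real server: " << e.what() << "\n";
                 }
                 return 0;
             }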

            Possible Solutions:

            One way or another we need to avoid calling ActiveSyncWrite::ack() when the syncWrite has a null chain. That boils down to two options:

            1. Don't call ack() if there's no chain.
            2. Ensure there is a chain when we attempt to call ack.

            For (1), we could skip calling ack() if there's no chain on the ActiveSyncWrite. However, looking into this we see that ack() is called from advanceNodePosition(), which itself advances the active node position in the ADM. It feels incorrect to be moving any iterators on the ADM when it is in the dead state, so we probably shouldn't be advancing iterators in this situation either. However, at that point in the code we don't really "know" we are in the dead state - we haven't yet examined the individual ActiveSyncWrites.
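
             As a rough, hypothetical sketch of what option (1) could look like at the call site (reusing the same toy stand-ins as above; hasChain() and advanceAndAck() are invented for illustration and do not exist in kv_engine):

             #include <list>
             #include <memory>
             #include <string>

             struct ToyChain {};

             struct ToySyncWrite {
                 std::shared_ptr<ToyChain> firstChain;
                 bool hasChain() const { return firstChain != nullptr; } // hypothetical accessor
                 void ack(const std::string& /*node*/) { /* record ack against the chain */ }
             };

             // Option (1), roughly: skip the ack for chain-less (previously completed)
             // writes instead of letting ack() throw. Note this still advances past the
             // write, which is exactly the behaviour questioned above for a dead vBucket.
             void advanceAndAck(std::list<ToySyncWrite>& trackedWrites,
                                const std::string& node) {
                 for (auto& sw : trackedWrites) {
                     if (sw.hasChain()) {
                         sw.ack(node);
                     }
                 }
             }

             int main() {
                 std::list<ToySyncWrite> writes(2);
                 writes.front().firstChain = std::make_shared<ToyChain>();
                 advanceAndAck(writes, "active"); // second write is skipped, no throw.
                 return 0;
             }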

            We could also reset the chains inside the ActiveDM (and not just on the individual ActiveSyncWrites) when we transition to dead - e.g. in prepareTransitionAwayFromActive():

            @@ -1112,6 +1120,15 @@ ActiveDurabilityMonitor::prepareTransitionAwayFromActive() {
                 // trackedWrites to deal with that
                 unresolveCompletedSyncWriteQueue();
             
            +    {
            +        auto s = state.wlock();
            +        for (auto& sw : s->trackedWrites) {
            +            sw.resetChains();
            +        }
            +        s->firstChain.reset();
            +        s->secondChain.reset();
            +    }
            +
            

            Then we could, say, check for a null ADM::firstChain and handle it differently - not advancing any iterators, etc. In fact we already have some special-case code for a null firstChain, which is designed to handle the ADM -> PDM case:

                if (!firstChain) {
                    // An ActiveDM _may_ legitimately have no topology information, if
                    // for example it has just been created from a PassiveDM during takeover
                    // and ns_server has not yet updated the VBucket's topology.
                    // In this case, it may be possible to update the HPS and we should do
                    // so to ensure that any subsequent state change back to
                    // replica/PassiveDM acks correctly if we never got a topology. We can
                    // update the highPreparedSeqno for anything that the PDM completed
                    // (we should have nothing in trackedWrites not completed as we have no
                    // topology) by using the store value instead of the iterator. Given
                    // we only keep these completed SyncWrites in trackedWrites to correctly
                    // set the HPS when we DO get a topology, we can remove them once we
                    // have advanced past them.
                    auto itr = trackedWrites.begin();
                    while (itr != trackedWrites.end()) {
                        if (!itr->isCompleted()) {
                            return;
                        }
             
                        // Don't advance past anything not persisted.
                        auto level = itr->getDurabilityReqs().getLevel();
                        if ((level == cb::durability::Level::PersistToMajority ||
                             level == cb::durability::Level::MajorityAndPersistOnMaster) &&
                            static_cast<uint64_t>(itr->getBySeqno()) <
                                    adm.vb.getPersistenceSeqno()) {
                            return;
                        }
             
                        highPreparedSeqno = itr->getBySeqno();
             
                        auto next = std::next(itr);
                        trackedWrites.erase(itr);
                        itr = next;
                    }
                    return;
                }
            

            However, we cannot blindly use this code, as the items in trackedWrites may not be complete - consider the original scenario from the test for MB-51606 (PersistenceForDeadVBucketNotifiesDurabilityMonitor), where there is still an uncompleted SyncWrite in trackedWrites at the point the vBucket transitions to dead.

            We would therefore have to modify the above code to handle this slightly different case when the vBucket is dead - for example by checking what the vBucket's state is - but this is starting to feel not very clean: we would have an ADM whose state we are advancing when the vBucket is not active...

            Returning to option (2), we could instead reinstate the chain when we unresolve items from the resolvedQueue back into trackedWrites during the state transition to dead - see unresolveCompletedSyncWriteQueue(). That would make the restored ActiveSyncWrites similar to the ones which were already in trackedWrites (and incomplete):

            @@ -613,6 +613,14 @@ void ActiveDurabilityMonitor::unresolveCompletedSyncWriteQueue() {
                 // Second, whack them back into trackedWrites. The container should be in
                 // seqno order so we will just put them at the front of trackedWrites.
                 auto s = state.wlock();
            +
            +    // SyncWrites in resolvedQueue have had their chain information cleared
            +    // (via resetChains in removeSyncWrite). Restore the current chain
            +    // information so we can correctly process any acks (e.g. local ack from
            +    // persistence) now they are back in trackedWrites.
            +    for (auto& sw : writesToTrack) {
            +        sw.resetTopology(*s->firstChain, s->secondChain.get());
            +    }
            

            This makes the ActiveSyncWrites consistent with respect to the ADM - all ActiveSyncWrites in trackedWrites have the chains from the ADM. This would allow us to "safely" ack ActiveSyncWrites. However, we are now changing even more state of the ADM when the vBucket is not active, including generating Commit / Abort messages (unless we special-case)...

            Overall it's not clear to me what the best way to proceed is. Thinking back to the changes for the original MB-51606, in hindsight I'm not sure we should be allowing persistence notifications to be performed on the ADM - at least not in the current form, where all sorts of state are changed.

            As such, I want to wait until Ben is available to discuss this further.

            CC Paolo Cocchi.

            drigby Dave Rigby added a comment -

            As per previous comments, the fix for MB-51606 which triggered this issue has been reverted. This issue should now be resolved.


            ashwin.govindarajulu Ashwin Govindarajulu added a comment -

            Closing based on the weekly run on build 7.1.1-3135.

            People

              ashwin.govindarajulu Ashwin Govindarajulu
              ashwin.govindarajulu Ashwin Govindarajulu
              Votes: 0
              Watchers: 6


                Gerrit Reviews

                  There are no open Gerrit changes
