  Couchbase Server / MB-35060

[Durability]: Rollback of SyncWrite-related operations can result in invalid state on replicas [2019/7/19]


    Details

    • Triage:
      Untriaged
    • Operating System:
      CentOS 64-bit
    • Is this a Regression?:
      No
    • Sprint:
      KV-Engine Mad-Hatter Beta

      Description

       

      Build: 6.5.0-3748

      Scenario:

      1. 4-node cluster, Couchbase bucket (replicas=3)
      2. Initially load 10 docs with durability=MAJORITY (succeeds)
      3. Start loading 10K docs in the background
      4. Kill memcached on node-4 (10.112.191.104) while SyncWrite creates are happening in parallel

      Observation:

      Memcached on node-4 recovers, but all subsequent doc ops still fail with a `com.couchbase.client.core.error.RequestTimeoutException`.

      Once step 3 completes with errors, the failed docs are retried one by one using a separate SDK client with sdk_timeout=5 seconds (same as step 2).

      Each individual operation raises `com.couchbase.client.core.error.DurabilityAmbiguousException`, but the doc creation succeeds. After this, memcached on the replica nodes (node-2, node-3) crashes at the same time with exit status 139, with the backtrace given below. node-1 crashes 16 s later (05:19:43.384).

      Expected behavior:

      As soon as node-4 restarts, all sync_writes should go through without any timeouts or ambiguous exceptions. No memcached crash is expected when writing to replica nodes.

      2019-07-13T05:19:27.211-07:00, ns_log:0:info:message(ns_1@10.112.191.103) - Service 'memcached' exited with status 139. Restarting. Messages:
      2019-07-13T05:19:27.183050-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x8f300]
      2019-07-13T05:19:27.183054-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x8f3ab]
      2019-07-13T05:19:27.183059-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x9bf4c]
      2019-07-13T05:19:27.183065-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x30557]
      2019-07-13T05:19:27.183068-07:00 CRITICAL     /opt/couchbase/bin/../lib/libevent_core.so.2.1.8() [0x7fabf4202000+0x17107]
      2019-07-13T05:19:27.183071-07:00 CRITICAL     /opt/couchbase/bin/../lib/libevent_core.so.2.1.8(event_base_loop+0x39f) [0x7fabf4202000+0x1767f]
      2019-07-13T05:19:27.183077-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x701a9]
      2019-07-13T05:19:27.183079-07:00 CRITICAL     /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7fabf4f23000+0x8f27]
      2019-07-13T05:19:27.183083-07:00 CRITICAL     /lib64/libpthread.so.0() [0x7fabf2953000+0x7df3]
      2019-07-13T05:19:27.183107-07:00 CRITICAL     /lib64/libc.so.6(clone+0x6d) [0x7fabf2592000+0xf61ad]
      

      Bucket state after the test execution:

       

      TAF Testcase:

       

      crash_test.crash_process.CrashTest:
        test_crash_process,nodes_init=4,replicas=3,num_items=10,process=memcached,service=memcached,sig_type=sigkill,target_node=active,durability=MAJORITY

       

        Attachments

          Issue Links


            Activity

            owend Daniel Owen added a comment -

            Hey Dave Rigby, would you mind taking a look at this one?

            Thanks

            drigby Dave Rigby added a comment - edited

            Backtrace from the log on node-3 below. Note there are no other errors in the logfile before the crash.

            2019-07-13T05:19:27.182627-07:00 CRITICAL Breakpad caught a crash (Couchbase version 6.5.0-3748). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/76d569f2-fecd-c4b5-51cf5cd1-37abcffa.dmp before terminating.
            2019-07-13T05:19:27.182778-07:00 CRITICAL Stack backtrace of crashed thread:
            2019-07-13T05:19:27.182939-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x1339c0]
            2019-07-13T05:19:27.182950-07:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ce) [0x400000+0x14aa9e]
            2019-07-13T05:19:27.182957-07:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0x94) [0x400000+0x14adb4]
            2019-07-13T05:19:27.182962-07:00 CRITICAL     /lib64/libpthread.so.0() [0x7fabf2953000+0xf130]
            2019-07-13T05:19:27.182971-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fabedd75000+0xdd350]
            2019-07-13T05:19:27.182977-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fabedd75000+0xe03d4]
            2019-07-13T05:19:27.182982-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fabedd75000+0xe1417]
            2019-07-13T05:19:27.182987-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fabedd75000+0xb2093]
            2019-07-13T05:19:27.182992-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fabedd75000+0xb44b6]
            2019-07-13T05:19:27.182996-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fabedd75000+0xb4618]
            2019-07-13T05:19:27.182999-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fabedd75000+0xb7323]
            2019-07-13T05:19:27.183002-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fabedd75000+0x98a3f]
            2019-07-13T05:19:27.183005-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fabedd75000+0x98c7a]
            2019-07-13T05:19:27.183008-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fabedd75000+0x9c454]
            2019-07-13T05:19:27.183012-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fabedd75000+0x10c681]
            2019-07-13T05:19:27.183019-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x5f433]
            2019-07-13T05:19:27.183026-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0xba895]
            2019-07-13T05:19:27.183033-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0xaebcf]
            2019-07-13T05:19:27.183039-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x9d06c]
            2019-07-13T05:19:27.183044-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0xceaa8]
            2019-07-13T05:19:27.183050-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x8f300]
            2019-07-13T05:19:27.183054-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x8f3ab]
            2019-07-13T05:19:27.183059-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x9bf4c]
            2019-07-13T05:19:27.183065-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x30557]
            2019-07-13T05:19:27.183068-07:00 CRITICAL     /opt/couchbase/bin/../lib/libevent_core.so.2.1.8() [0x7fabf4202000+0x17107]
            2019-07-13T05:19:27.183071-07:00 CRITICAL     /opt/couchbase/bin/../lib/libevent_core.so.2.1.8(event_base_loop+0x39f) [0x7fabf4202000+0x1767f]
            2019-07-13T05:19:27.183077-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x701a9]
            2019-07-13T05:19:27.183079-07:00 CRITICAL     /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7fabf4f23000+0x8f27]
            2019-07-13T05:19:27.183083-07:00 CRITICAL     /lib64/libpthread.so.0() [0x7fabf2953000+0x7df3]
            2019-07-13T05:19:27.183107-07:00 CRITICAL     /lib64/libc.so.6(clone+0x6d) [0x7fabf2592000+0xf61ad]
            2019-07-13T05:19:27.183437-07:00 INFO ---------- Closing logfile
            

            drigby Dave Rigby added a comment - edited

            Full backtrace from minidump:

            (gdb) bt full
            #0  load (__m=std::memory_order_seq_cst, this=<optimized out>) at /usr/local/include/c++/7.3.0/bits/atomic_base.h:396
            No locals.
            #1  getBySeqno (this=<optimized out>) at kv_engine/engines/ep/src/item.h:177
            No locals.
            #2  DurabilityMonitor::SyncWrite::getBySeqno (this=this@entry=0x10) at kv_engine/engines/ep/src/durability/durability_monitor_impl.cc:64
            No locals.
            #3  0x00007fabede553d4 in operator()<std::_List_iterator<DurabilityMonitor::SyncWrite> > (prepareItr=..., snapshotEndSeqno=13, __closure=<synthetic pointer>) at kv_engine/engines/ep/src/durability/passive_durability_monitor.cc:355
            No locals.
            #4  PassiveDurabilityMonitor::State::updateHighPreparedSeqno (this=0x7fabe6098d80) at kv_engine/engines/ep/src/durability/passive_durability_monitor.cc:383
                    next = {_M_node = 0x0}
                    snapshotEndSeqno = 13
                    maxLevelCanAdvanceOver = cb::durability::MajorityAndPersistOnMaster
                    prevHPS = {<ThrowExceptionPolicy<long>> = {<No data fields>}, val = 9, label = {static npos = 18446744073709551615, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x7fabeb63f9b8 "unlabelled"}, _M_string_length = 10, {
                          _M_local_buf = "unlabelled\000\356\253\177\000", _M_allocated_capacity = 7812730926796664437}}}
            #5  0x00007fabede56417 in PassiveDurabilityMonitor::notifySnapshotEndReceived (this=0x7fabe5fc2ac0, snapEnd=<optimized out>, snapEnd@entry=13) at kv_engine/engines/ep/src/durability/passive_durability_monitor.cc:138
                    s = {
                      holder_ = {<folly::LockedPtrBase<folly::Synchronized<std::unique_ptr<PassiveDurabilityMonitor::State, std::default_delete<PassiveDurabilityMonitor::State> >, folly::SharedMutexImpl<false, void, std::atomic, false, false> >, folly::SharedMutexImpl<false, void, std::atomic, false, false>, folly::LockPolicyExclusive>> = {parent_ = 0x7fabe5fc2ad0}, static AllowsConcurrentAccess = false}}
                    prevHps = <optimized out>
                    hps = 0
            #6  0x00007fabedeffe65 in VBucket::notifyPassiveDMOfSnapEndReceived (this=<optimized out>, snapEnd=snapEnd@entry=13) at kv_engine/engines/ep/src/vbucket.cc:973
            No locals.
            #7  0x00007fabede27093 in PassiveStream::handleSnapshotEnd (this=this@entry=0x7fabb9f4ea90, vb=..., byseqno=13) at kv_engine/engines/ep/src/dcp/passive_stream.cc:959
            No locals.
            #8  0x00007fabede294b6 in PassiveStream::processMessage (this=this@entry=0x7fabb9f4ea90, message=message@entry=0x7fabbd0d4cb0, messageType=messageType@entry=PassiveStream::Prepare) at kv_engine/engines/ep/src/dcp/passive_stream.cc:647
                    taskToString = {0x7fabedff2f8b "mutation", 0x7fabedff2f94 "deletion", 0x7fabedff2f9d "expiration", 0x7fabedff2fa8 "prepare"}
                    vb = {<std::__shared_ptr<VBucket, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<VBucket, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x7fabe5f51200, _M_refcount = {_M_pi = 0x7fabe5fc2ae0}}, <No data fields>}
                    consumer = {<std::__shared_ptr<DcpConsumer, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<DcpConsumer, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x7fabe5f97810, _M_refcount = {_M_pi = 0x7fabe5f97800}}, <No data fields>}
                    ret = <optimized out>
                    deleteSource = <optimized out>
                    switchComplete = <optimized out>
            #9  0x00007fabede29618 in PassiveStream::processPrepare (this=this@entry=0x7fabb9f4ea90, prepare=0x7fabbd0d4cb0) at kv_engine/engines/ep/src/dcp/passive_stream.cc:670
                    result = <optimized out>
            #10 0x00007fabede2c323 in PassiveStream::messageReceived (this=0x7fabb9f4ea90, dcpResponse=...) at kv_engine/engines/ep/src/dcp/passive_stream.cc:296
                    ret = ENGINE_SUCCESS
                    seqno = <optimized out>
            #11 0x00007fabede0da3f in DcpConsumer::lookupStreamAndDispatchMessage (this=this@entry=0x7fabe5f97810, ufc=..., vbucket=vbucket@entry=..., opaque=opaque@entry=216, msg=...) at kv_engine/engines/ep/src/dcp/consumer.cc:1669
                    stream = {<std::__shared_ptr<PassiveStream, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<PassiveStream, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x7fabb9f4ea90, _M_refcount = {_M_pi = 0x7fabb9f4ea80}}, <No data fields>}
                    err = <optimized out>
            #12 0x00007fabede0dc7a in DcpConsumer::processMutationOrPrepare (this=this@entry=0x7fabe5f97810, vbucket=..., opaque=opaque@entry=216, key=..., item=..., meta=..., msgBytes=<optimized out>) at kv_engine/engines/ep/src/dcp/consumer.cc:440
                    ufc = {consumer = @0x7fabe5f97810, bytes = 127}
                    emd = {_M_t = {
                        _M_t = {<std::_Tuple_impl<0, ExtendedMetaData*, std::default_delete<ExtendedMetaData> >> = {<std::_Tuple_impl<1, std::default_delete<ExtendedMetaData> >> = {<std::_Head_base<1, std::default_delete<ExtendedMetaData>, true>> = {<std::default_delete<ExtendedMetaData>> = {<No data fields>}, <No data fields>}, <No data fields>}, <std::_Head_base<0, ExtendedMetaData*, false>> = {_M_head_impl = 0x0}, <No data fields>}, <No data fields>}}}
            #13 0x00007fabede11454 in DcpConsumer::prepare (this=0x7fabe5f97810, opaque=216, key=..., value=..., priv_bytes=<optimized out>, datatype=3 '\003', cas=1563020366701395968, vbucket=..., flags=2, by_seqno=13, rev_seqno=1, expiration=0, lock_time=0, nru=0 '\000', document_state=Alive, durability=...)
                at kv_engine/engines/ep/src/dcp/consumer.cc:1647
                    item = {value = 0x7fabbdac8610}
                    msgBytes = <optimized out>
            #14 0x00007fabede81681 in EventuallyPersistentEngine::prepare (this=0x7fabe5af1400, cookie=..., opaque=216, key=..., value=..., priv_bytes=0, datatype=3 '\003', cas=1563020366701395968, vbucket=..., flags=2, by_seqno=13, rev_seqno=1, expiration=0, lock_time=0, nru=0 '\000', document_state=Alive, durability=...)
                at kv_engine/engines/ep/src/ep_engine.cc:1608
                    engine = {_M_t = {
                        _M_t = {<std::_Tuple_impl<0, EventuallyPersistentEngine*, EPHandleReleaser>> = {<std::_Tuple_impl<1, EPHandleReleaser>> = {<std::_Head_base<1, EPHandleReleaser, true>> = {<EPHandleReleaser> = {<No data fields>}, <No data fields>}, <No data fields>}, <std::_Head_base<0, EventuallyPersistentEngine*, false>> = {
                              _M_head_impl = 0x7fabe5af1400}, <No data fields>}, <No data fields>}}}
                    conn = <optimized out>
            #15 0x000000000045f433 in dcpPrepare (cookie=..., opaque=216, key=..., value=..., priv_bytes=priv_bytes@entry=0, datatype=3 '\003', cas=1563020366701395968, vbucket=..., flags=2, by_seqno=13, rev_seqno=1, expiration=0, lock_time=0, nru=0 '\000', document_state=Alive, durability=...)
                at kv_engine/daemon/protocol/mcbp/engine_wrapper.cc:733
                    dcp = <optimized out>
                    ret = <optimized out>
            #16 0x00000000004ba895 in dcp_prepare_executor (cookie=...) at kv_engine/daemon/protocol/mcbp/dcp_prepare_executor.cc:51
                    extdata = <optimized out>
                    value = {static npos = <optimized out>, buf = 0x7fabe632534c <Address 0x7fabe632534c out of bounds>, len = 55}
                    req = @0x7fabe6325304: <error reading variable>
                    priv_bytes = <optimized out>
                    ret = <optimized out>
            #17 0x00000000004aebcf in operator() (__args#0=..., this=<optimized out>) at /usr/local/include/c++/7.3.0/bits/std_function.h:706
            No locals.
            #18 execute_client_request_packet (cookie=..., request=...) at kv_engine/daemon/mcbp_executors.cc:850
                    privilegeChains = {commandChains = {_M_elems = {{chain = {<std::_Vector_base<Function<cb::rbac::PrivilegeAccess, (cb::rbac::PrivilegeAccess)0, Cookie&>, std::allocator<Function<cb::rbac::PrivilegeAccess, (cb::rbac::PrivilegeAccess)0, Cookie&> > >> = {
                                _M_impl = {<std::allocator<Function<cb::rbac::PrivilegeAccess, (cb::rbac::PrivilegeAccess)0, Cookie&> >> = {<__gnu_cxx::new_allocator<Function<cb::rbac::PrivilegeAccess, (cb::rbac::PrivilegeAccess)0, Cookie&> >> = {<No data fields>}, <No data fields>}, _M_start = 0x0, _M_finish = 0x0, 
                                  _M_end_of_storage = 0x0}}, <No data fields>}} <repeats 256 times>}}}
                    res = cb::rbac::Ok
            #19 0x000000000049d06c in Cookie::execute (this=this@entry=0x7fabe65d5280) at kv_engine/daemon/cookie.cc:124
                    header = @0x7fabe6325304: <error reading variable>
            #20 0x00000000004ceaa8 in StateMachine::conn_execute (this=0x7fabbcfd1178) at kv_engine/daemon/statemachine.cc:410
                    cookie = @0x7fabe65d5280: <error reading variable>
            #21 0x000000000048f300 in Connection::runStateMachinery (this=0x7fabbcfd0f80) at kv_engine/daemon/connection.cc:1340
            No locals.
            #22 0x000000000048f3ab in Connection::runEventLoop (this=this@entry=0x7fabbcfd0f80, which=which@entry=2) at kv_engine/daemon/connection.cc:1414
            No locals.
            #23 0x000000000049bf4c in run_event_loop (c=c@entry=0x7fabbcfd0f80, which=which@entry=2) at kv_engine/daemon/connections.cc:148
                    start = <optimized out>
                    stop = <optimized out>
                    ns = <optimized out>
            #24 0x0000000000430557 in event_handler (fd=<optimized out>, which=<optimized out>, arg=0x7fabbcfd0f80) at kv_engine/daemon/memcached.cc:848
                    c = 0x7fabbcfd0f80
                    phosphor_internal_category_enabled_834 = {_M_b = {_M_p = 0x0}}
                    phosphor_internal_category_enabled_temp_834 = <optimized out>
                    phosphor_internal_tpi_wait_834 = {category = 0x57ddad "mutex", name = 0x57dc00 "event_handler::threadLock.wait", type = phosphor::Complete, argument_names = {_M_elems = {0x57cde3 "this", 0x5970d2 ""}}, argument_types = {_M_elems = {phosphor::is_pointer, phosphor::is_none}}}
                    phosphor_internal_tpi_held_834 = {category = 0x57ddad "mutex", name = 0x57dbe0 "event_handler::threadLock.held", type = phosphor::Complete, argument_names = {_M_elems = {0x5970d2 "", 0x5970d2 ""}}, argument_types = {_M_elems = {phosphor::is_pointer, phosphor::is_none}}}
                    phosphor_internal_guard_834 = {tpiWait = 0x580320 <event_handler(int, short, void*)::phosphor_internal_tpi_wait_834>, tpiHeld = 0x5802e0 <event_handler(int, short, void*)::phosphor_internal_tpi_held_834>, enabled = true, mutex = @0x7fabf1f16238, threshold = {__r = 10000000}, start = {__d = {__r = 143487508171154}}, 
                      lockedAt = {__d = {__r = 143487508171206}}, releasedAt = {__d = {__r = 0}}}
            #25 0x00007fabf4219107 in event_persist_closure (ev=<optimized out>, base=0x2857fd0) at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/libevent/libevent-prefix/src/libevent/event.c:1580
                    evcb_callback = 0x430390 <event_handler(int, short, void*)>
                    evcb_fd = 51
                    evcb_res = 2
                    evcb_arg = 0x7fabbcfd0f80
            #26 event_process_active_single_queue (base=base@entry=0x2857fd0, max_to_process=max_to_process@entry=2147483647, endtime=endtime@entry=0x0, activeq=<optimized out>) at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/libevent/libevent-prefix/src/libevent/event.c:1639
                    ev = <optimized out>
                    evcb = <optimized out>
                    count = 1
            #27 0x00007fabf421967f in event_process_active (base=0x2857fd0) at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/libevent/libevent-prefix/src/libevent/event.c:1738
                    activeq = <optimized out>
                    i = 0
                    c = 0
                    tv = {tv_sec = 140376442127984, tv_usec = 140376486064735}
                    maxcb = 2147483647
                    endtime = 0x0
                    limit_after_prio = 2147483647
            #28 event_base_loop (base=0x2857fd0, flags=flags@entry=0) at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/libevent/libevent-prefix/src/libevent/event.c:1961
                    evsel = 0x7fabf442ebc0 <epollops>
                    tv = {tv_sec = 140376365344368, tv_usec = 0}
                    tv_p = <optimized out>
                    res = <optimized out>
                    done = 0
                    retval = 0
            #29 0x00000000004701a9 in worker_libevent (arg=0x7fabf1f16160) at kv_engine/daemon/thread.cc:218
                    me = @0x7fabf1f16160: <error reading variable>
            #30 0x00007fabf4f2bf27 in run (this=0x7fabf1f38cd0) at platform/src/cb_pthreads.cc:58
            No locals.
            #31 platform_thread_wrap (arg=0x7fabf1f38cd0) at platform/src/cb_pthreads.cc:71
                    context = {_M_t = {
                        _M_t = {<std::_Tuple_impl<0, CouchbaseThread*, std::default_delete<CouchbaseThread> >> = {<std::_Tuple_impl<1, std::default_delete<CouchbaseThread> >> = {<std::_Head_base<1, std::default_delete<CouchbaseThread>, true>> = {<std::default_delete<CouchbaseThread>> = {<No data fields>}, <No data fields>}, <No data fields>}, <std::_Head_base<0, CouchbaseThread*, false>> = {_M_head_impl = 0x7fabf1f38cd0}, <No data fields>}, <No data fields>}}}
            #32 0x00007fabf295adf3 in start_thread () from /usr/lib64/libpthread-2.17.so
            No symbol table info available.
            #33 0x00007fabf26881ad in clone () from /usr/lib64/libc-2.17.so
            No symbol table info available.
            

            The issue is within PassiveDurabilityMonitor::State::updateHighPreparedSeqno, which has called SyncWrite::getBySeqno() on a null SyncWrite object - note the `this` value of 0x10 at frame #2:

            #2  DurabilityMonitor::SyncWrite::getBySeqno (this=this@entry=0x10) at kv_engine/engines/ep/src/durability/durability_monitor_impl.cc:64
            No locals.
            #3  0x00007fabede553d4 in operator()<std::_List_iterator<DurabilityMonitor::SyncWrite> > (prepareItr=..., snapshotEndSeqno=13, __closure=<synthetic pointer>) at kv_engine/engines/ep/src/durability/passive_durability_monitor.cc:355
            No locals.
            #4  PassiveDurabilityMonitor::State::updateHighPreparedSeqno (this=0x7fabe6098d80) at kv_engine/engines/ep/src/durability/passive_durability_monitor.cc:383
                    next = {_M_node = 0x0}
                    snapshotEndSeqno = 13
                    maxLevelCanAdvanceOver = cb::durability::MajorityAndPersistOnMaster
                    prevHPS = {<ThrowExceptionPolicy<long>> = {<No data fields>}, val = 9, label = {static npos = 18446744073709551615, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x7fabeb63f9b8 "unlabelled"}, _M_string_length = 10, {
                          _M_local_buf = "unlabelled\000\356\253\177\000", _M_allocated_capacity = 7812730926796664437}}}
            

            Note frame #4 (updateHighPreparedSeqno), line 383, where inSnapshot (line 355) is called:

                    for (auto next = getIteratorNext(highPreparedSeqno.it);
                         inSnapshot(snapshotEndSeqno, next) &&
                         next->getDurabilityReqs().getLevel() <= maxLevelCanAdvanceOver;
            

            It would appear that prepareItr - i.e. next in updateHighPreparedSeqno - is null:

                // Helper to keep conditions short and meaningful
                const auto inSnapshot = [trackedWritesEnd = trackedWrites.end()](
                                                uint64_t snapshotEndSeqno,
                                                auto prepareItr) {
                    return prepareItr != trackedWritesEnd &&
                           static_cast<uint64_t>(prepareItr->getBySeqno()) <=
                                   snapshotEndSeqno;
                };
            

            Indeed this matches the GDB local variables at frame #4 above.
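
            To illustrate the failure mode outside of kv_engine, here is a minimal
            standalone sketch (assumptions: libstdc++'s std::list node layout, where
            two 8-byte link pointers precede the stored element, which is what makes
            the element of a "node" at address 0x0 appear at 0x10; this is not
            kv_engine code):

                #include <iostream>
                #include <list>

                int main() {
                    std::list<long> trackedWrites{1, 2, 3};

                    // A value-initialised (singular) iterator holds a null node
                    // pointer - analogous to `next = {_M_node = 0x0}` in frame #4.
                    std::list<long>::iterator next{};

                    // The inSnapshot-style guard only checks
                    // `prepareItr != trackedWritesEnd`; a singular iterator does not
                    // compare equal to end(), so the guard passes. (Comparing a
                    // singular iterator is itself undefined behaviour; in practice it
                    // simply compares node pointers and yields true here.)
                    std::cout << std::boolalpha
                              << (next != trackedWrites.end()) << '\n'; // true

                    // Dereferencing `next` at this point would invoke a member
                    // function on the element at offset 0x10 from a null node
                    // pointer, i.e. SyncWrite::getBySeqno() with this == 0x10,
                    // exactly as seen in frame #2.
                    return 0;
                }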

            drigby Dave Rigby added a comment -

            Shortly before the crash is the following Monotonic invariant failure - note there are a large number of these.

            2019-07-13T05:11:12.149814-07:00 ERROR 54: exception occurred in runloop during packet execution. Cookie info: [{"aiostat":"success","connection":"[ 127.0.0.1:50858 - 127.0.0.1:11209 (<ud>@ns_server</ud>) ]","engine_storage":"0x00007fabe6212410","ewouldblock":false,"packet":{"bodylen":102,"cas":1563019868065366016,"datatype":["JSON","Snappy"],"extlen":33,"key":"<ud>.test_docs-4227</ud>","keylen":15,"magic":"ClientRequest","opaque":3,"opcode":"DCP_PREPARE","vbucket":770},"refcount":1}] - closing connection ([ 127.0.0.1:50858 - 127.0.0.1:11209 (<ud>@ns_server</ud>) ]): Monotonic<m> (unlabelled) invariant failed: new value (17) breaks invariant on current value (17)
            

            This error is preceded by a rollback attempt on this vBucket, which suspiciously rolls back to the same seqno that reported the Monotonic error:

            2019-07-13T05:10:23.998797-07:00 INFO (default) VBucket::setState: transitioning vb:770 from:replica to:replica
            2019-07-13T05:10:27.115006-07:00 INFO 54: (default) DCP (Consumer) eq_dcpq:replication:ns_1@10.112.191.104->ns_1@10.112.191.103:default - (vb:770) Attempting to add stream: opaque_:3, start_seqno_:0, end_seqno_:18446744073709551615, vb_uuid:168046390496995, snap_start_seqno_:0, snap_end_seqno_:0, last_seqno:0, stream_req_value:{"uid":"0"}
            2019-07-13T05:11:09.883649-07:00 WARNING 54: (default) DCP (Consumer) eq_dcpq:replication:ns_1@10.112.191.104->ns_1@10.112.191.103:default - (vb:770) Setting stream to dead state, last_seqno is 18, unAckedBytes is 0, status is The stream closed early because the conn was disconnected
            2019-07-13T05:11:11.909563-07:00 INFO 54: (default) DCP (Consumer) eq_dcpq:replication:ns_1@10.112.191.104->ns_1@10.112.191.103:default - (vb:770) Attempting to add stream: opaque_:3, start_seqno_:18, end_seqno_:18446744073709551615, vb_uuid:33874532304853, snap_start_seqno_:18, snap_end_seqno_:18, last_seqno:18, stream_req_value:{"uid":"0"}
            2019-07-13T05:11:11.961097-07:00 INFO 54: (default) DCP (Consumer) eq_dcpq:replication:ns_1@10.112.191.104->ns_1@10.112.191.103:default - (vb:770) Received rollback request. Rolling back to seqno:17
            2019-07-13T05:11:11.997665-07:00 INFO 54: (default) DCP (Consumer) eq_dcpq:replication:ns_1@10.112.191.104->ns_1@10.112.191.103:default - (vb:770) Attempting to reconnect stream with opaque 3, start seq no 16, end seq no 18446744073709551615, snap start seqno 16, snap end seqno 16, and vb manifest uid {"uid":"0"}
            2019-07-13T05:11:12.149814-07:00 ERROR 54: exception occurred in runloop during packet execution. Cookie info: [{"aiostat":"success","connection":"[ 127.0.0.1:50858 - 127.0.0.1:11209 (<ud>@ns_server</ud>) ]","engine_storage":"0x00007fabe6212410","ewouldblock":false,"packet":{"bodylen":102,"cas":1563019868065366016,"datatype":["JSON","Snappy"],"extlen":33,"key":"<ud>.test_docs-4227</ud>","keylen":15,"magic":"ClientRequest","opaque":3,"opcode":"DCP_PREPARE","vbucket":770},"refcount":1}] - closing connection ([ 127.0.0.1:50858 - 127.0.0.1:11209 (<ud>@ns_server</ud>) ]): Monotonic<m> (unlabelled) invariant failed: new value (17) breaks invariant on current value (17)
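
            For reference, the semantics implied by that error message can be sketched
            with a simplified stand-in for the Monotonic wrapper (assumed behaviour
            only; the real kv_engine template differs in detail): assigning a value
            that is not strictly greater than the current one throws, which is exactly
            what replaying the prepare at seqno 17 after the partial rollback to 17
            would trigger.

                #include <cstdint>
                #include <stdexcept>
                #include <string>

                template <typename T>
                class Monotonic {
                public:
                    explicit Monotonic(T initial) : val(initial) {}

                    Monotonic& operator=(T newValue) {
                        // Strictly monotonic: equal or smaller values are rejected.
                        if (!(newValue > val)) {
                            throw std::logic_error(
                                    "Monotonic (unlabelled) invariant failed: new value (" +
                                    std::to_string(newValue) +
                                    ") breaks invariant on current value (" +
                                    std::to_string(val) + ")");
                        }
                        val = newValue;
                        return *this;
                    }

                    operator T() const { return val; }

                private:
                    T val;
                };

                int main() {
                    Monotonic<int64_t> seqno(17);
                    seqno = 18; // ok: strictly increasing
                    seqno = 17; // throws - matches the ERROR log line above
                }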
            

            drigby Dave Rigby added a comment -

            Identified some combinations of Prepared / Committed / Aborted items which are not correctly handled over rollback. This results in the wrong item being removed from memory during rollback (the Committed item instead of the Prepared one), which subsequently causes the state in the PassiveDM to become invalid; hence the observed exceptions.
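
            A purely illustrative toy model of that failure mode (these types do not
            exist in kv_engine; the real hash table and PassiveDM are far richer):
            each key can hold a Committed value and, separately, an in-flight Prepare.
            Rolling back a prepare should discard the Prepare and keep the Committed
            value; removing the Committed item instead leaves the PassiveDM tracking a
            prepare whose backing state in the hash table no longer matches.

                #include <cstdint>
                #include <map>
                #include <optional>
                #include <string>

                struct StoredValue {
                    std::optional<std::string> committed;  // last committed value, if any
                    std::optional<int64_t> prepareSeqno;   // outstanding SyncWrite prepare, if any
                };

                using HashTable = std::map<std::string, StoredValue>;

                // Buggy behaviour described above: the Committed item is removed
                // instead of the Prepared one.
                void rollbackPrepareBuggy(HashTable& ht, const std::string& key) {
                    ht[key].committed.reset();
                }

                // Intended behaviour: the Prepare being rolled back is what must go.
                void rollbackPrepareIntended(HashTable& ht, const std::string& key) {
                    ht[key].prepareSeqno.reset();
                }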

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.5.0-3802 contains kv_engine commit d5c9410 with commit message:
            MB-35060: (Temporarily) rollback to zero if SyncWrites present

            ashwin.govindarajulu Ashwin Govindarajulu added a comment -

            Verified on 6.5.0 build 3866.


              People

              • Assignee:
                ashwin.govindarajulu Ashwin Govindarajulu
                Reporter:
                ashwin.govindarajulu Ashwin Govindarajulu
              • Votes:
                0
                Watchers:
                4

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Gerrit Reviews

                  There are no open Gerrit changes
