Couchbase Server / MB-37204

[Jepsen] Couchbase crash during hard-reboot scenario due to HCS > persistedCompletedSeqno


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 6.5.0
    • Fix Version/s: None
    • Component/s: couchbase-bucket
    • Labels: None
    • Triage: Triaged
    • Is this a Regression?: Yes
    • Sprint: KV Sprint 2019-12

    Description

      During kv-engine-jepsen-nightly-352 (http://cv.jenkins.couchbase.com/view/Commit%20Validation/view/Jepsen/job/kv-engine-jepsen-nightly/352/) we observed a crash during the following test:
      lein trampoline run test --nodes-file ./nodes --username vagrant --ssh-private-key ./resources/vagrantkey --workload=kill --scenario=hard-reboot --replicas=2 --no-autofailover --durability=0:0:0:100 --eviction-policy=full --cas
      Crash Message:

      2019-12-08T01:36:51.777389+00:00 CRITICAL Caught unhandled std::exception-derived exception. what(): GSL: Precondition failure at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/ep_bucket.cc: 620
      

      ep_bucket.cc:620

                      if (hcs) {
                          Expects(hcs > vbstate.persistedCompletedSeqno);
                          vbstate.persistedCompletedSeqno = *hcs;
                      }
      

      Couchbase version 6.5.1-6001
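
      For context, a minimal standalone sketch of how this precondition can fire; the types and the Expects() macro below are stand-ins, not the actual ep-engine / gsl-lite code. If a flush batch carries a high completed seqno (HCS) that is not strictly greater than the persistedCompletedSeqno already cached in the vbucket state, the check aborts the process exactly as seen above.

          // Minimal standalone sketch (stand-in names, not the actual ep_bucket.cc /
          // gsl-lite code) of how the precondition at ep_bucket.cc:620 can fire.
          #include <cstdint>
          #include <cstdlib>
          #include <iostream>
          #include <optional>

          // Stand-in for gsl-lite's Expects(); the real one throws gsl::fail_fast,
          // which brings the process down when it escapes as an unhandled exception.
          #define Expects(cond)                                              \
              do {                                                           \
                  if (!(cond)) {                                             \
                      std::cerr << "GSL: Precondition failure: " #cond "\n"; \
                      std::abort();                                          \
                  }                                                          \
              } while (0)

          struct VBucketState {
              uint64_t persistedCompletedSeqno = 0;
          };

          // Hypothetical flush step: apply the flush batch's high completed seqno
          // (HCS), if present, to the cached vbucket state.
          void applyHCS(VBucketState& vbstate, std::optional<uint64_t> hcs) {
              if (hcs) {
                  // Strict '>': an HCS equal to (or below) the already-persisted value
                  // trips the precondition and aborts, matching the crash above.
                  Expects(*hcs > vbstate.persistedCompletedSeqno);
                  vbstate.persistedCompletedSeqno = *hcs;
              }
          }

          int main() {
              VBucketState vbstate;
              vbstate.persistedCompletedSeqno = 9; // e.g. value loaded at warmup

              applyHCS(vbstate, std::nullopt); // no HCS in this batch: nothing to do
              applyHCS(vbstate, 9);            // HCS == PCS: precondition fails, aborts
          }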

        Issue Links

          This issue is caused by MB-37161
          This issue duplicates MB-37161
          Activity

            richard.demellow Richard deMellow created issue -
            richard.demellow Richard deMellow made changes -
            Summary: [Jepsen] Crash → [Jepsen] Crash during hard-reboot scenario
            richard.demellow Richard deMellow made changes -
            Remote Link: This issue links to "20191207T181623.000-0800.zip (Web Link)" [ 18944 ]
            richard.demellow Richard deMellow made changes -
            Summary: [Jepsen] Crash during hard-reboot scenario → [Jepsen] Crashes during hard-reboot scenario
            richard.demellow Richard deMellow made changes -
            Description: updated (added "Couchbase version 6.5.1-6001"; crash details otherwise unchanged)
            richard.demellow Richard deMellow made changes -
            Summary: [Jepsen] Crashes during hard-reboot scenario → [Jepsen] Couchbase crash during hard-reboot scenario due to HCS >
            richard.demellow Richard deMellow made changes -
            Summary: [Jepsen] Couchbase crash during hard-reboot scenario due to HCS > → [Jepsen] Couchbase crash during hard-reboot scenario due to HCS > persistedCompletedSeqno
            drigby Dave Rigby made changes -
            Assignee: Daniel Owen [ owend ] → Dave Rigby [ drigby ]
            owend Daniel Owen made changes -
            Priority: Major [ 3 ] → Critical [ 2 ]
            owend Daniel Owen made changes -
            Fix Version/s Mad-Hatter [ 15037 ]
            Fix Version/s 6.5.0 [ 16624 ]
            owend Daniel Owen made changes -
            Affects Version/s 6.5.1 [ 16622 ]
            owend Daniel Owen made changes -
            Fix Version/s 6.5.1 [ 16622 ]
            drigby Dave Rigby made changes -
            Affects Version/s 6.5.1 [ 16622 ]
            drigby Dave Rigby made changes -
            Fix Version/s 6.5.1 [ 16622 ]
            Fix Version/s 6.5.0 [ 16624 ]
            owend Daniel Owen made changes -
            Sprint KV Sprint 2019-12 [ 939 ]
            owend Daniel Owen made changes -
            Rank Ranked higher
            drigby Dave Rigby added a comment - edited

            The earliest minidump in the logs is 29e7f66c-7f2c-8814-34d3a1da-481d982d.dmp on node .174, from:

            2019-12-08T02:18:05.289332+00:00 CRITICAL Breakpad caught a crash (Couchbase version 6.5.1-6001). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/29e7f66c-7f2c-8814-34d3a1da-481d982d.dmp before terminating.
            

            Backtrace of crashing thread:

            (gdb) bt   
            #0  0x00007f2d6f4cf428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
            #1  0x00007f2d6f4d102a in __GI_abort () at abort.c:89
            #2  0x00007f2d70030ced in __gnu_cxx::__verbose_terminate_handler () at /tmp/deploy/objdir/../gcc-7.3.0/libstdc++-v3/libsupc++/vterminate.cc:50
            #3  0x0000000000545f12 in backtrace_terminate_handler () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/utilities/terminate_handler.cc:86
            #4  0x00007f2d7002eb16 in __cxxabiv1::__terminate (handler=<optimized out>) at /tmp/deploy/objdir/../gcc-7.3.0/libstdc++-v3/libsupc++/eh_terminate.cc:47
            #5  0x00007f2d7002eb61 in std::terminate () at /tmp/deploy/objdir/../gcc-7.3.0/libstdc++-v3/libsupc++/eh_terminate.cc:57
            #6  0x00007f2d7002eda3 in __cxxabiv1::__cxa_throw (obj=obj@entry=0x7f2d1c000b40, tinfo=tinfo@entry=0x7f2d6af9d010 <typeinfo for gsl::fail_fast>, dest=dest@entry=0x7f2d6aadebf0 <gsl::fail_fast::~fail_fast()>)
                at /tmp/deploy/objdir/../gcc-7.3.0/libstdc++-v3/libsupc++/eh_throw.cc:93
            #7  0x00007f2d6ab57dd8 in gsl::fail_fast_assert (cond=<optimized out>, message=0x7f2d6acfa710 "GSL: Precondition failure at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/ep_bucket.cc: 620")
                at /home/couchbase/jenkins/workspace/couchbase-server-unix/third_party/gsl-lite/include/gsl/gsl-lite.h:473
            #8  0x00007f2d6ab6b39f in EPBucket::flushVBucket (this=0x7f2d6bdef300, vbid=...) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/ep_bucket.cc:620
            #9  0x00007f2d6abbc6e0 in Flusher::flushVB (this=this@entry=0x7f2d6bf11000) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/flusher.cc:306
            #10 0x00007f2d6abbd5b1 in Flusher::step (this=0x7f2d6bf11000, task=0x7f2d6ef6d970) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/flusher.cc:215
            #11 0x00007f2d6abb7154 in ExecutorThread::run (this=0x7f2d6be547a0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/executorthread.cc:190
            #12 0x00007f2d71e4ce27 in CouchbaseThread::run (this=0x7f2d3fd123f0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/platform/src/cb_pthreads.cc:58
            #13 platform_thread_wrap (arg=0x7f2d3fd123f0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/platform/src/cb_pthreads.cc:71
            #14 0x00007f2d6f86b6ba in start_thread (arg=0x7f2d3cdfa700) at pthread_create.c:333
            #15 0x00007f2d6f5a141d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
            

            Frame #8 has most of the interesting state - we crashed when attempting to flush vb:134 (replica). hcs is not available (optimized out), but vbstate.persistedCompletedSeqno is available and is equal to 9; so hcs must have been set and had value 9 or lower:

            #8  0x00007f2d6ab6b39f in EPBucket::flushVBucket (this=0x7f2d6bdef300, vbid=...) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/ep_bucket.cc:620
                    rlh = {lock_ = 0x7f2d3fea62a0, token_ = {type_ = folly::SharedMutexToken::Type::INLINE_SHARED, slot_ = 0}}
                    options = VBStatePersist::VBSTATE_CACHE_UPDATE_ONLY
                    prev = <optimized out>
                    maxSeqno = 36
                    maxVbStateOpCas = <optimized out>
                    minSeqno = 35
                    mustCheckpointVBState = <optimized out>
                    collectionFlush = ...
                    flush_end = <optimized out>
                    persistedVbState = <optimized out>
                    hcs = <optimized out>
                    hps = <optimized out>
                    maxVisibleSeqno = <optimized out>
                    trans_time = <optimized out>
                    toFlush = {items = ...,
                               ranges = ...,
                               moreAvailable = false,
                               maxDeletedRevSeqno = {<boost::optional_detail::tc_optional_base<unsigned long>> = {<boost::optional_detail::optional_tag> = {<No data fields>}, 
                          m_initialized = false, m_storage = 0}, <No data fields>},
                               checkpointType = CheckpointType::Disk}
                    range = {<boost::optional_detail::optional_base<snapshot_range_t>> = {<boost::optional_detail::optional_tag> = {<No data fields>}, m_initialized = true, m_storage = {dummy_ = {data = "$\000\000\000\000\000\000\000$\000\000\000\000\000\000", 
                            aligner_ = {<No data fields>}}}}, <No data fields>}
                    rwUnderlying = 0x7f2d6ef46800
                    vbstate = {static CurrentVersion = 3,
                               state = vbucket_state_dead,
                               maxDeletedSeqno = {counter = {_M_elems = "\000\000\000\000\000"}},
                               highSeqno = 10,
                               purgeSeqno = 0,
                               lastSnapStart = 36,
                               lastSnapEnd = 36,
                               maxCas = 1575771481264816128,
                               hlcCasEpochSeqno = 1,
                               mightContainXattrs = false,
                               supportsNamespaces = true,
                               version = 3,
                               persistedCompletedSeqno = 9,
                               persistedPreparedSeqno = 9,
                               highPreparedSeqno = 36,
                               maxVisibleSeqno = 10,
                               onDiskPrepares = 1,
                               checkpointType = CheckpointType::Disk,
                               transition = {failovers = {...},
                                             replicationTopology = {...},
                                             state = vbucket_state_replica}
                               }
                    vbstateRollback = {static CurrentVersion = 3,
                                       state = vbucket_state_dead,
                                       maxDeletedSeqno = {counter = {_M_elems = "\000\000\000\000\000"}},
                                       highSeqno = 10,
                                       lastSnapStart = 10,
                                       lastSnapEnd = 10,
                                       maxCas = 1575771443439534080, 
                                       hlcCasEpochSeqno = 1,
                                       mightContainXattrs = false,
                                       supportsNamespaces = true,
                                       version = 3,
                                       persistedCompletedSeqno = 9,
                                       persistedPreparedSeqno = 9,
                                       highPreparedSeqno = 9,
                                       maxVisibleSeqno = 10,
                                       onDiskPrepares = 1, 
                                       checkpointType = CheckpointType::Memory,
                                       transition = {failovers = ...,
                                                     replicationTopology = {...},
                                                     state = vbucket_state_replica}}
                    items = ...
                    items_flushed = <optimized out>
                    moreAvailable = false
                    flush_start = {__d = {__r = 28865831823}}
                    vb = ...
            

            From the logs, the history of vb:134 is as follows:

            1. Clean shutdown of bucket (from ns_server).
            2. On restart, it is created as a replica during warmup with no prepares loaded from disk; it is then populated from node .15 and throws an exception during flushing (walked through in the sketch after this log excerpt):

            2019-12-08T02:18:03.505480+00:00 INFO (default) VBucket: created vb:134 with state:replica initialState:replica lastSeqno:10 persistedRange:{10,10} max_cas:1575771443439534080 uuid:27948380356516 topology:null
            2019-12-08T02:18:03.790250+00:00 INFO (default) EPVBucket::loadOutstandingPrepares: (vb:134) created DM with PCS:9, PPS:9, HPS:9, number of prepares loaded:0
            2019-12-08T02:18:03.810857+00:00 DEBUG (default) Mapped new vb:134 in state replica
            2019-12-08T02:18:03.899025+00:00 DEBUG (default) CouchKVStore::snapshotVBucket: Snapshotted vb:134 state:{"checkpoint_type":"Memory","completed_seqno":"9","failover_table":[{"id":27948380356516,"seq":0}],"high_prepared_seqno":"9","high_seqno":"10","hlc_epoch":"1","max_cas":"1575771443439534080","max_deleted_seqno":"0","max_visible_seqno":"10","might_contain_xattrs":false,"namespaces_supported":true,"on_disk_prepares":"1","prepared_seqno":"9","purge_seqno":"0","snap_end":"10","snap_start":"10","state":"replica","version":3}
            2019-12-08T02:18:04.775912+00:00 INFO 64: (default) DCP (Consumer) eq_dcpq:replication:ns_1@172.28.128.15->ns_1@172.28.128.174:default - (vb:134) Attempting to add stream: opaque_:52, start_seqno_:10, end_seqno_:18446744073709551615, vb_uuid:27948380356516, snap_start_seqno_:10, snap_end_seqno_:10, last_seqno:10, stream_req_value:{"uid":"0"}
            2019-12-08T02:18:04.963678+00:00 DEBUG 64: (default) DCP (Consumer) eq_dcpq:replication:ns_1@172.28.128.15->ns_1@172.28.128.174:default - (vb:134) Add stream for opaque 52 with error code succeeded
            2019-12-08T02:18:04.963706+00:00 DEBUG 64: (default) DCP (Consumer) eq_dcpq:replication:ns_1@172.28.128.15->ns_1@172.28.128.174:default - PassiveStream::transitionState: (vb:134) Transitioning from pending to awaiting-first-snapshot-marker
            2019-12-08T02:18:04.963944+00:00 DEBUG 64: (default) DCP (Consumer) eq_dcpq:replication:ns_1@172.28.128.15->ns_1@172.28.128.174:default - PassiveStream::transitionState: (vb:134) Transitioning from awaiting-first-snapshot-marker to reading
             
            # Expects() in flushVBucket() fails 2s later:
            2019-12-08T02:18:05.071273+00:00 CRITICAL Caught unhandled std::exception-derived exception. what(): GSL: Precondition failure at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/ep_bucket.cc: 620
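
            To make the deduction above concrete, here is a small stand-in walk-through of the values observed in the logs and in frame #8 (simplified types, not the real ep-engine vbucket_state or flusher code):

            // Hedged reconstruction of the vb:134 timeline above, using simplified
            // stand-in fields (not the real ep-engine vbucket_state / flusher code).
            #include <cassert>
            #include <cstdint>
            #include <iostream>
            #include <optional>

            struct VBucketState {
                uint64_t persistedCompletedSeqno = 0;
                uint64_t highPreparedSeqno = 0;
                uint64_t lastSnapStart = 0;
                uint64_t lastSnapEnd = 0;
            };

            int main() {
                // 1. Warmup: replica recreated from disk; Durability Monitor seeded with
                //    PCS:9, PPS:9, HPS:9 and zero outstanding prepares (02:18:03.790250).
                VBucketState vbstate;
                vbstate.persistedCompletedSeqno = 9;
                vbstate.highPreparedSeqno = 9;
                vbstate.lastSnapStart = 10;
                vbstate.lastSnapEnd = 10;

                // 2. DCP repopulates the replica from node .15; the crashing flush is a
                //    Disk checkpoint covering seqnos 35-36 (minSeqno/maxSeqno in frame #8)
                //    and the snapshot range / HPS have already moved to 36.
                vbstate.lastSnapStart = 36;
                vbstate.lastSnapEnd = 36;
                vbstate.highPreparedSeqno = 36;

                // 3. The Expects(hcs > persistedCompletedSeqno) fired while
                //    persistedCompletedSeqno was still 9, so the flush batch must have
                //    carried an HCS that was set but no greater than 9.
                std::optional<uint64_t> hcs = 9; // largest value consistent with the crash
                assert(hcs.has_value());
                assert(!(*hcs > vbstate.persistedCompletedSeqno)); // precondition violated

                std::cout << "HCS=" << *hcs << " <= PCS=" << vbstate.persistedCompletedSeqno
                          << ": Expects(hcs > persistedCompletedSeqno) would abort here\n";
            }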
            

            drigby Dave Rigby added a comment -

            Problem identified: the issue is with the fix for MB-37161 (http://review.couchbase.org/#/c/119024/). The assumption that the DCP Consumer would have fully negotiated SyncReplication by the time the PassiveStream object is created is invalid. We must instead wait for DCP Control negotiation to complete.

            The fix should be relatively straightforward; it should be ready by end of tomorrow (Tuesday 10th).
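
            A minimal sketch of the kind of race described, with entirely hypothetical names (not the real DcpConsumer / PassiveStream classes): if a stream captures the SyncReplication negotiation state when it is constructed, a stream created before DCP Control negotiation completes keeps using a stale value even after negotiation finishes.

            // Hypothetical illustration of the race described above; names and structure
            // are simplified stand-ins, not the real DcpConsumer / PassiveStream code.
            #include <atomic>
            #include <iostream>

            // Simplified consumer: SyncReplication support is only known once the DCP
            // Control negotiation has completed.
            struct Consumer {
                std::atomic<bool> syncReplicationNegotiated{false};
            };

            // Flawed variant: the flag is captured once, when the stream is created.
            struct StreamCapturesAtConstruction {
                explicit StreamCapturesAtConstruction(const Consumer& c)
                    : supportsSyncReplication(c.syncReplicationNegotiated.load()) {}
                bool supportsSyncReplication;
            };

            // Safer variant: the stream consults the consumer at the point of use,
            // i.e. only after negotiation has had a chance to complete.
            struct StreamQueriesAtUse {
                explicit StreamQueriesAtUse(const Consumer& c) : consumer(c) {}
                bool supportsSyncReplication() const {
                    return consumer.syncReplicationNegotiated.load();
                }
                const Consumer& consumer;
            };

            int main() {
                Consumer consumer;

                // Stream created before negotiation finishes (e.g. right after a
                // hard reboot while replication streams are being re-established).
                StreamCapturesAtConstruction early(consumer);
                StreamQueriesAtUse late(consumer);

                consumer.syncReplicationNegotiated = true; // negotiation completes later

                std::cout << "captured-at-construction: " << early.supportsSyncReplication
                          << ", queried-at-use: " << late.supportsSyncReplication() << "\n";
                // Prints: captured-at-construction: 0, queried-at-use: 1
            }

            The actual fix was amended under MB-37161; this snippet only illustrates why the timing of the negotiation relative to stream creation matters.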

            drigby Dave Rigby made changes -
            Due Date 10/Dec/19
            drigby Dave Rigby made changes -
            Link This issue is caused by MB-37161 [ MB-37161 ]
            drigby Dave Rigby added a comment -

            This issue is caused by an incomplete fix for MB-37161: the initial fix incorrectly assumed that DCP features were negotiated before PassiveStream objects would be created.

            Re-opening MB-37161 and amending the fix there.

            drigby Dave Rigby made changes -
            Is this a Regression?: Unknown [ 10452 ] → Yes [ 10450 ]
            drigby Dave Rigby made changes -
            Triage: Untriaged [ 10351 ] → Triaged [ 10350 ]
            drigby Dave Rigby added a comment -

            Duplicate of MB-37161.

            drigby Dave Rigby made changes -
            Fix Version/s Mad-Hatter [ 15037 ]
            Resolution: Duplicate [ 3 ]
            Status: Open [ 1 ] → Resolved [ 5 ]
            drigby Dave Rigby made changes -
            Link This issue duplicates MB-37161 [ MB-37161 ]

            raju Raju Suravarjjala added a comment -

            Bulk closing invalid, duplicate and won't fix bugs
            raju Raju Suravarjjala made changes -
            Status: Resolved [ 5 ] → Closed [ 6 ]

            People

              Assignee: Dave Rigby (drigby)
              Reporter: Richard deMellow (richard.demellow)
              Votes: 0
              Watchers: 4
