Couchbase Server / MB-39532

[Collections] - Hard failover + recovery + rebalance during collections CRUD + durability load causes memcached crashes

Details

    Description

      Script to Repro

      ./testrunner -i /tmp/testexec.96490.ini GROUP=failover_with_collection_crud_durability,rerun=False -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_hard_failover_recovery,nodes_init=5,nodes_failover=2,recovery_type=delta,override_spec_params=durability;replicas,durability=MAJORITY,step_count=1,replicas=Bucket.ReplicaNum.TWO,bucket_spec=multi_bucket.buckets_for_rebalance_tests_more_collections,data_load_spec=volume_test_load_with_CRUD_on_collections,data_load_stage=during,quota_percent=100,GROUP=failover_with_collection_crud_durability
      

      This test run produced roughly 250 or more minidumps.

      From memcached.log

      cbcollect_info_ns_1@172.23.122.131_20200522-070212/memcached.log:2020-05-22T00:02:52.141415-07:00 CRITICAL *** Fatal error encountered during exception handling ***
      cbcollect_info_ns_1@172.23.122.131_20200522-070212/memcached.log:2020-05-22T00:02:52.141467-07:00 CRITICAL Caught unhandled std::exception-derived exception. what(): VB::Manifest:addNewCollectionEntry: collection already exists + , collection:0x0, scope:0x0, startSeqno:0, this:VB::Manifest: uid:255, defaultCollectionExists:1, scopes.size:9, map.size:109
      cbcollect_info_ns_1@172.23.122.131_20200522-070212/memcached.log:2020-05-22T00:02:52.154037-07:00 CRITICAL *** Fatal error encountered during exception handling ***
      cbcollect_info_ns_1@172.23.122.131_20200522-070212/memcached.log:2020-05-22T00:02:52.300307-07:00 CRITICAL Breakpad caught a crash (Couchbase version 7.0.0-2126). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/64b7c648-cf40-48ac-9ce2ae8d-ce13b618.dmp before terminating.
      cbcollect_info_ns_1@172.23.122.131_20200522-070212/memcached.log:2020-05-22T00:02:52.300321-07:00 CRITICAL Stack backtrace of crashed thread:
      cbcollect_info_ns_1@172.23.122.131_20200522-070212/memcached.log:2020-05-22T00:02:52.300487-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x1397ad]
      cbcollect_info_ns_1@172.23.122.131_20200522-070212/memcached.log:2020-05-22T00:02:52.300498-07:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ea) [0x400000+0x14f4fa]
      cbcollect_info_ns_1@172.23.122.131_20200522-070212/memcached.log:2020-05-22T00:02:52.300507-07:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0xb8) [0x400000+0x14f838]
      cbcollect_info_ns_1@172.23.122.131_20200522-070212/memcached.log:2020-05-22T00:02:52.300514-07:00 CRITICAL     /lib64/libpthread.so.0() [0x7fa3c7cd0000+0xf5f0]
      cbcollect_info_ns_1@172.23.122.131_20200522-070212/memcached.log:2020-05-22T00:02:52.300536-07:00 CRITICAL     /lib64/libc.so.6(gsignal+0x37) [0x7fa3c7902000+0x36337]
      cbcollect_info_ns_1@172.23.122.131_20200522-070212/memcached.log:2020-05-22T00:02:52.300554-07:00 CRITICAL     /lib64/libc.so.6(abort+0x148) [0x7fa3c7902000+0x37a28]
      cbcollect_info_ns_1@172.23.122.131_20200522-070212/memcached.log:2020-05-22T00:02:52.300581-07:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x125) [0x7fa3c8405000+0x91195]
      cbcollect_info_ns_1@172.23.122.131_20200522-070212/memcached.log:2020-05-22T00:02:52.300590-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x14a9f2]
      cbcollect_info_ns_1@172.23.122.131_20200522-070212/memcached.log:2020-05-22T00:02:52.300605-07:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7fa3c8405000+0x8ef86]
      cbcollect_info_ns_1@172.23.122.131_20200522-070212/memcached.log:2020-05-22T00:02:52.300620-07:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7fa3c8405000+0x8efd1]
      cbcollect_info_ns_1@172.23.122.131_20200522-070212/memcached.log:2020-05-22T00:02:52.300633-07:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7fa3c8405000+0x8f213]
      cbcollect_info_ns_1@172.23.122.131_20200522-070212/memcached.log:2020-05-22T00:02:52.300643-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7fa3cbb46000+0x2544a9]
      cbcollect_info_ns_1@172.23.122.131_20200522-070212/memcached.log:2020-05-22T00:02:52.300650-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7fa3cbb46000+0x25008a]
      cbcollect_info_ns_1@172.23.122.131_20200522-070212/memcached.log:2020-05-22T00:02:52.300657-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7fa3cbb46000+0x2507fd]
      cbcollect_info_ns_1@172.23.122.131_20200522-070212/memcached.log:2020-05-22T00:02:52.300666-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7fa3cbb46000+0x1c7ade]
      cbcollect_info_ns_1@172.23.122.131_20200522-070212/memcached.log:2020-05-22T00:02:52.300672-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7fa3cbb46000+0x1d2a80]
      cbcollect_info_ns_1@172.23.122.131_20200522-070212/memcached.log:2020-05-22T00:02:52.300679-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7fa3cbb46000+0x145a93]
      cbcollect_info_ns_1@172.23.122.131_20200522-070212/memcached.log:2020-05-22T00:02:52.300684-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7fa3cbb46000+0x13e33f]
      cbcollect_info_ns_1@172.23.122.131_20200522-070212/memcached.log:2020-05-22T00:02:52.300690-07:00 CRITICAL     /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7fa3ca68a000+0x10827]
      cbcollect_info_ns_1@172.23.122.131_20200522-070212/memcached.log:2020-05-22T00:02:52.300696-07:00 CRITICAL     /lib64/libpthread.so.0() [0x7fa3c7cd0000+0x7e65]
      cbcollect_info_ns_1@172.23.122.131_20200522-070212/memcached.log:2020-05-22T00:02:52.300725-07:00 CRITICAL     /lib64/libc.so.6(clone+0x6d) [0x7fa3c7902000+0xfe88d]
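
      With this many minidumps it helps to check whether they all share the same exception. The following is a minimal triage sketch, assuming the log lines keep the format shown above ("Breakpad caught a crash ... Writing crash dump to <path>.dmp" and "what(): <message>"). The function name `triage` and both regular expressions are illustrative and not part of any Couchbase tool.

```python
import re
from collections import Counter

# Matches the Breakpad line that names the minidump being written.
DUMP_RE = re.compile(r"Breakpad caught a crash .*?Writing crash dump to (\S+\.dmp)")
# Captures the exception text between "what():" and the first comma.
WHAT_RE = re.compile(r"what\(\): ([^,]+)")


def triage(lines):
    """Return (list of dump paths, Counter of exception summaries).

    `lines` is any iterable of log lines, e.g. an open file object.
    """
    dumps = []
    reasons = Counter()
    for line in lines:
        match = DUMP_RE.search(line)
        if match:
            dumps.append(match.group(1))
            continue
        match = WHAT_RE.search(line)
        if match:
            reasons[match.group(1).strip()] += 1
    return dumps, reasons
```

      Passing an open memcached.log to `triage` and printing `len(dumps)` alongside `reasons.most_common()` should show whether the crashes are dominated by the `addNewCollectionEntry` exception or whether several distinct failures are mixed together.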
      

      Backtrace

      (gdb) bt full
      #0  0x00007fe525fa8337 in __gconv_transform_internal_ucs2 () from /lib64/libc.so.6
      No symbol table info available.
      #1  0x00007fe514b74e10 in ?? ()
      No symbol table info available.
      #2  0x00007fe525fedf23 in getwchar_unlocked () from /lib64/libc.so.6
      No symbol table info available.
      #3  0x00007fe52633a1c0 in _IO_2_1_stderr_ () from /lib64/libc.so.6
      No symbol table info available.
      #4  0x00007fe526b03180 in __cxa_get_globals_fast () from /opt/couchbase/bin/../lib/libstdc++.so.6
      No symbol table info available.
      #5  0x00007fe52b1ec500 in ?? ()
      No symbol table info available.
      #6  0x00007fe514b75a70 in ?? ()
      No symbol table info available.
      #7  0x00007fe526b06195 in __gnu_cxx::__verbose_terminate_handler () at /tmp/deploy/gcc-7.3.0/libstdc++-v3/libsupc++/vterminate.cc:95
              terminating = false
              t = <optimized out>
      #8  0x000000000054a9f2 in backtrace_terminate_handler () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/utilities/terminate_handler.cc:86
      No locals.
      #9  0x00007fe526b03f86 in __cxxabiv1::__terminate (handler=<optimized out>) at /tmp/deploy/gcc-7.3.0/libstdc++-v3/libsupc++/eh_terminate.cc:47
      No locals.
      #10 0x00007fe526b03fd1 in std::terminate () at /tmp/deploy/gcc-7.3.0/libstdc++-v3/libsupc++/eh_terminate.cc:57
      No locals.
      #11 0x00007fe526b04213 in __cxxabiv1::__cxa_throw (obj=obj@entry=0x7fe4e0061bd0, tinfo=0x7fb910 <typeinfo for std::logic_error>, dest=0x418760 <_ZNSt11logic_errorD1Ev@plt>) at /tmp/deploy/gcc-7.3.0/libstdc++-v3/libsupc++/eh_throw.cc:93
              globals = <optimized out>
              header = 0x7fe4e0061b50
      #12 0x00007fe52a40a4a9 in Collections::VB::Manifest::throwException<std::logic_error> (this=this@entry=0x7fe4e585a840, thrower=..., error=...)
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/collections/vbucket_manifest.h:1289
      No locals.
      #13 0x00007fe52a40608a in Collections::VB::Manifest::addNewCollectionEntry (this=this@entry=0x7fe4e585a840, identifiers=..., maxTtl=..., startSeqno=<optimized out>)
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/collections/vbucket_manifest.cc:284
              itr = <optimized out>
              __FUNCTION__ = "addNewCollectionEntry"
              inserted = <optimized out>
      #14 0x00007fe52a4067fd in Collections::VB::Manifest::Manifest (this=0x7fe4e585a840, data=...) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/collections/vbucket_manifest.cc:49
              meta = @0x7fe515d40bc8: <error reading variable>
              __for_range = <optimized out>
      #15 0x00007fe52a37dade in make_unique<Collections::VB::Manifest, Collections::KVStore::Manifest> () at /usr/local/include/c++/7.3.0/bits/unique_ptr.h:825
      No locals.
      #16 Warmup::createVBuckets (this=<optimized out>, shardId=<optimized out>) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/warmup.cc:1035
              bucket = <optimized out>
              table = {_M_t = {
                  _M_t = {<std::_Tuple_impl<0, FailoverTable*, std::default_delete<FailoverTable> >> = {<std::_Tuple_impl<1, std::default_delete<FailoverTable> >> = {<std::_Head_base<1, std::default_delete<FailoverTable>, true>> = {<std::default_delete<FailoverTable>> = {<No data fields>}, <No data fields>}, <No data fields>}, <std::_Head_base<0, FailoverTable*, false>> = {_M_head_impl = 0x7fe515d39d00}, <No data fields>}, <No data fields>}}}
              shard = 0x7fe51c05a570
              manifest = {_M_t = {
                  _M_t = {<std::_Tuple_impl<0, Collections::VB::Manifest*, std::default_delete<Collections::VB::Manifest> >> = {<std::_Tuple_impl<1, std::default_delete<Collections::VB::Manifest> >> = {<std::_Head_base<1, std::default_delete<Collections::VB::Manifest>, true>> = {<std::default_delete<Collections::VB::Manifest>> = {<No data fields>}, <No data fields>}, <No data fields>}, <std::_Head_base<0, Collections::VB::Manifest*, false>> = {
                        _M_head_impl = 0x0}, <No data fields>}, <No data fields>}}}
              vbid = {vbid = 252}
              vbs = @0x7fe4e5681448: <error reading variable>
       
              vb = {<std::__shared_ptr<VBucket, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<VBucket, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x0, _M_refcount = {_M_pi = 0x0}}, <No data fields>}
              maxEntries = <optimized out>
      #17 0x00007fe52a388a80 in WarmupCreateVBuckets::run (this=0x7fe515d1d350) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/warmup.cc:180
              phosphor_internal_category_enabled_179 = {_M_b = {_M_p = 0x0}, static is_always_lock_free = <error reading variable: No global symbol "std::atomic<std::atomic<phosphor::CategoryStatus> const*>::is_always_lock_free".>}
              phosphor_internal_category_enabled_temp_179 = <optimized out>
              phosphor_internal_tpi_179 = {category = 0x2926b3 <Address 0x2926b3 out of bounds>, name = 0x2aa54b <Address 0x2aa54b out of bounds>, type = phosphor::Complete, argument_names = {_M_elems = {
                    0x2bf1fb <Address 0x2bf1fb out of bounds>, 0x2bf1fb <Address 0x2bf1fb out of bounds>}}, argument_types = {_M_elems = {phosphor::is_none, phosphor::is_none}}}
              phosphor_internal_guard_179 = {tpi = 0x7fe52a707080 <WarmupCreateVBuckets::run()::phosphor_internal_tpi_179>, enabled = true, arg1 = {<No data fields>}, arg2 = {<No data fields>}, start = {__d = {__r = 430958911478490}}}
      #18 0x00007fe52a2fba93 in GlobalTask::execute (this=0x7fe515d1d350) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/globaltask.cc:73
              guard = {previous = 0x0}
      #19 0x00007fe52a2f433f in ExecutorThread::run (this=0x7fe524ba7fc0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/executorthread.cc:188
              curTaskDescr = {static npos = 18446744073709551615, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, 
                  _M_p = 0x7fe524aee5b0 <Address 0x7fe524aee5b0 out of bounds>}, _M_string_length = 35, {_M_local_buf = "#\000\000\000\000\000\000\000n\253\001\000\000\000\000", _M_allocated_capacity = 35}}
              woketime = <optimized out>
              scheduleOverhead = <optimized out>
              again = <optimized out>
              runtime = <optimized out>
              q = <optimized out>
              tick = 4 '\004'
              guard = {engine = 0x0}
      #20 0x00007fe528d0a827 in run (this=0x7fe524aed2f0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/platform/src/cb_pthreads.cc:58
      No locals.
      #21 platform_thread_wrap (arg=0x7fe524aed2f0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/platform/src/cb_pthreads.cc:71
              context = {_M_t = {
                  _M_t = {<std::_Tuple_impl<0, CouchbaseThread*, std::default_delete<CouchbaseThread> >> = {<std::_Tuple_impl<1, std::default_delete<CouchbaseThread> >> = {<std::_Head_base<1, std::default_delete<CouchbaseThread>, true>> = {<std::default_delete<CouchbaseThread>> = {<No data fields>}, <No data fields>}, <No data fields>}, <std::_Head_base<0, CouchbaseThread*, false>> = {_M_head_impl = 0x7fe524aed2f0}, <No data fields>}, <No data fields>}}}
      #22 0x00007fe526347e65 in start_thread () from /lib64/libpthread.so.0
      No symbol table info available.
      #23 0x00007fe52607088d in ?? () from /lib64/libc.so.6
      No symbol table info available.
      #24 0x0000000000000000 in ?? ()
      No symbol table info available.
      (gdb) 
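
      Frames #11 to #16 suggest the following sequence: warmup (Warmup::createVBuckets, vbid 252) constructs a Collections::VB::Manifest from persisted data, the constructor loops over the persisted entries, and addNewCollectionEntry throws std::logic_error because collection 0x0 is already in the map. Nothing on the warmup task catches it, so std::terminate aborts memcached. The sketch below is a toy Python model of that behaviour, not kv_engine code. The class and function names are hypothetical, and the duplicated entry in the usage note is an assumption made only to trigger the error. It does not identify the root cause.

```python
class DuplicateCollectionError(Exception):
    """Stands in for the std::logic_error thrown in frame #11."""


class ManifestModel:
    """Toy stand-in for Collections::VB::Manifest (hypothetical)."""

    def __init__(self, persisted_entries):
        self.collections = {}
        # Mirrors the loop implied by frame #14: one insert per persisted entry.
        for collection_id, scope_id, start_seqno in persisted_entries:
            self.add_new_collection_entry(collection_id, scope_id, start_seqno)

    def add_new_collection_entry(self, collection_id, scope_id, start_seqno):
        # Inserting an ID that is already present is treated as a logic error.
        if collection_id in self.collections:
            raise DuplicateCollectionError(
                f"collection already exists, collection:{collection_id:#x}, "
                f"scope:{scope_id:#x}, startSeqno:{start_seqno}"
            )
        self.collections[collection_id] = (scope_id, start_seqno)


def find_duplicate_ids(persisted_entries):
    """Return collection IDs that occur more than once, in first-seen order."""
    seen = set()
    duplicates = []
    for collection_id, _scope_id, _start_seqno in persisted_entries:
        if collection_id in seen and collection_id not in duplicates:
            duplicates.append(collection_id)
        seen.add(collection_id)
    return duplicates
```

      For example, `ManifestModel([(0x0, 0x0, 0), (0x8, 0x8, 5), (0x0, 0x0, 0)])` raises on the third entry, while `find_duplicate_ids` on the same list reports the repeated ID without raising. A check of this kind against the persisted manifest for vb 252 would be one way to confirm or rule out duplicated entries on disk.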
      

      cbcollect_info attached.


            People

              Balakumaran.Gopal Balakumaran Gopal