  1. Couchbase Server
  2. MB-48441

memcached crashed, rebalance in failed. KVStore::CompactionContext::maybeUpdatePurgeSeqno(): Unable to get vbucket ptr for vb:763

Details

    Description

      Steps:

      1. Create a cluster with 5 kv nodes and 1 n1ql/index node, a magma bucket, and 50 collections. Load 250M items, then upsert them all to generate 50% fragmentation.
      2. Start a CRUD load (25% each) and do a rebalance in. The rebalance was extremely slow and failed after 9 hours, having reached only 12%, and then memcached crashed.

      172.23.106.236

      1 core dump seen
      2021-09-13 07:00:44,769 | infra | DEBUG   | MainThread | [remote_util:execute_command_raw_jsch:3297] Running command on 172.23.106.236: rm -rf /opt/couchbase/var/lib/couchbase/crash/02b0af2-5cee-478c-e6400fbf-9789de4.core
      running: //opt/couchbase/bin/minidump-2-core /opt/couchbase/var/lib/couchbase/crash/d02b0af2-5cee-478c-e6400fbf-9789de4d.dmp > /opt/couchbase/var/lib/couchbase/crash/02b0af2-5cee-478c-e6400fbf-9789de4.core
      2021-09-13 07:00:44,785 | infra | DEBUG   | MainThread | [remote_util:execute_command_raw_jsch:3297] Running command on 172.23.106.236: //opt/couchbase/bin/minidump-2-core /opt/couchbase/var/lib/couchbase/crash/d02b0af2-5cee-478c-e6400fbf-9789de4d.dmp > /opt/couchbase/var/lib/couchbase/crash/02b0af2-5cee-478c-e6400fbf-9789de4.core
      running: gdb --batch /opt/couchbase/bin/memcached -c /opt/couchbase/var/lib/couchbase/crash/02b0af2-5cee-478c-e6400fbf-9789de4.core -ex "bt full" -ex quit
      2021-09-13 07:00:44,813 | infra | DEBUG   | MainThread | [remote_util:execute_command_raw_jsch:3297] Running command on 172.23.106.236: gdb --batch /opt/couchbase/bin/memcached -c /opt/couchbase/var/lib/couchbase/crash/02b0af2-5cee-478c-e6400fbf-9789de4.core -ex "bt full" -ex quit
      172.23.106.236: Stack Trace of first crash - d02b0af2-5cee-478c-e6400fbf-9789de4d.dmp
      Core was generated by `/opt/couchbase/bin/memcached -C /opt/couchbase/var/lib/couchbase/config/memcach'.
       #0  0x00007f49b8a50337 in raise () from /lib64/libc.so.6
       #0  0x00007f49b8a50337 in raise () from /lib64/libc.so.6
       No symbol table info available.
       #1  0x00007f49b8a51a28 in abort () from /lib64/libc.so.6
       No symbol table info available.
       #2  0x00007f49b939b63c in __gnu_cxx::__verbose_terminate_handler () at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/vterminate.cc:95
               terminating = false
               t = <optimized out>
       #3  0x0000000000ab75fb in backtrace_terminate_handler() ()
       No symbol table info available.
       #4  0x00007f49b93a68f6 in __cxxabiv1::__terminate(void (*)()) () at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/eh_terminate.cc:48
       No locals.
       #5  0x00007f49b93a6961 in std::terminate () at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/eh_terminate.cc:58
       No locals.
       #6  0x00007f49b93a6bf4 in __cxxabiv1::__cxa_throw (obj=<optimized out>, tinfo=0xfd9360 <typeinfo for std::runtime_error>, dest=0x443090 <_ZNSt13runtime_errorD1Ev@plt>) at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/eh_throw.cc:95
               globals = <optimized out>
               header = 0x7f43580008c0
       #7  0x00000000004c4a2d in std::_Function_handler<void (unsigned long), EPBucket::makeCompactionContext(Vbid, CompactionConfig&, unsigned long)::{lambda(unsigned long)#3}>::_M_invoke(std::_Any_data const&, unsigned long&&) [clone .cold] ()
       No symbol table info available.
       #8  0x000000000086278d in MagmaKVStore::compactionCallBack(MagmaKVStore::MagmaCompactionCB&, magma::Slice const&, magma::Slice const&, magma::Slice const&) ()
       No symbol table info available.
       #9  0x00000000009138cd in magma::DocSeqGCContext::Transform(magma::Slice const&, magma::Slice const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, magma::Slice*) ()
       No symbol table info available.
       #10 0x00000000009bee05 in magma::ItemGCFilter::Process(std::vector<std::pair<magma::Slice, magma::Slice>, std::allocator<std::pair<magma::Slice, magma::Slice> > >&, std::vector<std::pair<magma::Slice, magma::Slice>, std::allocator<std::pair<magma::Slice, magma::Slice> > >*) ()
       No symbol table info available.
       #11 0x00000000009962d2 in magma::LSMTree::mergeSSTables(std::vector<std::shared_ptr<magma::Table>, std::allocator<std::shared_ptr<magma::Table> > >&, magma::ItemGCFilter&, unsigned long, double, std::vector<std::shared_ptr<magma::Table>, std::allocator<std::shared_ptr<magma::Table> > >*) ()
       No symbol table info available.
       #12 0x000000000096c523 in magma::LSMTree::compactLevelForInputs(int, int, std::unique_lock<std::mutex>&, std::vector<std::shared_ptr<magma::Table>, std::allocator<std::shared_ptr<magma::Table> > >&, std::vector<std::shared_ptr<magma::Table>, std::allocator<std::shared_ptr<magma::Table> > >&, bool, std::function<std::unique_ptr<magma::GCContext, std::default_delete<magma::GCContext> > (bool)>) ()
       No symbol table info available.
       #13 0x000000000096ef7a in magma::LSMTree::compactLevel(int, std::unique_lock<std::mutex>&, bool, std::function<std::unique_ptr<magma::GCContext, std::default_delete<magma::GCContext> > (bool)>) ()
       No symbol table info available.
       #14 0x000000000096f20c in magma::LSMTree::compact(std::unique_lock<std::mutex>&) ()
       No symbol table info available.
       #15 0x0000000000989138 in std::_Function_handler<void (), magma::LSMTree::queueCompaction()::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
       No symbol table info available.
       #16 0x00000000009580ed in magma::TimedTask::Complete(bool) ()
       No symbol table info available.
       #17 0x00000000009598e1 in magma::TaskWorker::loop(void*) ()
       No symbol table info available.
       #18 0x0000000000b14c09 in platform_thread_wrap(void*) ()
       No symbol table info available.
       #19 0x00007f49bb1efe65 in start_thread () from /lib64/libpthread.so.0
       No symbol table info available.
       #20 0x00007f49b8b1888d in clone () from /lib64/libc.so.6
       No symbol table info available.
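
      To make a backtrace like the one above easier to scan, the frame lines can be pulled out and the C/C++ runtime frames separated from the memcached and magma ones. The following is a minimal sketch, not part of the test framework: the helper names `parse_gdb_frames` and `is_library_frame` are hypothetical, and the regular expression only handles the `#N  0xADDR in symbol` form that appears in this report.

```python
import re

# Matches frame lines of the form "#8  0x000000000086278d in Symbol(...) ()".
# Lines such as "No symbol table info available." or local-variable dumps
# do not match and are skipped.
FRAME_RE = re.compile(r"^\s*#(\d+)\s+(0x[0-9a-fA-F]+) in (.*)$")


def parse_gdb_frames(text):
    """Return a sorted list of (frame_index, address, symbol_text).

    gdb prints frame #0 twice when run with --batch and "bt full";
    only the first occurrence of each index is kept.
    """
    frames = {}
    for line in text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            index = int(match.group(1))
            address = int(match.group(2), 16)
            frames.setdefault(index, (address, match.group(3).strip()))
    return [(i, addr, sym) for i, (addr, sym) in sorted(frames.items())]


def is_library_frame(symbol_text):
    """Heuristic (an assumption): frames resolved from /lib64 or from
    libstdc++ sources belong to the runtime, not to memcached/magma."""
    return " from /lib64/" in symbol_text or "libstdc++" in symbol_text


if __name__ == "__main__":
    sample = """\
 #0  0x00007f49b8a50337 in raise () from /lib64/libc.so.6
 #0  0x00007f49b8a50337 in raise () from /lib64/libc.so.6
 No symbol table info available.
 #1  0x00007f49b8a51a28 in abort () from /lib64/libc.so.6
 #3  0x0000000000ab75fb in backtrace_terminate_handler() ()
 #8  0x000000000086278d in MagmaKVStore::compactionCallBack(MagmaKVStore::MagmaCompactionCB&, magma::Slice const&, magma::Slice const&, magma::Slice const&) ()
"""
    for index, address, symbol in parse_gdb_frames(sample):
        if not is_library_frame(symbol):
            print(f"#{index} {address:#x} {symbol}")
```

      Filtering this way leaves the frames that run from `magma::TaskWorker::loop` through `MagmaKVStore::compactionCallBack` to the throwing lambda created in `EPBucket::makeCompactionContext`, which is the path relevant to this crash.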
      

      172.23.106.236 --> babysitter.log

      [ns_server:info,2021-09-13T07:00:29.265-07:00,babysitter_of_ns_1@cb.local:<0.129.0>:ns_port_server:log:221]memcached<0.129.0>: 2021-09-13T07:00:29.064060-07:00 CRITICAL *** Fatal error encountered during exception handling ***
      memcached<0.129.0>: 2021-09-13T07:00:29.073005-07:00 CRITICAL Caught unhandled std::exception-derived exception. what(): KVStore::CompactionContext::maybeUpdatePurgeSeqno(): Unable to get vbucket ptr for vb:763
      [ns_server:info,2021-09-13T07:00:30.714-07:00,babysitter_of_ns_1@cb.local:<0.129.0>:ns_port_server:log:221]memcached<0.129.0>: CRITICAL Breakpad caught a crash (Couchbase version 7.1.0-1250). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/d02b0af2-5cee-478c-e6400fbf-9789de4d.dmp before terminating.
      memcached<0.11355.12>: 2021-09-13T07:00:32.570253-07:00 CRITICAL Detected previous crash
      memcached<0.11355.12>: 2021-09-13T07:00:32.570314-07:00 CRITICAL Breakpad caught a crash (Couchbase version 7.1.0-1250). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/d02b0af2-5cee-478c-e6400fbf-9789de4d.dmp before terminating.
      memcached<0.11355.12>: 2021-09-13T07:00:32.570324-07:00 CRITICAL Stack backtrace of crashed thread:
      memcached<0.11355.12>: 2021-09-13T07:00:32.570326-07:00 CRITICAL    #0  /opt/couchbase/bin/memcached() [0x400000+0x6a6fd8]
      memcached<0.11355.12>: 2021-09-13T07:00:32.570327-07:00 CRITICAL    #1  /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ea) [0x400000+0x6ff52a]
      memcached<0.11355.12>: 2021-09-13T07:00:32.570329-07:00 CRITICAL    #2  /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0xb8) [0x400000+0x6ff868]
      memcached<0.11355.12>: 2021-09-13T07:00:32.570330-07:00 CRITICAL    #3  /lib64/libpthread.so.0() [0x7f49bb1e8000+0xf5f0]
      memcached<0.11355.12>: 2021-09-13T07:00:32.570332-07:00 CRITICAL    #4  /lib64/libc.so.6(gsignal+0x37) [0x7f49b8a1a000+0x36337]
      memcached<0.11355.12>: 2021-09-13T07:00:32.570334-07:00 CRITICAL    #5  /lib64/libc.so.6(abort+0x148) [0x7f49b8a1a000+0x37a28]
      memcached<0.11355.12>: 2021-09-13T07:00:32.570335-07:00 CRITICAL    #6  /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f49b9302000+0x9963c]
      memcached<0.11355.12>: 2021-09-13T07:00:32.570336-07:00 CRITICAL    #7  /opt/couchbase/bin/memcached() [0x400000+0x6b75fb]
      memcached<0.11355.12>: 2021-09-13T07:00:32.570338-07:00 CRITICAL    #8  /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f49b9302000+0xa48f6]
      memcached<0.11355.12>: 2021-09-13T07:00:32.570339-07:00 CRITICAL    #9  /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f49b9302000+0xa4961]
      memcached<0.11355.12>: 2021-09-13T07:00:32.570340-07:00 CRITICAL    #10 /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f49b9302000+0xa4bf4]
      memcached<0.11355.12>: 2021-09-13T07:00:32.570361-07:00 CRITICAL    #11 /opt/couchbase/bin/memcached() [0x400000+0xc4a2d]
      memcached<0.11355.12>: 2021-09-13T07:00:32.570363-07:00 CRITICAL    #12 /opt/couchbase/bin/memcached() [0x400000+0x46278d]
      memcached<0.11355.12>: 2021-09-13T07:00:32.570364-07:00 CRITICAL    #13 /opt/couchbase/bin/memcached() [0x400000+0x5138cd]
      memcached<0.11355.12>: 2021-09-13T07:00:32.570389-07:00 CRITICAL    #14 /opt/couchbase/bin/memcached() [0x400000+0x5bee05]
      memcached<0.11355.12>: 2021-09-13T07:00:32.570391-07:00 CRITICAL    #15 /opt/couchbase/bin/memcached() [0x400000+0x5962d2]
      memcached<0.11355.12>: 2021-09-13T07:00:32.570392-07:00 CRITICAL    #16 /opt/couchbase/bin/memcached() [0x400000+0x56c523]
      memcached<0.11355.12>: 2021-09-13T07:00:32.570396-07:00 CRITICAL    #17 /opt/couchbase/bin/memcached() [0x400000+0x56ef7a]
      memcached<0.11355.12>: 2021-09-13T07:00:32.570398-07:00 CRITICAL    #18 /opt/couchbase/bin/memcached() [0x400000+0x56f20c]
      memcached<0.11355.12>: 2021-09-13T07:00:32.570399-07:00 CRITICAL    #19 /opt/couchbase/bin/memcached() [0x400000+0x589138]
      memcached<0.11355.12>: 2021-09-13T07:00:32.570400-07:00 CRITICAL    #20 /opt/couchbase/bin/memcached() [0x400000+0x5580ed]
      memcached<0.11355.12>: 2021-09-13T07:00:32.570439-07:00 CRITICAL    #21 /opt/couchbase/bin/memcached() [0x400000+0x5598e1]
      memcached<0.11355.12>: 2021-09-13T07:00:32.570441-07:00 CRITICAL    #22 /opt/couchbase/bin/memcached() [0x400000+0x714c09]
      memcached<0.11355.12>: 2021-09-13T07:00:32.570443-07:00 CRITICAL    #23 /lib64/libpthread.so.0() [0x7f49bb1e8000+0x7e65]
      memcached<0.11355.12>: 2021-09-13T07:00:32.570446-07:00 CRITICAL    #24 /lib64/libc.so.6(clone+0x6d) [0x7f49b8a1a000+0xfe88d]
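
      The Breakpad frames in the babysitter log are printed as `[base+offset]`, while gdb prints absolute addresses. Adding the two values lets the log be lined up with the gdb stack trace: for example, Breakpad frame #7 (`0x400000+0x6b75fb`) is `0xab75fb`, the address of `backtrace_terminate_handler()` in gdb frame #3, and Breakpad frame #11 (`0x400000+0xc4a2d`) is `0x4c4a2d`, the throwing lambda in gdb frame #7. The snippet below is a small sketch of that arithmetic; `resolve_breakpad_frames` is a hypothetical helper name, and the pattern assumes the exact `#N module(symbol) [base+offset]` layout seen in this log.

```python
import re

# "#11 /opt/couchbase/bin/memcached() [0x400000+0xc4a2d]"
# The text inside the parentheses (a mangled symbol plus offset) may be empty.
BREAKPAD_RE = re.compile(
    r"#(\d+)\s+(\S+?)\((.*?)\)\s*\[(0x[0-9a-fA-F]+)\+(0x[0-9a-fA-F]+)\]"
)


def resolve_breakpad_frames(log_text):
    """Return (frame_index, module_path, absolute_address) for each frame.

    absolute_address = module load base + offset, which is the value gdb
    shows for the same frame when it opens the matching core file.
    """
    resolved = []
    for match in BREAKPAD_RE.finditer(log_text):
        base = int(match.group(4), 16)
        offset = int(match.group(5), 16)
        resolved.append((int(match.group(1)), match.group(2), base + offset))
    return resolved


if __name__ == "__main__":
    sample = (
        "CRITICAL    #4  /lib64/libc.so.6(gsignal+0x37) [0x7f49b8a1a000+0x36337]\n"
        "CRITICAL    #7  /opt/couchbase/bin/memcached() [0x400000+0x6b75fb]\n"
        "CRITICAL    #11 /opt/couchbase/bin/memcached() [0x400000+0xc4a2d]\n"
    )
    for index, module, address in resolve_breakpad_frames(sample):
        print(f"#{index} {module} {address:#x}")
```

      The Breakpad trace has four more frames than the gdb one because it also records the signal handler and dump-writing frames (#0 to #3) that run after `abort()` raises the signal.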
      

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/magma_temp_job1.ini bucket_storage=magma,bucket_eviction_policy=fullEviction,rerun=False -t aGoodDoctor.Hospital.Murphy.test_rebalance,nodes_init=4,graceful=True,skip_cleanup=True,num_items=2500000,num_buckets=1,bucket_names=GleamBook,doc_size=1024,bucket_type=membase,eviction_policy=fullEviction,iterations=2,batch_size=1000,sdk_timeout=60,log_level=debug,infra_log_level=debug,rerun=False,skip_cleanup=True,key_size=18,randomize_doc_size=False,randomize_value=True,assert_crashes_on_load=True,num_collections=50,maxttl=10,num_indexes=50,pc=25,index_nodes=2,cbas_nodes=0,fts_nodes=0,ops_rate=80000,ramQuota=17000,doc_ops=create:update:delete:read,rebl_ops_rate=10000,key_type=RandomKey'
      

            People

              ritesh.agarwal Ritesh Agarwal