  1. Couchbase Server
  2. MB-48450

[Investigate] Investigate why KVEngine crashes


Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • 7.1.0
    • couchbase-bucket
    • Build: 7.1.0-1277
    • 1

    Description

      Description:

      Core dumps were found at the end of two failed tests. Investigate why KV Engine crashed. The logs for both tests have been collected here:

      1. Investigate MB-48452
      2. Investigate MB-48453

      Commentary:

      This may be a duplicate of MB-48384.

      Steps to reproduce:

      The two tests in which KV Engine crashes each run on a cluster of 3 KV-only nodes with the magma storage backend.

      Judging by the test name and parameters, the tests load data into collections during a hard failover of one node followed by recovery (recovery_type=full in the first run, recovery_type=delta in the second).

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/testexec.38389.ini GROUP=P0_failover_and_recovery_dgm,rerun=False,get-cbcollect-info=True,infra_log_level=critical,log_level=error,bucket_storage=magma,enable_dp=True,upgrade_version=7.1.0-1277 -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_hard_failover_recovery,nodes_init=3,nodes_failover=1,recovery_type=full,bucket_spec=dgm.buckets_for_rebalance_tests,data_load_stage=during,dgm=40,skip_validations=False,GROUP=P0_failover_and_recovery_dgm'
      

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/testexec.38389.ini GROUP=P0_failover_and_recovery_dgm,rerun=False,get-cbcollect-info=True,infra_log_level=critical,log_level=error,bucket_storage=magma,enable_dp=True,upgrade_version=7.1.0-1277 -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_hard_failover_recovery,nodes_init=3,nodes_failover=1,recovery_type=delta,bucket_spec=dgm.buckets_for_rebalance_tests,data_load_stage=during,dgm=40,skip_validations=False,GROUP=P0_failover_and_recovery_dgm'
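
The two commands above are long enough that the difference between them is easy to miss. The following is a minimal sketch, not part of testrunner, that splits the `-t` test specification into a test name and its comma-separated `key=value` parameters and reports which parameters differ. The helper names `parse_test_spec` and `diff_params` are hypothetical; the specification strings are copied from the commands above.

```python
# Sketch only: helper names are hypothetical, not testrunner APIs.
SPEC = (
    "bucket_collections.collections_rebalance.CollectionsRebalance."
    "test_data_load_collections_with_hard_failover_recovery,"
    "nodes_init=3,nodes_failover=1,recovery_type={recovery},"
    "bucket_spec=dgm.buckets_for_rebalance_tests,data_load_stage=during,"
    "dgm=40,skip_validations=False,GROUP=P0_failover_and_recovery_dgm"
)


def parse_test_spec(spec):
    """Split 'test.name,k1=v1,k2=v2' into (name, {k1: v1, k2: v2})."""
    name, *pairs = spec.split(",")
    params = dict(pair.split("=", 1) for pair in pairs)
    return name, params


def diff_params(a, b):
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


name_a, params_a = parse_test_spec(SPEC.format(recovery="full"))
name_b, params_b = parse_test_spec(SPEC.format(recovery="delta"))
diff = diff_params(params_a, params_b)
print(diff)
```

Under these assumptions the only differing parameter is `recovery_type`, so both runs exercise the same test with the same cluster shape and data load.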
      

      What's the problem?

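      In both cases the logs show that KV Engine crashes and produces a minidump, apparently in the middle of compaction: both backtraces reach std::terminate via __cxa_rethrow in EPBucket::compactionCompletionCallback, called from MagmaKVStore::compactDBInternal.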

      Appendix:

      Here are two extracts from test.log containing the analysis of the 2 minidumps:

      test.log

      running: //opt/couchbase/bin/minidump-2-core /opt/couchbase/var/lib/couchbase/crash/f344d4be-47fa-4521-4c835ca8-226ee566.dmp > /opt/couchbase/var/lib/couchbase/crash/f344d4be-47fa-4521-4c835ca8-226ee566.core
      running: gdb --batch /opt/couchbase/bin/memcached -c /opt/couchbase/var/lib/couchbase/crash/f344d4be-47fa-4521-4c835ca8-226ee566.core -ex "bt full" -ex quit
      172.23.123.119: Stack Trace of first crash - f344d4be-47fa-4521-4c835ca8-226ee566.dmp
      Core was generated by `/opt/couchbase/bin/memcached -C /opt/couchbase/var/lib/couchbase/config/memcach'.
       #0  0x00007f7d3c685337 in raise () from /lib64/libc.so.6
       #0  0x00007f7d3c685337 in raise () from /lib64/libc.so.6
       No symbol table info available.
       #1  0x00007f7d3c686a28 in abort () from /lib64/libc.so.6
       No symbol table info available.
       #2  0x00007f7d3cfd063c in __gnu_cxx::__verbose_terminate_handler () at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/vterminate.cc:95
               terminating = false
               t = <optimized out>
       #3  0x0000000000a99a3b in backtrace_terminate_handler() ()
       No symbol table info available.
       #4  0x00007f7d3cfdb8f6 in __cxxabiv1::__terminate(void (*)()) () at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/eh_terminate.cc:48
       No locals.
       #5  0x00007f7d3cfdb961 in std::terminate () at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/eh_terminate.cc:58
       No locals.
       #6  0x00007f7d3cfdbc46 in __cxxabiv1::__cxa_rethrow () at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/eh_throw.cc:133
               globals = <optimized out>
               header = <optimized out>
       #7  0x00000000004c3efe in EPBucket::compactionCompletionCallback(CompactionContext&) [clone .cold] ()
       No symbol table info available.
       #8  0x00000000008524f2 in MagmaKVStore::compactDBInternal(std::unique_lock<std::mutex>&, std::shared_ptr<CompactionContext>) ()
       No symbol table info available.
       #9  0x0000000000852f76 in MagmaKVStore::compactDB(std::unique_lock<std::mutex>&, std::shared_ptr<CompactionContext>) ()
       No symbol table info available.
       #10 0x00000000007eb3ba in EPBucket::compactInternal(LockedVBucketPtr&, CompactionConfig&) ()
       No symbol table info available.
       #11 0x00000000007ec981 in EPBucket::doCompact(Vbid, CompactionConfig&, std::vector<CookieIface const*, std::allocator<CookieIface const*> >&) ()
       No symbol table info available.
       #12 0x0000000000706666 in CompactTask::run() ()
       No symbol table info available.
       #13 0x0000000000a0ae52 in GlobalTask::execute() ()
       No symbol table info available.
       #14 0x0000000000a07f75 in FollyExecutorPool::TaskProxy::scheduleViaCPUPool()::{lambda()#2}::operator()() const ()
       No symbol table info available.
       #15 0x0000000000b59b30 in folly::ThreadPoolExecutor::runTask(std::shared_ptr<folly::ThreadPoolExecutor::Thread> const&, folly::ThreadPoolExecutor::Task&&) ()
       No symbol table info available.
       #16 0x0000000000b418ea in folly::CPUThreadPoolExecutor::threadRun(std::shared_ptr<folly::ThreadPoolExecutor::Thread>) ()
       No symbol table info available.
       #17 0x0000000000b5cae9 in void folly::detail::function::FunctionTraits<void ()>::callBig<std::_Bind<void (folly::ThreadPoolExecutor::*(folly::ThreadPoolExecutor*, std::shared_ptr<folly::ThreadPoolExecutor::Thread>))(std::shared_ptr<folly::ThreadPoolExecutor::Thread>)> >(folly::detail::function::Data&) ()
       No symbol table info available.
       #18 0x0000000000a07c04 in void folly::detail::function::FunctionTraits<void ()>::callBig<CBRegisteredThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}>(folly::detail::function::Data&) ()
       No symbol table info available.
       #19 0x00007f7d3d004d40 in execute_native_thread_routine () at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/src/c++11/thread.cc:80
       No locals.
       #20 0x00007f7d3ee24e65 in start_thread () from /lib64/libpthread.so.0
       No symbol table info available.
       #21 0x00007f7d3c74d88d in clone () from /lib64/libc.so.6
       No symbol table info available.
       
      ##############################
      running: gdb -p `(pidof memcached)` -ex "thread apply all bt" -ex detach -ex quit
      [Thread debugging using libthread_db enabled]
       Using host libthread_db library "/lib64/libthread_db.so.1".
      ...
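
The extract above shows the two commands the test harness runs per minidump: `minidump-2-core` to convert the `.dmp` file to a core file, then `gdb --batch ... -ex "bt full"` against the `memcached` binary. As a rough sketch for repeating this by hand, the function below only builds the argument lists for a given dump path; it does not run anything. The function name `analysis_commands` is hypothetical, and the binary locations and flags are taken from the log lines above.

```python
from pathlib import PurePosixPath

# Install location as it appears in the log extract.
BIN = PurePosixPath("/opt/couchbase/bin")


def analysis_commands(dump_path):
    """Return (core_path, convert_argv, backtrace_argv) for one minidump.

    The conversion tool writes the core to stdout, so the caller is
    expected to redirect stdout of convert_argv into core_path.
    """
    dump = PurePosixPath(dump_path)
    core = dump.with_suffix(".core")  # same file name, .dmp -> .core
    convert_argv = [str(BIN / "minidump-2-core"), str(dump)]
    backtrace_argv = [
        "gdb", "--batch", str(BIN / "memcached"),
        "-c", str(core),
        "-ex", "bt full", "-ex", "quit",
    ]
    return str(core), convert_argv, backtrace_argv


core, convert_argv, backtrace_argv = analysis_commands(
    "/opt/couchbase/var/lib/couchbase/crash/"
    "f344d4be-47fa-4521-4c835ca8-226ee566.dmp"
)
```

To actually execute these, one option is `subprocess.run(convert_argv, stdout=open(core, "wb"), check=True)` followed by `subprocess.run(backtrace_argv, check=True)` on the node that holds the dump.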
      

      test.log

      172.23.121.222: 1 core dump seen
      running: //opt/couchbase/bin/minidump-2-core /opt/couchbase/var/lib/couchbase/crash/53b2e9b9-7a98-45a5-ad5431aa-ef85228a.dmp > /opt/couchbase/var/lib/couchbase/crash/53b2e9b9-7a98-45a5-ad5431aa-ef85228a.core
      running: gdb --batch /opt/couchbase/bin/memcached -c /opt/couchbase/var/lib/couchbase/crash/53b2e9b9-7a98-45a5-ad5431aa-ef85228a.core -ex "bt full" -ex quit
      172.23.121.222: Stack Trace of first crash - 53b2e9b9-7a98-45a5-ad5431aa-ef85228a.dmp
      Core was generated by `/opt/couchbase/bin/memcached -C /opt/couchbase/var/lib/couchbase/config/memcach'.
       #0  0x00007fa8df4f9337 in raise () from /lib64/libc.so.6
       #0  0x00007fa8df4f9337 in raise () from /lib64/libc.so.6
       No symbol table info available.
       #1  0x00007fa8df4faa28 in abort () from /lib64/libc.so.6
       No symbol table info available.
       #2  0x00007fa8dfe4463c in __gnu_cxx::__verbose_terminate_handler () at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/vterminate.cc:95
               terminating = false
               t = <optimized out>
       #3  0x0000000000a99a3b in backtrace_terminate_handler() ()
       No symbol table info available.
       #4  0x00007fa8dfe4f8f6 in __cxxabiv1::__terminate(void (*)()) () at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/eh_terminate.cc:48
       No locals.
       #5  0x00007fa8dfe4f961 in std::terminate () at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/eh_terminate.cc:58
       No locals.
       #6  0x00007fa8dfe4fc46 in __cxxabiv1::__cxa_rethrow () at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/eh_throw.cc:133
               globals = <optimized out>
               header = <optimized out>
       #7  0x00000000004c3efe in EPBucket::compactionCompletionCallback(CompactionContext&) [clone .cold] ()
       No symbol table info available.
       #8  0x00000000008524f2 in MagmaKVStore::compactDBInternal(std::unique_lock<std::mutex>&, std::shared_ptr<CompactionContext>) ()
       No symbol table info available.
       #9  0x0000000000852f76 in MagmaKVStore::compactDB(std::unique_lock<std::mutex>&, std::shared_ptr<CompactionContext>) ()
       No symbol table info available.
       #10 0x00000000007eb3ba in EPBucket::compactInternal(LockedVBucketPtr&, CompactionConfig&) ()
       No symbol table info available.
       #11 0x00000000007ec981 in EPBucket::doCompact(Vbid, CompactionConfig&, std::vector<CookieIface const*, std::allocator<CookieIface const*> >&) ()
       No symbol table info available.
       #12 0x0000000000706666 in CompactTask::run() ()
       No symbol table info available.
       #13 0x0000000000a0ae52 in GlobalTask::execute() ()
       No symbol table info available.
       #14 0x0000000000a07f75 in FollyExecutorPool::TaskProxy::scheduleViaCPUPool()::{lambda()#2}::operator()() const ()
       No symbol table info available.
       #15 0x0000000000b59b30 in folly::ThreadPoolExecutor::runTask(std::shared_ptr<folly::ThreadPoolExecutor::Thread> const&, folly::ThreadPoolExecutor::Task&&) ()
       No symbol table info available.
       #16 0x0000000000b418ea in folly::CPUThreadPoolExecutor::threadRun(std::shared_ptr<folly::ThreadPoolExecutor::Thread>) ()
       No symbol table info available.
       #17 0x0000000000b5cae9 in void folly::detail::function::FunctionTraits<void ()>::callBig<std::_Bind<void (folly::ThreadPoolExecutor::*(folly::ThreadPoolExecutor*, std::shared_ptr<folly::ThreadPoolExecutor::Thread>))(std::shared_ptr<folly::ThreadPoolExecutor::Thread>)> >(folly::detail::function::Data&) ()
       No symbol table info available.
       #18 0x0000000000a07c04 in void folly::detail::function::FunctionTraits<void ()>::callBig<CBRegisteredThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}>(folly::detail::function::Data&) ()
       No symbol table info available.
       #19 0x00007fa8dfe78d40 in execute_native_thread_routine () at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/src/c++11/thread.cc:80
       No locals.
       #20 0x00007fa8e1c98e65 in start_thread () from /lib64/libpthread.so.0
       No symbol table info available.
       #21 0x00007fa8df5c188d in clone () from /lib64/libc.so.6
       No symbol table info available.
       
      ##############################
      running: gdb -p `(pidof memcached)` -ex "thread apply all bt" -ex detach -ex quit
      [Thread debugging using libthread_db enabled]
       Using host libthread_db library "/lib64/libthread_db.so.1".
      ...
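
The two backtraces above appear to differ only in the load addresses of the shared libraries. A small sketch for checking that mechanically: strip the address from each `#N 0x... in ...` frame line and compare what is left. The function names `frames` and `same_call_chain` are hypothetical, and the sample text is a subset of the frames quoted above, not the full traces.

```python
import re

# Matches gdb frame lines such as "#7  0x00000000004c3efe in Foo::bar() ()".
FRAME_RE = re.compile(r"^\s*#(\d+)\s+0x[0-9a-fA-F]+ in (.*)$")


def frames(bt_text):
    """Return {frame_number: symbol text} with the address removed.

    gdb prints frame #0 twice in these logs; keying by frame number
    collapses the duplicate.
    """
    out = {}
    for line in bt_text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            out[int(m.group(1))] = m.group(2).strip()
    return out


def same_call_chain(bt_a, bt_b):
    """True if both backtraces have the same frames once addresses are ignored."""
    return frames(bt_a) == frames(bt_b)


BT_119 = """
 #0  0x00007f7d3c685337 in raise () from /lib64/libc.so.6
 #0  0x00007f7d3c685337 in raise () from /lib64/libc.so.6
 No symbol table info available.
 #6  0x00007f7d3cfdbc46 in __cxxabiv1::__cxa_rethrow () at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/eh_throw.cc:133
 #7  0x00000000004c3efe in EPBucket::compactionCompletionCallback(CompactionContext&) [clone .cold] ()
 #8  0x00000000008524f2 in MagmaKVStore::compactDBInternal(std::unique_lock<std::mutex>&, std::shared_ptr<CompactionContext>) ()
"""

BT_222 = """
 #0  0x00007fa8df4f9337 in raise () from /lib64/libc.so.6
 #0  0x00007fa8df4f9337 in raise () from /lib64/libc.so.6
 No symbol table info available.
 #6  0x00007fa8dfe4fc46 in __cxxabiv1::__cxa_rethrow () at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/eh_throw.cc:133
 #7  0x00000000004c3efe in EPBucket::compactionCompletionCallback(CompactionContext&) [clone .cold] ()
 #8  0x00000000008524f2 in MagmaKVStore::compactDBInternal(std::unique_lock<std::mutex>&, std::shared_ptr<CompactionContext>) ()
"""

identical = same_call_chain(BT_119, BT_222)
```

The same comparison could be run against the backtrace in MB-48384 to support or rule out the suspected duplicate.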
      

            People

              asad.zaidi Asad Zaidi (Inactive)