  1. Couchbase Server
  2. MB-50519

[Magma] - Caught unhandled std::exception-derived exception. what(): ThrowExceptionUnderflowPolicy current:0 arg:-143

Details

    • Untriaged
    • Centos 64-bit
    • 1
    • Yes
    • KV 2022-Jan

    Description

      Script to Repro

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/win10-bucket-ops.ini rerun=False,get-cbcollect-info=True,GROUP=hard_failover_and_delta_recovery_P0_set0,nodes_init=4,doc_size=250,bucket_spec=magma_dgm.10_percent_dgm.5_node_1_replica_magma_512_single_bucket,nodes_failover=1 -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_hard_failover_recovery,nodes_init=5,nodes_failover=2,recovery_type=delta,bucket_spec=magma_dgm.5_percent_dgm.5_node_2_replica_magma_512,doc_size=512,randomize_value=True,data_load_stage=during,skip_validations=False,GROUP=hard_failover_and_delta_recovery_P0_set0'
      

      Steps to Repro
      1. Create a 4 node cluster

      2022-01-20 08:09:00,947 | test  | INFO    | MainThread | [table_view:display:72] Cluster statistics
      +----------------+----------+-----------------+-----------+----------+-----------------------+-------------------+-----------------------+
      | Node           | Services | CPU_utilization | Mem_total | Mem_free | Swap_mem_used         | Active / Replica  | Version               |
      +----------------+----------+-----------------+-----------+----------+-----------------------+-------------------+-----------------------+
      | 172.23.104.186 | kv       | 1.35406218656   | 5.66 GiB  | 4.80 GiB | 486.05 MiB / 3.50 GiB | 0 / 0             | 7.1.0-2111-enterprise |
      | 172.23.120.201 | kv       | 2.08437970869   | 3.67 GiB  | 3.02 GiB | 2.50 MiB / 3.50 GiB   | 0 / 0             | 7.1.0-2111-enterprise |
      | 172.23.120.206 | kv       | 1.63193572684   | 3.67 GiB  | 2.99 GiB | 0.0 Byte / 0.0 Byte   | 0 / 0             | 7.1.0-2111-enterprise |
      | 172.23.121.10  | kv       | 2.79667422525   | 3.91 GiB  | 3.26 GiB | 47.63 MiB / 3.50 GiB  | 0 / 0             | 7.1.0-2111-enterprise |
      +----------------+----------+-----------------+-----------+----------+-----------------------+-------------------+-----------------------+
      

      2. Create bucket/scope/collections/data

      2022-01-20 08:50:16,970 | test  | INFO    | MainThread | [table_view:display:72] Bucket statistics
      +---------+-----------+-----------------+----------+------------+-----+----------+-----------+----------+-----------+---------------+
      | Bucket  | Type      | Storage Backend | Replicas | Durability | TTL | Items    | RAM Quota | RAM Used | Disk Used | ARR           |
      +---------+-----------+-----------------+----------+------------+-----+----------+-----------+----------+-----------+---------------+
      | default | couchbase | magma           | 1        | none       | 0   | 24500000 | 2.00 GiB  | 1.34 GiB | 8.97 GiB  | 16.6066612245 |
      +---------+-----------+-----------------+----------+------------+-----+----------+-----------+----------+-----------+---------------+
      

      3. Hard failover node 172.23.104.186.

      2022-01-20 08:50:21,869 | test  | INFO    | MainThread | [collections_rebalance:rebalance_operation:724] failing over nodes [ip:172.23.104.186 port:8091 ssh_username:root]
      2022-01-20 08:50:26,697 | test  | INFO    | pool-3-thread-5 | [rest_client:monitorRebalance:1599] Rebalance done. Taken 3.02500009537 seconds to complete
      

      4. Do a delta recovery and start rebalance

      2022-01-20 13:50:17,312 | test  | WARNING | MainThread | [rest_client:get_nodes:1880] 172.23.104.186 - Node not part of cluster inactiveFailed
      2022-01-20 13:52:21,688 | test  | INFO    | pool-3-thread-27 | [table_view:display:72] Rebalance Overview
      +----------------+----------+-----------------------+---------------+--------------+
      | Nodes          | Services | Version               | CPU           | Status       |
      +----------------+----------+-----------------------+---------------+--------------+
      | 172.23.104.186 | kv       | 7.1.0-2111-enterprise | 1.10414052698 | Cluster node |
      | 172.23.120.201 | kv       | 7.1.0-2111-enterprise | 4.24729831616 | Cluster node |
      | 172.23.120.206 | kv       | 7.1.0-2111-enterprise | 24.1222530942 | Cluster node |
      | 172.23.121.10  | kv       | 7.1.0-2111-enterprise | 8.74303091738 | Cluster node |
      +----------------+----------+-----------------------+---------------+--------------+
      

      5. The rebalance hangs for a long time (almost 35 minutes).
      6. We check for minidumps and run cbcollect. No minidumps at this point.
      7. We stop the rebalance.
      8. Delete buckets.
      9. At this point I manually ran cbcollect.

      In case you are wondering why we ran cbcollect after deleting the bucket: we expect any minidumps to be present when the test ends at step 6, and we did not find any. We hit this issue later, during cleanup, which is when the minidump was generated. So collecting logs after bucket deletion is not the norm.

      grep CRITICAL on 172.23.104.186

      memcached.log.000000.txt:2022-01-20T14:38:18.396083-08:00 CRITICAL *** Fatal error encountered during exception handling ***
      memcached.log.000000.txt:2022-01-20T14:38:18.396173-08:00 CRITICAL Caught unhandled std::exception-derived exception. what(): ThrowExceptionUnderflowPolicy current:0 arg:-143
      memcached.log.000000.txt:2022-01-20T14:38:18.396190-08:00 CRITICAL Exception thrown from:
      memcached.log.000000.txt:2022-01-20T14:38:18.396285-08:00 CRITICAL     #0  /opt/couchbase/bin/memcached() [0x400000+0x1399c0]
      memcached.log.000000.txt:2022-01-20T14:38:18.396323-08:00 CRITICAL     #1  /opt/couchbase/bin/memcached() [0x400000+0x432c08]
      memcached.log.000000.txt:2022-01-20T14:38:18.396355-08:00 CRITICAL     #2  /opt/couchbase/bin/memcached() [0x400000+0x43308f]
      memcached.log.000000.txt:2022-01-20T14:38:18.396386-08:00 CRITICAL     #3  /opt/couchbase/bin/memcached() [0x400000+0x3442f1]
      memcached.log.000000.txt:2022-01-20T14:38:18.396414-08:00 CRITICAL     #4  /opt/couchbase/bin/memcached() [0x400000+0x34e251]
      memcached.log.000000.txt:2022-01-20T14:38:18.396462-08:00 CRITICAL     #5  /opt/couchbase/bin/memcached() [0x400000+0x6aa622]
      memcached.log.000000.txt:2022-01-20T14:38:18.396514-08:00 CRITICAL     #6  /opt/couchbase/bin/memcached() [0x400000+0x6a7725]
      memcached.log.000000.txt:2022-01-20T14:38:18.396561-08:00 CRITICAL     #7  /opt/couchbase/bin/memcached() [0x400000+0x7fbbd0]
      memcached.log.000000.txt:2022-01-20T14:38:18.396614-08:00 CRITICAL     #8  /opt/couchbase/bin/memcached() [0x400000+0x7e650a]
      memcached.log.000000.txt:2022-01-20T14:38:18.396663-08:00 CRITICAL     #9  /opt/couchbase/bin/memcached() [0x400000+0x7feb89]
      memcached.log.000000.txt:2022-01-20T14:38:18.396705-08:00 CRITICAL     #10 /opt/couchbase/bin/memcached() [0x400000+0x6a73b4]
      memcached.log.000000.txt:2022-01-20T14:38:18.396791-08:00 CRITICAL     #11 /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f4240692000+0xcdd40]
      memcached.log.000000.txt:2022-01-20T14:38:18.396813-08:00 CRITICAL     #12 /lib64/libpthread.so.0() [0x7f4242560000+0x7ea5]
      memcached.log.000000.txt:2022-01-20T14:38:18.396881-08:00 CRITICAL     #13 /lib64/libc.so.6(clone+0x6d) [0x7f423fdaa000+0xfe9fd]
      memcached.log.000000.txt:2022-01-20T14:38:18.397001-08:00 CRITICAL *** Fatal error encountered during exception handling ***
      memcached.log.000000.txt:2022-01-20T14:38:18.397139-08:00 CRITICAL *** Fatal error encountered during exception handling ***
      memcached.log.000000.txt:2022-01-20T14:38:18.830155-08:00 CRITICAL Detected previous crash
      memcached.log.000000.txt:2022-01-20T14:38:18.830357-08:00 CRITICAL Breakpad caught a crash (Couchbase version 7.1.0-2111). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/56e2ddf9-7596-440e-7ec3b3a2-89b60b80.dmp before terminating.
      memcached.log.000000.txt:2022-01-20T14:38:18.830380-08:00 CRITICAL Stack backtrace of crashed thread:
      memcached.log.000000.txt:2022-01-20T14:38:18.830383-08:00 CRITICAL    #0  /opt/couchbase/bin/memcached() [0x400000+0x72cfd8]
      memcached.log.000000.txt:2022-01-20T14:38:18.830384-08:00 CRITICAL    #1  /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ea) [0x400000+0x77e7aa]
      memcached.log.000000.txt:2022-01-20T14:38:18.830386-08:00 CRITICAL    #2  /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0xb8) [0x400000+0x77eae8]
      memcached.log.000000.txt:2022-01-20T14:38:18.830388-08:00 CRITICAL    #3  /lib64/libpthread.so.0() [0x7f4242560000+0xf630]
      memcached.log.000000.txt:2022-01-20T14:38:18.830390-08:00 CRITICAL    #4  /lib64/libc.so.6(gsignal+0x37) [0x7f423fdaa000+0x363d7]
      memcached.log.000000.txt:2022-01-20T14:38:18.830392-08:00 CRITICAL    #5  /lib64/libc.so.6(abort+0x148) [0x7f423fdaa000+0x37ac8]
      memcached.log.000000.txt:2022-01-20T14:38:18.830393-08:00 CRITICAL    #6  /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f4240692000+0x9963c]
      memcached.log.000000.txt:2022-01-20T14:38:18.830395-08:00 CRITICAL    #7  /opt/couchbase/bin/memcached() [0x400000+0x7374ab]
      memcached.log.000000.txt:2022-01-20T14:38:18.830396-08:00 CRITICAL    #8  /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f4240692000+0xa48f6]
      memcached.log.000000.txt:2022-01-20T14:38:18.830492-08:00 CRITICAL    #9  /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f4240692000+0xa4961]
      memcached.log.000000.txt:2022-01-20T14:38:18.830524-08:00 CRITICAL    #10 /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f4240692000+0xa4bf4]
      memcached.log.000000.txt:2022-01-20T14:38:18.830529-08:00 CRITICAL    #11 /opt/couchbase/bin/memcached() [0x400000+0x139ad4]
      memcached.log.000000.txt:2022-01-20T14:38:18.830544-08:00 CRITICAL    #12 /opt/couchbase/bin/memcached() [0x400000+0x432c08]
      memcached.log.000000.txt:2022-01-20T14:38:18.830605-08:00 CRITICAL    #13 /opt/couchbase/bin/memcached() [0x400000+0x43308f]
      memcached.log.000000.txt:2022-01-20T14:38:18.830607-08:00 CRITICAL    #14 /opt/couchbase/bin/memcached() [0x400000+0x3442f1]
      memcached.log.000000.txt:2022-01-20T14:38:18.830608-08:00 CRITICAL    #15 /opt/couchbase/bin/memcached() [0x400000+0x34e251]
      memcached.log.000000.txt:2022-01-20T14:38:18.830609-08:00 CRITICAL    #16 /opt/couchbase/bin/memcached() [0x400000+0x6aa622]
      memcached.log.000000.txt:2022-01-20T14:38:18.830610-08:00 CRITICAL    #17 /opt/couchbase/bin/memcached() [0x400000+0x6a7725]
      memcached.log.000000.txt:2022-01-20T14:38:18.830612-08:00 CRITICAL    #18 /opt/couchbase/bin/memcached() [0x400000+0x7fbbd0]
      memcached.log.000000.txt:2022-01-20T14:38:18.830613-08:00 CRITICAL    #19 /opt/couchbase/bin/memcached() [0x400000+0x7e650a]
      memcached.log.000000.txt:2022-01-20T14:38:18.830614-08:00 CRITICAL    #20 /opt/couchbase/bin/memcached() [0x400000+0x7feb89]
      memcached.log.000000.txt:2022-01-20T14:38:18.830616-08:00 CRITICAL    #21 /opt/couchbase/bin/memcached() [0x400000+0x6a73b4]
      memcached.log.000000.txt:2022-01-20T14:38:18.830617-08:00 CRITICAL    #22 /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f4240692000+0xcdd40]
      memcached.log.000000.txt:2022-01-20T14:38:18.830619-08:00 CRITICAL    #23 /lib64/libpthread.so.0() [0x7f4242560000+0x7ea5]
      memcached.log.000000.txt:2022-01-20T14:38:18.830621-08:00 CRITICAL    #24 /lib64/libc.so.6(clone+0x6d) [0x7f423fdaa000+0xfe9fd]
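Each frame in the log above is printed as a module base plus an offset, for example `[0x400000+0x139ad4]`. Adding the two gives the absolute address, which can be compared against the frame addresses in the gdb backtrace of the same dump. The following is a minimal sketch of that arithmetic; `LogFrame`, `absoluteAddress` and `expectSameAddress` are hypothetical names introduced here for illustration, and the constants are copied from the log and the gdb output in this report.

```cpp
#include <cassert>
#include <cstdint>

// One frame as printed in memcached.log: "[<module base>+<offset>]".
// Hypothetical helper type, not part of kv_engine.
struct LogFrame {
    std::uint64_t moduleBase;
    std::uint64_t offset;
};

// Absolute address of the frame: module base plus offset.
constexpr std::uint64_t absoluteAddress(LogFrame f) {
    return f.moduleBase + f.offset;
}

// Crashed-thread frames #11, #12 and #13 in the log correspond to
// gdb frames #7, #8 and #9 of the same dump.
static_assert(absoluteAddress(LogFrame{0x400000, 0x139ad4}) == 0x539ad4,
              "log frame #11 vs gdb frame #7");
static_assert(absoluteAddress(LogFrame{0x400000, 0x432c08}) == 0x832c08,
              "log frame #12 vs gdb frame #8");
static_assert(absoluteAddress(LogFrame{0x400000, 0x43308f}) == 0x83308f,
              "log frame #13 vs gdb frame #9");

// The same holds for shared libraries: libc frame #4 (gsignal) in the log
// corresponds to gdb frame #0.
static_assert(absoluteAddress(LogFrame{0x7f423fdaa000, 0x363d7}) ==
                      0x7f423fde03d7,
              "log frame #4 vs gdb frame #0");

// Runtime form of the same comparison, for addresses parsed from a log.
inline void expectSameAddress(LogFrame f, std::uint64_t gdbAddress) {
    assert(absoluteAddress(f) == gdbAddress);
}
```

This only cross-checks addresses between the two outputs; resolving them to function names still needs the matching debug symbols, as in the gdb session.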
      

      bt of 56e2ddf9-7596-440e-7ec3b3a2-89b60b80.dmp on 172.23.104.186

      (gdb) bt
      #0  0x00007f423fde03d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
      #1  0x00007f423fde1ac8 in __GI_abort () at abort.c:90
      #2  0x00007f424072b63c in __gnu_cxx::__verbose_terminate_handler() [clone .cold] () from /opt/couchbase/bin/../lib/libstdc++.so.6
      #3  0x0000000000b374ab in backtrace_terminate_handler() () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/utilities/terminate_handler.cc:88
      #4  0x00007f42407368f6 in __cxxabiv1::__terminate(void (*)()) () from /opt/couchbase/bin/../lib/libstdc++.so.6
      #5  0x00007f4240736961 in std::terminate() () from /opt/couchbase/bin/../lib/libstdc++.so.6
      #6  0x00007f4240736bf4 in __cxa_throw () from /opt/couchbase/bin/../lib/libstdc++.so.6
      #7  0x0000000000539ad4 in cb::throwWithTrace<std::underflow_error> (exception=...) at /home/couchbase/jenkins/workspace/couchbase-server-unix/server_build/tlm/deps/boost.exploded/include/boost/exception/info.hpp:129
      #8  0x0000000000832c08 in cb::ThrowExceptionUnderflowPolicy<unsigned long>::underflow () at /opt/gcc-10.2.0/include/c++/10.2.0/bits/char_traits.h:322
      #9  0x000000000083308f in store (desired=18446744073709551473, this=0x7f41cca15960) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/atomic_base.h:116
      #10 operator= (val=18446744073709551473, this=0x7f41cca15960) at /home/couchbase/jenkins/workspace/couchbase-server-unix/platform/include/platform/non_negative_counter.h:360
      #11 setNumTotalItems (totalItems=18446744073709551473, this=0x7f41cca15500) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/ep_vb.cc:423
      #12 EPVBucket::setNumTotalItems(KVStoreIface&) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/ep_vb.cc:1179
      #13 0x00000000007442f1 in Warmup::estimateDatabaseItemCount (this=0x7f41cd853a00, shardId=<optimized out>) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/kvshard.h:72
      #14 0x000000000074e251 in WarmupEstimateDatabaseItemCount::run (this=0x7f41dbfe6190) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/warmup.cc:266
      #15 0x0000000000aaa622 in GlobalTask::execute() () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/globaltask.cc:68
      #16 0x0000000000aa7725 in FollyExecutorPool::TaskProxy::scheduleViaCPUPool()::{lambda()#2}::operator()() const (__closure=0x7f42329ed840) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/folly_executorpool.cc:189
      #17 0x0000000000bfbbd0 in operator() (this=0x7f42329ed840) at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/Function.h:416
      #18 folly::ThreadPoolExecutor::runTask(std::shared_ptr<folly::ThreadPoolExecutor::Thread> const&, folly::ThreadPoolExecutor::Task&&) (this=this@entry=0x7f423eafe400, thread=..., 
          task=task@entry=<unknown type in /usr/lib/debug/opt/couchbase/bin/memcached-7.1.0-2111.x86_64.debug, CU 0xa3160b8, DIE 0xa399fed>)
          at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/executors/ThreadPoolExecutor.cpp:97
      #19 0x0000000000be650a in folly::CPUThreadPoolExecutor::threadRun (this=0x7f423eafe400, thread=...)
          at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/executors/CPUThreadPoolExecutor.cpp:265
      #20 0x0000000000bfeb89 in __invoke_impl<void, void (folly::ThreadPoolExecutor::*&)(std::shared_ptr<folly::ThreadPoolExecutor::Thread>), folly::ThreadPoolExecutor*&, std::shared_ptr<folly::ThreadPoolExecutor::Thread>&> (__t=<optimized out>, 
          __f=<optimized out>) at /usr/local/include/c++/7.3.0/bits/invoke.h:73
      #21 __invoke<void (folly::ThreadPoolExecutor::*&)(std::shared_ptr<folly::ThreadPoolExecutor::Thread>), folly::ThreadPoolExecutor*&, std::shared_ptr<folly::ThreadPoolExecutor::Thread>&> (__fn=<optimized out>)
          at /usr/local/include/c++/7.3.0/bits/invoke.h:95
      #22 __call<void, 0, 1> (__args=<optimized out>, this=<optimized out>) at /usr/local/include/c++/7.3.0/functional:467
      #23 operator()<> (this=<optimized out>) at /usr/local/include/c++/7.3.0/functional:551
      #24 folly::detail::function::FunctionTraits<void ()>::callBig<std::_Bind<void (folly::ThreadPoolExecutor::*(folly::ThreadPoolExecutor*, std::shared_ptr<folly::ThreadPoolExecutor::Thread>))(std::shared_ptr<folly::ThreadPoolExecutor::Thread>)> >(folly::detail::function::Data&) (p=...) at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/Function.h:401
      #25 0x0000000000aa73b4 in operator() (this=0x7f423eabfac0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/folly_executorpool.cc:47
      #26 operator() (__closure=0x7f423eabfac0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/folly_executorpool.cc:47
      #27 folly::detail::function::FunctionTraits<void ()>::callBig<CBRegisteredThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}>(folly::detail::function::Data&) (p=...)
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/server_build/tlm/deps/folly.exploded/include/folly/Function.h:401
      #28 0x00007f424075fd40 in execute_native_thread_routine () from /opt/couchbase/bin/../lib/libstdc++.so.6
      #29 0x00007f4242567ea5 in start_thread (arg=0x7f42329ff700) at pthread_create.c:307
      #30 0x00007f423fea89fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      (gdb) 
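The `desired=18446744073709551473` value in frames #9 to #11 and the `arg:-143` in the what() text appear to be the same number: 18446744073709551473 is 2^64 - 143, which is -143 when the 64-bit pattern is read as signed. The sketch below illustrates that conversion and a counter that rejects such a value. `SketchCounter` and `tryStore` are hypothetical stand-ins written for this note, not the code in platform/include/platform/non_negative_counter.h, whose actual check may differ in detail.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Value gdb reports as `desired` / `totalItems` in frames #9 to #11.
constexpr std::uint64_t kDesired = 18446744073709551473ULL;

// Signed-to-unsigned conversion is modular, so -143 maps to 2^64 - 143.
static_assert(static_cast<std::uint64_t>(std::int64_t{-143}) == kDesired,
              "18446744073709551473 is -143 as an unsigned 64-bit value");

// Hypothetical stand-in for a non-negative counter with a throwing
// underflow policy. Illustration only.
class SketchCounter {
public:
    // Reject any value whose signed interpretation is negative.
    void store(std::uint64_t desired) {
        // Assumes two's complement, as on the x86-64 build in this report.
        const auto asSigned = static_cast<std::int64_t>(desired);
        if (asSigned < 0) {
            throw std::underflow_error(
                    "SketchUnderflow current:" + std::to_string(value) +
                    " arg:" + std::to_string(asSigned));
        }
        value = desired;
    }

    std::uint64_t load() const {
        return value;
    }

private:
    std::uint64_t value = 0;
};

// Returns the exception text if the store is rejected, else an empty string.
inline std::string tryStore(SketchCounter& counter, std::uint64_t desired) {
    try {
        counter.store(desired);
    } catch (const std::underflow_error& e) {
        return e.what();
    }
    return {};
}
```

In this sketch the exception is caught by `tryStore`. In the backtrace above nothing catches it, so `std::terminate` runs and the process aborts.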
      

      The cluster was also left in a bad state after this, with many nodes down.

      cbcollect_info attached. This issue was not seen on 7.1.0-2021.

      Attachments

        1. bt_full.txt
          13 kB
        2. info_threads.txt
          6 kB
        3. thread_apply_all_bt.txt
          133 kB

          People

            Balakumaran.Gopal Balakumaran Gopal