Couchbase Server / MB-39864

[Collections] - Underflow in vbucket_manifest


Details

    Description

      Script to Repro

      ./testrunner -i /tmp/win10-bucket-ops.ini sdk_client_pool=True,rerun=False,crash_warning=True,quota_percent=90,get-cbcollect-info=False -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_hard_failover_recovery,nodes_init=5,nodes_failover=2,recovery_type=delta,bucket_spec=multi_bucket.buckets_for_rebalance_tests_more_collections,data_load_spec=volume_test_load_with_CRUD_on_collections,data_load_stage=during,quota_percent=80,GROUP=failover_with_collection_crud
      

      Steps to Repro
      1) Create a 5-node cluster (4 nodes rebalanced in to 172.23.98.196)
      2020-06-10 02:32:24,578 | test | INFO | pool-14-thread-7 | [table_view:display:72] Rebalance Overview
      ------------------------------------
      Nodes            Services  Status
      ------------------------------------
      172.23.98.196    kv        Cluster node
      172.23.98.195    None      <--- IN ---
      172.23.121.10    None      <--- IN ---
      172.23.104.186   None      <--- IN ---
      172.23.120.201   None      <--- IN ---
      ------------------------------------

      2) Create buckets, scopes, collections and documents
      --------------------------------------------------------------------------------
      Bucket   Type       Replicas  TTL  Items   RAM Quota    RAM Used   Disk Used
      --------------------------------------------------------------------------------
      bucket1  membase    3         0    3000    1048576000   220907152  273142632
      bucket2  ephemeral  3         0    3000    1048576000   329786832  170
      default  membase    3         0    500000  10485760000  767172112  543442137
      --------------------------------------------------------------------------------

      3) Hard-fail over 2 nodes (172.23.104.186 and 172.23.120.201) and, while the failover is in progress, start a data load that drops and creates collections and scopes
      2020-06-10 02:48:04,164 | test | INFO | MainThread | [collections_rebalance:rebalance_operation:112] Starting rebalance operation of type : hard_failover_recovery
      2020-06-10 02:50:22,744 | test | INFO | MainThread | [collections_rebalance:wait_for_failover_or_assert:84] 1 nodes failed over as expected in 0.103999853134 seconds
      2020-06-10 02:52:36,690 | test | INFO | MainThread | [collections_rebalance:wait_for_failover_or_assert:84] 2 nodes failed over as expected in 0.102999925613 seconds

      4) After the failover completes, run the data load again and wait for it to complete.

      5) Perform delta recovery on the 2 failed-over nodes (172.23.104.186 and 172.23.120.201) and start a rebalance
      2020-06-10 02:55:21,378 | test | WARNING | MainThread | [rest_client:get_nodes:1671] 172.23.104.186 - Node not part of cluster inactiveFailed
      2020-06-10 02:55:21,378 | test | WARNING | MainThread | [rest_client:get_nodes:1671] 172.23.120.201 - Node not part of cluster inactiveFailed
      2020-06-10 02:55:31,907 | test | INFO | MainThread | [bucket_ready_functions:perform_tasks_from_spec:4433] Performing scope/collection specific operations
      2020-06-10 02:55:32,322 | test | INFO | pool-14-thread-5 | [table_view:display:72] Rebalance Overview
      ------------------------------------
      Nodes            Services  Status
      ------------------------------------
      172.23.98.196    kv        Cluster node
      172.23.98.195    kv        Cluster node
      172.23.104.186   kv        Cluster node
      172.23.120.201   kv        Cluster node
      172.23.121.10    kv        Cluster node
      ------------------------------------

      The rebalance fails and core dumps are seen on 172.23.120.201. The core dump of interest is 94edd15f-8a38-4975-5a86828f-a015955e.dmp; the others are duplicates of MB-39532.

      From the log:

      memcached<0.114.0>: 2020-06-10T02:55:52.661996-07:00 CRITICAL Caught unhandled std::exception-derived exception. what(): ThrowExceptionUnderflowPolicy current:0 arg:1
      memcached<0.114.0>: terminate called after throwing an instance of 'std::underflow_error'
      memcached<0.114.0>:   what():  ThrowExceptionUnderflowPolicy current:0 arg:1
      memcached<0.114.0>: 2020-06-10T02:55:52.787137-07:00 CRITICAL Breakpad caught a crash (Couchbase version 7.0.0-2309). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/94edd15f-8a38-4975-5a86828f-a015955e.dmp before terminating.
      memcached<0.114.0>: 2020-06-10T02:55:52.787146-07:00 CRITICAL Stack backtrace of crashed thread:
      memcached<0.114.0>: 2020-06-10T02:55:52.787262-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x13b80d]
      memcached<0.114.0>: 2020-06-10T02:55:52.787271-07:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ea) [0x400000+0x150b1a]
      memcached<0.114.0>: 2020-06-10T02:55:52.787276-07:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0xb8) [0x400000+0x150e58]
      memcached<0.114.0>: 2020-06-10T02:55:52.787281-07:00 CRITICAL     /lib64/libpthread.so.0() [0x7f07ad9f1000+0xf630]
      memcached<0.114.0>: 2020-06-10T02:55:52.787299-07:00 CRITICAL     /lib64/libc.so.6(gsignal+0x37) [0x7f07ad623000+0x36387]
      memcached<0.114.0>: 2020-06-10T02:55:52.787314-07:00 CRITICAL     /lib64/libc.so.6(abort+0x148) [0x7f07ad623000+0x37a78]
      memcached<0.114.0>: 2020-06-10T02:55:52.787333-07:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x125) [0x7f07ae126000+0x91195]
      memcached<0.114.0>: 2020-06-10T02:55:52.787339-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x14c012]
      memcached<0.114.0>: 2020-06-10T02:55:52.787353-07:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f07ae126000+0x8ef86]
      memcached<0.114.0>: 2020-06-10T02:55:52.787364-07:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f07ae126000+0x8efd1]
      memcached<0.114.0>: 2020-06-10T02:55:52.787375-07:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f07ae126000+0x8f213]
      memcached<0.114.0>: 2020-06-10T02:55:52.787381-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f07b186f000+0x58174]
      memcached<0.114.0>: 2020-06-10T02:55:52.787387-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f07b186f000+0x22909b]
      memcached<0.114.0>: 2020-06-10T02:55:52.787392-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f07b186f000+0x1f23a5]
      memcached<0.114.0>: 2020-06-10T02:55:52.787397-07:00 CRITICAL     /opt/couchbase/bin/../lib/libcouchstore.so() [0x7f07b160c000+0x119db]
      memcached<0.114.0>: 2020-06-10T02:55:52.787399-07:00 CRITICAL     /opt/couchbase/bin/../lib/libcouchstore.so() [0x7f07b160c000+0x113b1]
      memcached<0.114.0>: 2020-06-10T02:55:52.787401-07:00 CRITICAL     /opt/couchbase/bin/../lib/libcouchstore.so() [0x7f07b160c000+0x116d4]
      memcached<0.114.0>: 2020-06-10T02:55:52.787403-07:00 CRITICAL     /opt/couchbase/bin/../lib/libcouchstore.so() [0x7f07b160c000+0x12349]
      memcached<0.114.0>: 2020-06-10T02:55:52.787407-07:00 CRITICAL     /opt/couchbase/bin/../lib/libcouchstore.so(couchstore_save_documents_and_callback+0x850) [0x7f07b160c000+0x268c0]
      memcached<0.114.0>: 2020-06-10T02:55:52.787412-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f07b186f000+0x2013d8]
      memcached<0.114.0>: 2020-06-10T02:55:52.787415-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f07b186f000+0x201e9a]
      memcached<0.114.0>: 2020-06-10T02:55:52.787420-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f07b186f000+0x202535]
      memcached<0.114.0>: 2020-06-10T02:55:52.787425-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f07b186f000+0xea552]
      memcached<0.114.0>: 2020-06-10T02:55:52.787430-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f07b186f000+0xee819]
      memcached<0.114.0>: 2020-06-10T02:55:52.787435-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f07b186f000+0x142d7c]
      memcached<0.114.0>: 2020-06-10T02:55:52.787438-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f07b186f000+0x143f49]
      memcached<0.114.0>: 2020-06-10T02:55:52.787441-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f07b186f000+0x146da3]
      memcached<0.114.0>: 2020-06-10T02:55:52.787443-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f07b186f000+0x13d70f]
      memcached<0.114.0>: 2020-06-10T02:55:52.787448-07:00 CRITICAL     /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7f07b03ab000+0x10777]
      memcached<0.114.0>: 2020-06-10T02:55:52.787452-07:00 CRITICAL     /lib64/libpthread.so.0() [0x7f07ad9f1000+0x7ea5]
      memcached<0.114.0>: 2020-06-10T02:55:52.787475-07:00 CRITICAL     /lib64/libc.so.6(clone+0x6d) [0x7f07ad623000+0xfe8dd]
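      Each frame in the backtrace above ends in [0xBASE+0xOFFSET], which reads as the load address of the module plus the offset of the frame within that module. When triaging, that offset is typically resolved against the debug symbols of the same build (7.0.0-2309). Whether a symbolizer wants the module-relative offset or the absolute address depends on whether the module is position-independent: memcached itself is mapped at 0x400000, while the shared libraries such as libep.so sit at high addresses. The following is a minimal C++17 sketch, assuming that interpretation of the format; FrameAddress and parse_frame are hypothetical helpers, not part of any Couchbase tooling.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical: one "[0xBASE+0xOFFSET]" pair from a backtrace line.
struct FrameAddress {
    uint64_t base;    // assumed: load address of the module
    uint64_t offset;  // assumed: offset of the frame inside the module
    uint64_t absolute() const { return base + offset; }
};

// Parses the last "[0x...+0x...]" group on a backtrace line.
// Returns std::nullopt if the line has no such group.
std::optional<FrameAddress> parse_frame(const std::string& line) {
    const auto open = line.rfind('[');
    const auto plus = line.find('+', open);   // npos if open is npos
    const auto close = line.find(']', open);
    if (open == std::string::npos || plus == std::string::npos ||
        close == std::string::npos || !(open < plus && plus < close)) {
        return std::nullopt;
    }
    try {
        FrameAddress f{};
        // Base 16 lets std::stoull accept the "0x" prefix.
        f.base = std::stoull(line.substr(open + 1, plus - open - 1), nullptr, 16);
        f.offset = std::stoull(line.substr(plus + 1, close - plus - 1), nullptr, 16);
        return f;
    } catch (const std::exception&) {
        return std::nullopt;  // malformed hex digits
    }
}
```

      Applied to the first libep.so frame after the libstdc++ frames, this yields base 0x7f07b186f000 and offset 0x58174.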
      

      Backtrace
      See bt_full.txt for the full backtrace (bt full); it is not included here because of character limits.

      This appears to have little in common with MB-39573, even though both involve an exception thrown by ThrowExceptionUnderflowPolicy.
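      The exception text "ThrowExceptionUnderflowPolicy current:0 arg:1" reads as a counter that must never go negative being asked to subtract 1 while it held 0, with its policy choosing to throw std::underflow_error rather than clamp. A minimal C++ sketch of that policy-based pattern follows, assuming the policy is a template parameter. NonNegativeCounter, ThrowOnUnderflow and ClampAtZero are hypothetical names; this is not the Couchbase platform code, and it says nothing about which vbucket_manifest counter underflowed.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical policy: throw when a subtraction would go below zero.
// The message format imitates the log line; the class is illustrative only.
struct ThrowOnUnderflow {
    static uint64_t underflow(uint64_t current, uint64_t arg) {
        throw std::underflow_error("ThrowExceptionUnderflowPolicy current:" +
                                   std::to_string(current) +
                                   " arg:" + std::to_string(arg));
    }
};

// Hypothetical alternative policy: silently clamp at zero.
struct ClampAtZero {
    static uint64_t underflow(uint64_t /*current*/, uint64_t /*arg*/) { return 0; }
};

// Counter that never holds a negative value; what happens on underflow
// is delegated to UnderflowPolicy.
template <typename UnderflowPolicy>
class NonNegativeCounter {
public:
    explicit NonNegativeCounter(uint64_t initial = 0) : value_(initial) {}

    NonNegativeCounter& operator+=(uint64_t arg) {
        value_ += arg;
        return *this;
    }

    NonNegativeCounter& operator-=(uint64_t arg) {
        if (arg > value_) {
            // Would go negative: the policy either throws (value_ unchanged)
            // or returns the value to store.
            value_ = UnderflowPolicy::underflow(value_, arg);
        } else {
            value_ -= arg;
        }
        return *this;
    }

    uint64_t load() const { return value_; }

private:
    uint64_t value_;
};
```

      With a throwing policy, an uncaught throw on a background thread (here, the stack that passes through couchstore_save_documents_and_callback) ends in std::terminate, consistent with the "terminate called after throwing an instance of 'std::underflow_error'" line in the log. A clamping policy would avoid the crash but hide the mismatched decrement.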

      cbcollect_info attached.


          People

            Balakumaran Gopal
            Balakumaran Gopal
            Votes: 0
            Watchers: 6
