Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-42016

[Collections] - minidumps seen during Multi node graceful failover + rebalance out + collection CRUD

    XMLWordPrintable

Details

    • Untriaged
    • Centos 64-bit
    • 1
    • Yes

    Description

      Script to Repro

      ./testrunner -i /tmp/win10-bucket-ops.ini rerun=False,quota_percent=95,crash_warning=True -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_graceful_failover_rebalance_out,nodes_init=5,nodes_failover=2,bucket_spec=multi_bucket.buckets_for_rebalance_tests_more_collections,data_load_spec=volume_test_load_with_CRUD_on_collections,data_load_stage=during,quota_percent=80,GROUP=failover_with_collection_crud
      

      Steps to Repro
      1. Create 5 node cluster
      2020-10-13 00:00:37,207 | test | INFO | pool-3-thread-6 | [table_view:display:72] Rebalance Overview
      ------------------------------------

      Nodes Services Status

      ------------------------------------

      172.23.98.196 kv Cluster node
      172.23.98.195 None <--- IN —
      172.23.120.206 None <--- IN —
      172.23.104.186 None <--- IN —
      172.23.121.10 None <--- IN —

      ------------------------------------

      2. Create buckets/scope/collections/data
      2020-10-13 00:12:21,321 | test | INFO | MainThread | [table_view:display:72] Bucket statistics
      --------------------------------------------------------------------------

      Bucket Type Replicas Durability TTL Items RAM Quota RAM Used Disk Used

      --------------------------------------------------------------------------

      bucket1 couchbase 3 none 0 3000 1048576000 253857232 359987046
      bucket2 ephemeral 3 none 0 3000 1048576000 363499152 170
      default couchbase 3 none 0 500000 10485760000 718104144 583601498

      --------------------------------------------------------------------------

      3. Start data load again with CRUD on collections

      4. Do a graceful failover of 2 nodes(172.23.104.186 and 172.23.121.10)

      2020-10-13 00:12:27,401 | test  | INFO    | MainThread | [collections_rebalance:rebalance_operation:157] Starting rebalance operation of type : graceful_failover_rebalance_out
      2020-10-13 00:14:27,693 | test  | INFO    | MainThread | [collections_rebalance:wait_for_failover_or_assert:129] 1 nodes failed over as expected in 0.0490000247955 seconds
      2020-10-13 00:16:48,086 | test  | INFO    | MainThread | [collections_rebalance:wait_for_failover_or_assert:129] 2 nodes failed over as expected in 20.1369998455 seconds
      

      5. Start rebalance out of those 2 nodes. Rebalance fails and minidumps are seen on 172.23.120.206.

      2020-10-13 00:19:09,019 | test  | ERROR   | pool-3-thread-24 | [rest_client:_rebalance_status_and_progress:1479] {u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try again.', u'status': u'none'} - rebalance failed
      

      grep CRITICAL on 172.23.120.206

      [root@localhost logs]# grep CRITICAL memcached.log.0000*
      memcached.log.000041.txt:2020-10-13T00:19:04.665334-07:00 CRITICAL *** Fatal error encountered during exception handling ***
      memcached.log.000041.txt:2020-10-13T00:19:04.665384-07:00 CRITICAL Caught unhandled std::exception-derived exception. what(): void Collections::VB::Manifest::DroppedCollections::remove(CollectionID, uint64_t) The collection cannot be found collection:0xcd seqno:1092
      memcached.log.000041.txt:2020-10-13T00:19:04.854355-07:00 CRITICAL Breakpad caught a crash (Couchbase version 7.0.0-3383). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/761d0348-ae89-43ef-b46ea584-ca2d7f1e.dmp before terminating.
      memcached.log.000041.txt:2020-10-13T00:19:04.854370-07:00 CRITICAL Stack backtrace of crashed thread:
      memcached.log.000041.txt:2020-10-13T00:19:04.854513-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x19741d]
      memcached.log.000041.txt:2020-10-13T00:19:04.854534-07:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ea) [0x400000+0x1ac3aa]
      memcached.log.000041.txt:2020-10-13T00:19:04.854544-07:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0xb8) [0x400000+0x1ac6e8]
      memcached.log.000041.txt:2020-10-13T00:19:04.854550-07:00 CRITICAL     /lib64/libpthread.so.0() [0x7f39fb9f4000+0xf630]
      memcached.log.000041.txt:2020-10-13T00:19:04.854569-07:00 CRITICAL     /lib64/libc.so.6(gsignal+0x37) [0x7f39fb626000+0x36387]
      memcached.log.000041.txt:2020-10-13T00:19:04.854585-07:00 CRITICAL     /lib64/libc.so.6(abort+0x148) [0x7f39fb626000+0x37a78]
      memcached.log.000041.txt:2020-10-13T00:19:04.854618-07:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x125) [0x7f39fc129000+0x91195]
      memcached.log.000041.txt:2020-10-13T00:19:04.854635-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x1a6d72]
      memcached.log.000041.txt:2020-10-13T00:19:04.854651-07:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f39fc129000+0x8ef86]
      memcached.log.000041.txt:2020-10-13T00:19:04.854670-07:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f39fc129000+0x8efd1]
      memcached.log.000041.txt:2020-10-13T00:19:04.854689-07:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f39fc129000+0x8f213]
      memcached.log.000041.txt:2020-10-13T00:19:04.854699-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f39ff711000+0x2a29cc]
      memcached.log.000041.txt:2020-10-13T00:19:04.854706-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f39ff711000+0x2a53cf]
      memcached.log.000041.txt:2020-10-13T00:19:04.854711-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f39ff711000+0x275003]
      memcached.log.000041.txt:2020-10-13T00:19:04.854889-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f39ff711000+0x122978]
      memcached.log.000041.txt:2020-10-13T00:19:04.854896-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f39ff711000+0x1264fd]
      memcached.log.000041.txt:2020-10-13T00:19:04.854903-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f39ff711000+0x1806bc]
      memcached.log.000041.txt:2020-10-13T00:19:04.854908-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f39ff711000+0x181899]
      memcached.log.000041.txt:2020-10-13T00:19:04.854913-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f39ff711000+0x1845c3]
      memcached.log.000041.txt:2020-10-13T00:19:04.854918-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f39ff711000+0x861bf]
      memcached.log.000041.txt:2020-10-13T00:19:04.854924-07:00 CRITICAL     /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7f39fe1ac000+0x10947]
      memcached.log.000041.txt:2020-10-13T00:19:04.854930-07:00 CRITICAL     /lib64/libpthread.so.0() [0x7f39fb9f4000+0x7ea5]
      memcached.log.000041.txt:2020-10-13T00:19:04.854957-07:00 CRITICAL     /lib64/libc.so.6(clone+0x6d) [0x7f39fb626000+0xfe8dd]
      

      bt full of 761d0348-ae89-43ef-b46ea584-ca2d7f1e.dmp on 172.23.120.206
      See bt_full_multi_node_graceful_rebalance_out.txt

      cbcollect_info attached. This test passed on 7.0.0-3342.

      Attachments

        Activity

          People

            Balakumaran.Gopal Balakumaran Gopal
            Balakumaran.Gopal Balakumaran Gopal
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              PagerDuty