  1. Couchbase Server
  2. MB-43923

[Collections] Minidumps seen during collection CRUD + durability dataload + graceful failover + full recovery + rebalance


Details

    • Untriaged
    • Centos 64-bit
    • 1
    • No
    • KV-Engine 2021-Feb

    Description

      Script to Repro

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/testexec.79048.ini GROUP=failover_with_collection_crud_durability_MAJORITY,rerun=False,upgrade_version=7.0.0-4325 -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_graceful_failover_recovery,nodes_init=5,nodes_failover=1,recovery_type=full,override_spec_params=durability;replicas,durability=MAJORITY,replicas=2,bucket_spec=multi_bucket.buckets_for_rebalance_tests_more_collections,data_load_spec=volume_test_load_with_CRUD_on_collections,data_load_stage=during,quota_percent=80,GROUP=failover_with_collection_crud_durability_MAJORITY'
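      The test parameters are easier to read once the -t argument is split apart. The following is a minimal sketch, assuming the argument is a test name followed by comma-separated key=value pairs; parse_test_spec is a hypothetical helper written for this report, not part of testrunner.

```python
def parse_test_spec(spec):
    """Split 'module.Class.test,key=value,...' into (test_name, params).

    Hypothetical helper: assumes plain comma-separated key=value pairs,
    which matches the -t argument quoted in this report.
    """
    test_name, *pairs = spec.split(",")
    params = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        params[key] = value
    return test_name, params


# The -t argument from the repro command, copied verbatim.
spec = (
    "bucket_collections.collections_rebalance.CollectionsRebalance."
    "test_data_load_collections_with_graceful_failover_recovery,"
    "nodes_init=5,nodes_failover=1,recovery_type=full,"
    "override_spec_params=durability;replicas,durability=MAJORITY,replicas=2,"
    "bucket_spec=multi_bucket.buckets_for_rebalance_tests_more_collections,"
    "data_load_spec=volume_test_load_with_CRUD_on_collections,"
    "data_load_stage=during,quota_percent=80,"
    "GROUP=failover_with_collection_crud_durability_MAJORITY"
)

test_name, params = parse_test_spec(spec)
print(test_name)
for key in ("nodes_init", "nodes_failover", "recovery_type", "durability", "replicas"):
    print(f"{key} = {params[key]}")
```

      The parameters that matter most for this repro are nodes_init=5, nodes_failover=1, recovery_type=full, durability=MAJORITY and replicas=2.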
      

      Steps to Repro
      1) Create a 5 node cluster
      2021-01-27 21:59:27,921 | test | INFO | pool-4-thread-6 | [table_view:display:72] Rebalance Overview
      +---------------+----------+--------------+
      | Nodes         | Services | Status       |
      +---------------+----------+--------------+
      | 172.23.105.52 | kv       | Cluster node |
      | 172.23.105.53 | None     | <--- IN ---  |
      | 172.23.105.59 | None     | <--- IN ---  |
      | 172.23.105.64 | None     | <--- IN ---  |
      | 172.23.105.79 | None     | <--- IN ---  |
      +---------------+----------+--------------+

      2) Create buckets/scopes/collections/data
      +---------+-----------+----------+------------+-----+--------+-------------+-----------+-----------+
      | Bucket  | Type      | Replicas | Durability | TTL | Items  | RAM Quota   | RAM Used  | Disk Used |
      +---------+-----------+----------+------------+-----+--------+-------------+-----------+-----------+
      | bucket1 | couchbase | 2        | none       | 0   | 3000   | 1048576000  | 166419944 | 267464363 |
      | bucket2 | ephemeral | 2        | none       | 0   | 3000   | 1048576000  | 244948568 | 170       |
      | default | couchbase | 2        | none       | 0   | 500000 | 10485760000 | 584606712 | 411196169 |
      +---------+-----------+----------+------------+-----+--------+-------------+-----------+-----------+
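
      The table does not state units for the RAM columns. Assuming they are bytes (an assumption, not confirmed by the log), a short sketch like the one below converts them to MiB and shows how much of each quota was in use when the table was printed. The figures are copied from the table above; to_mib is a hypothetical helper.

```python
MIB = 1024 * 1024

# (bucket, RAM quota, RAM used), copied from the table above.
# Assumption: both columns are in bytes.
buckets = [
    ("bucket1", 1048576000, 166419944),
    ("bucket2", 1048576000, 244948568),
    ("default", 10485760000, 584606712),
]


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / MIB


for name, quota, used in buckets:
    pct = 100 * used / quota
    print(f"{name}: quota {to_mib(quota):.0f} MiB, "
          f"used {to_mib(used):.1f} MiB ({pct:.1f}% of quota)")
```

      Under that assumption the quotas come out to 1000 MiB for bucket1 and bucket2 and 10000 MiB for default.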

      3) Graceful failover of node 172.23.105.79

      2021-01-27 22:05:54,640 | test | INFO | MainThread | [collections_rebalance:wait_for_failover_or_assert:213] 1 nodes failed over as expected in 0.029000043869 seconds

       
      4) Do full recovery + Rebalance

      2021-01-27 22:06:26,605 | test | WARNING | MainThread | [rest_client:get_nodes:1710] 172.23.105.79 - Node not part of cluster inactiveFailed

       
      We see 3 crashes.
      On 172.23.105.53
      8e6dbba9-2e3a-4123-0a2841b1-32001524.dmp
      8e67c9b4-4a55-4b5b-ce42a0bf-7be225d5.dmp

      On 172.23.105.59
      cdb0d43e-32a6-4d67-eaa412ad-3fe31243.dmp

      grep CRITICAL for 8e6dbba9-2e3a-4123-0a2841b1-32001524.dmp on 172.23.105.53

      [ns_server:info,2021-01-27T21:39:33.390-08:00,babysitter_of_ns_1@cb.local:<0.249.0>:ns_port_server:log:224]memcached<0.249.0>: WARNING: Logging before InitGoogleLogging() is written to STDERR
      memcached<0.249.0>: W0127 21:39:33.188218 99170 HazptrDomain.h:671] Using the default inline executor for asynchronous reclamation may be susceptible to deadlock if the current thread happens to hold a resource needed by the deleter of a reclaimable object
       
      [ns_server:info,2021-01-27T21:39:39.493-08:00,babysitter_of_ns_1@cb.local:<0.249.0>:ns_port_server:log:224]memcached<0.249.0>: 2021-01-27T21:39:39.452151-08:00 CRITICAL Breakpad caught a crash (Couchbase version 7.0.0-4325). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/8e6dbba9-2e3a-4123-0a2841b1-32001524.dmp before terminating.
      memcached<0.249.0>: 2021-01-27T21:39:39.452182-08:00 CRITICAL Stack backtrace of crashed thread:
      memcached<0.249.0>: 2021-01-27T21:39:39.452412-08:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x145bbd]
      memcached<0.249.0>: 2021-01-27T21:39:39.452424-08:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ea) [0x400000+0x15b3fa]
      memcached<0.249.0>: 2021-01-27T21:39:39.452434-08:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0xb8) [0x400000+0x15b738]
      memcached<0.249.0>: 2021-01-27T21:39:39.452441-08:00 CRITICAL     /lib64/libpthread.so.0() [0x7fc7b8fd3000+0xf5d0]
      memcached<0.249.0>: 2021-01-27T21:39:39.452454-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7fc7bd032000+0x78d66]
      memcached<0.249.0>: 2021-01-27T21:39:39.452462-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7fc7bd032000+0x77b1a]
      memcached<0.249.0>: 2021-01-27T21:39:39.452469-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7fc7bd032000+0x80fd8]
      memcached<0.249.0>: 2021-01-27T21:39:39.452476-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7fc7bd032000+0x8b382]
      memcached<0.249.0>: 2021-01-27T21:39:39.452486-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7fc7bd032000+0x188fbb]
      memcached<0.249.0>: 2021-01-27T21:39:39.452495-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7fc7bd032000+0x16dc13]
      memcached<0.249.0>: 2021-01-27T21:39:39.452502-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7fc7bd032000+0x167c92]
      memcached<0.249.0>: 2021-01-27T21:39:39.452513-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7fc7bd032000+0x2e71d6]
      memcached<0.249.0>: 2021-01-27T21:39:39.452523-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7fc7bd032000+0x2cf6ca]
      memcached<0.249.0>: 2021-01-27T21:39:39.452532-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7fc7bd032000+0x2ea2f9]
      memcached<0.249.0>: 2021-01-27T21:39:39.452541-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7fc7bd032000+0x1660d3]
      memcached<0.249.0>: 2021-01-27T21:39:39.452576-08:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7fc7b9708000+0xb9dcf]
      memcached<0.249.0>: 2021-01-27T21:39:39.452582-08:00 CRITICAL     /lib64/libpthread.so.0() [0x7fc7b8fd3000+0x7dd5]
      memcached<0.249.0>: 2021-01-27T21:39:39.452615-08:00 CRITICAL     /lib64/libc.so.6(clone+0x6d) [0x7fc7b8c06000+0xfdead]
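
      Most frames in the backtrace above are unsymbolized libep.so offsets. As a starting point for symbolizing them, the sketch below pulls the module, load base and offset out of each frame line. It assumes the "[0xBASE+0xOFFSET]" layout seen in this log; FRAME_RE and parse_frames are hypothetical names, and the sample holds three lines copied from the trace.

```python
import posixpath
import re

# Matches frame lines such as:
#   ... CRITICAL     /path/to/module(symbol+0x1f) [0x400000+0x145bbd]
FRAME_RE = re.compile(
    r"CRITICAL\s+(?P<module>/\S+?)\((?P<symbol>[^)]*)\)\s+"
    r"\[0x(?P<base>[0-9a-fA-F]+)\+0x(?P<offset>[0-9a-fA-F]+)\]"
)


def parse_frames(log_text):
    """Return one dict per backtrace frame found in log_text."""
    frames = []
    for match in FRAME_RE.finditer(log_text):
        frames.append({
            # Collapse "bin/../lib" so the same library always has one name.
            "module": posixpath.normpath(match.group("module")),
            "symbol": match.group("symbol"),
            "base": int(match.group("base"), 16),
            "offset": int(match.group("offset"), 16),
        })
    return frames


sample = """\
memcached<0.249.0>: 2021-01-27T21:39:39.452424-08:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ea) [0x400000+0x15b3fa]
memcached<0.249.0>: 2021-01-27T21:39:39.452454-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7fc7bd032000+0x78d66]
memcached<0.249.0>: 2021-01-27T21:39:39.452615-08:00 CRITICAL     /lib64/libc.so.6(clone+0x6d) [0x7fc7b8c06000+0xfdead]
"""

frames = parse_frames(sample)
for frame in frames:
    print(f"{frame['module']} +{frame['offset']:#x} {frame['symbol']}")
```

      The extracted offsets could then be handed to a symbolizer, provided debug symbols matching build 7.0.0-4325 are available; that availability is an assumption here.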
      

      See bt_full_all_threads.txt for 8e6dbba9-2e3a-4123-0a2841b1-32001524.dmp on 172.23.105.53. Attaching cbcollect.

      This test worked fine on 7.0.0-4291.

      Attachments

        1. 3b61ddfd-4093-40f6-2881ddbc-b152fe13_bt_full.txt (12 kB)
        2. bt_full_all_threads.txt (66 kB)
        3. bt_full.txt (13 kB)
        4. info_threads.txt (3 kB)
        5. test_2.log (175 kB)
        6. test.log (433 kB)


            People

              Balakumaran.Gopal Balakumaran Gopal