  1. Couchbase Server
  2. MB-44685

[Collections] - minidumps seen during Rebalance in + collection CRUD


Details

    • Triaged
    • Centos 64-bit
    • 1
    • Yes
    • KV-Engine 2021-March

    Description

      Script to Repro

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/win10-bucket-ops.ini rerun=False,get-cbcollect-info=True,quota_percent=95,crash_warning=True,rebalance_moves_per_node=64 -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_rebalance_in,nodes_init=3,nodes_in=2,bucket_spec=multi_bucket.buckets_for_rebalance_tests_more_collections,data_load_spec=volume_test_load_with_CRUD_on_collections,data_load_stage=before,scrape_interval=5,rebalance_moves_per_node=32,quota_percent=80,skip_validations=False,GROUP=rebalance_with_collection_crud'
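The argument string above sets some parameters twice with different values. A minimal sketch for spotting them, assuming the parameters are plain comma-separated `key=value` pairs (the helper name `find_duplicate_params` is hypothetical and not part of testrunner):

```python
from collections import defaultdict


def find_duplicate_params(param_string):
    """Return {key: [values...]} for every key that appears more than once."""
    seen = defaultdict(list)
    for token in param_string.split(","):
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            seen[key.strip()].append(value.strip())
    return {key: values for key, values in seen.items() if len(values) > 1}


# Copied from the command above: the parameters after the .ini file, then the
# parameters that follow the test name passed to -t.
global_params = (
    "rerun=False,get-cbcollect-info=True,quota_percent=95,"
    "crash_warning=True,rebalance_moves_per_node=64"
)
test_params = (
    "nodes_init=3,nodes_in=2,"
    "bucket_spec=multi_bucket.buckets_for_rebalance_tests_more_collections,"
    "data_load_spec=volume_test_load_with_CRUD_on_collections,"
    "data_load_stage=before,scrape_interval=5,rebalance_moves_per_node=32,"
    "quota_percent=80,skip_validations=False,GROUP=rebalance_with_collection_crud"
)

duplicates = find_duplicate_params(global_params + "," + test_params)
for key, values in sorted(duplicates.items()):
    print(key, values)
```

This only flags the repeated keys (`quota_percent` and `rebalance_moves_per_node`). It does not say which value the framework applies, although the log in step 3 reports `rebalanceMovesPerNode` set to 64.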
      

      Steps to Repro
      1) Create a 3 node cluster
      2021-03-02 00:17:36,871 | test | INFO | pool-1-thread-6 | [table_view:display:72] Rebalance Overview
      ---------------------------------------------------------------------
      Nodes          Services  Version                CPU            Status
      ---------------------------------------------------------------------
      172.23.98.196  kv        7.0.0-4574-enterprise  10.3232272613  Cluster node
      172.23.98.195  None                                            <--- IN ---
      172.23.121.10  None                                            <--- IN ---
      ---------------------------------------------------------------------

      2) Create buckets/scopes/collections/data
      2021-03-02 00:22:36,851 | test | INFO | MainThread | [table_view:display:72] Bucket statistics
      -------------------------------------------------------------------------
      Bucket   Type       Replicas  Durability  TTL  Items   RAM Quota   RAM Used   Disk Used
      -------------------------------------------------------------------------
      bucket1  couchbase  3         none        0    3000    629145600   143988160  329478219
      bucket2  ephemeral  3         none        0    3000    629145600   313926472  102
      default  couchbase  3         none        0    500000  6291456000  524102632  596750032
      -------------------------------------------------------------------------

      3) Adding a node fails as shown below; this is caused by MB-44012.

      2021-03-02 00:22:49,375 | test  | INFO    | MainThread | [cluster_ready_functions:set_rebalance_moves_per_nodes:119] Changed Rebalance settings: {u'rebalanceMovesPerNode': 64}
      2021-03-02 00:23:26,615 | test  | ERROR   | pool-1-thread-24 | [rest_client:_http_request:747] POST http://172.23.98.196:8091/controller/addNode body: hostname=http%3A%2F%2F172.23.104.186%3A8091&password=password&user=Administrator headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==\n', 'Content-Type': 'application/x-www-form-urlencoded'} error: 400 reason: unknown ["Join completion call failed. Got HTTP status 500 from REST call post to http://172.23.104.186:8091/completeJoin. Body was: \"[\\\"Unexpected server error, request logged.\\\"]\""] auth: Administrator:password
      

      2021-03-02 00:23:26,858 | test  | ERROR   | pool-1-thread-24 | [task:call:242] Error adding node: 172.23.104.186 to the cluster:172.23.98.196 - ["Join completion call failed. Got HTTP status 500 from REST call post to http://172.23.104.186:8091/completeJoin. Body was: \"[\\\"Unexpected server error, request logged.\\\"]\""]
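The `addNode` request body in the log above is URL-encoded, which makes the target node hard to read. A minimal sketch that decodes it with the Python standard library (the body string is copied from the log; nothing here calls the cluster):

```python
from urllib.parse import parse_qs, urlsplit

# Form body copied verbatim from the failing POST to /controller/addNode.
body = (
    "hostname=http%3A%2F%2F172.23.104.186%3A8091"
    "&password=password&user=Administrator"
)

fields = parse_qs(body)                  # percent-decodes keys and values
target = urlsplit(fields["hostname"][0])  # e.g. scheme, host and port

print("node being added:", target.hostname)
print("REST port:", target.port)
```

The decoded host is 172.23.104.186. That address does not appear in the Rebalance Overview table from step 1, which lists 172.23.98.195 and 172.23.121.10 as the incoming nodes, so it may be worth confirming which node list the test used.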
      

      We continued CRUD operations on collections and then saw the minidump ae8d6778-2a62-426c-7fdd22bd-95275336.dmp on 172.23.121.10.

      grep CRITICAL on 172.23.121.10

      [root@localhost logs]# grep CRITICAL memcached.log.0000*
      memcached.log.000016.txt:2021-03-02T00:21:57.107143-08:00 CRITICAL *** Fatal error encountered during exception handling ***
      memcached.log.000016.txt:2021-03-02T00:21:57.107243-08:00 CRITICAL Caught unhandled std::exception-derived exception. what(): std::bad_alloc
      memcached.log.000016.txt:2021-03-02T00:21:57.737236-08:00 CRITICAL Breakpad caught a crash (Couchbase version 7.0.0-4574). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/ae8d6778-2a62-426c-7fdd22bd-95275336.dmp before terminating.
      memcached.log.000016.txt:2021-03-02T00:21:57.737279-08:00 CRITICAL Stack backtrace of crashed thread:
      memcached.log.000016.txt:2021-03-02T00:21:57.737590-08:00 CRITICAL     #0  /opt/couchbase/bin/memcached() [0x400000+0x14cc4d]
      memcached.log.000016.txt:2021-03-02T00:21:57.737617-08:00 CRITICAL     #1  /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ea) [0x400000+0x16304a]
      memcached.log.000016.txt:2021-03-02T00:21:57.737640-08:00 CRITICAL     #2  /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0xb8) [0x400000+0x163388]
      memcached.log.000016.txt:2021-03-02T00:21:57.737735-08:00 CRITICAL     #3  /lib64/libpthread.so.0() [0x7f516953b000+0xf630]
      memcached.log.000016.txt:2021-03-02T00:21:57.737786-08:00 CRITICAL     #4  /lib64/libc.so.6(gsignal+0x37) [0x7f516916d000+0x36387]
      memcached.log.000016.txt:2021-03-02T00:21:57.737828-08:00 CRITICAL     #5  /lib64/libc.so.6(abort+0x148) [0x7f516916d000+0x37a78]
      memcached.log.000016.txt:2021-03-02T00:21:57.737886-08:00 CRITICAL     #6  /opt/couchbase/bin/../lib/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x125) [0x7f5169c70000+0x91195]
      memcached.log.000016.txt:2021-03-02T00:21:57.737912-08:00 CRITICAL     #7  /opt/couchbase/bin/memcached() [0x400000+0x15c972]
      memcached.log.000016.txt:2021-03-02T00:21:57.737960-08:00 CRITICAL     #8  /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f5169c70000+0x8ef86]
      memcached.log.000016.txt:2021-03-02T00:21:57.738014-08:00 CRITICAL     #9  /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f5169c70000+0x8efd1]
      memcached.log.000016.txt:2021-03-02T00:21:57.738081-08:00 CRITICAL     #10 /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f5169c70000+0xb9dfe]
      memcached.log.000016.txt:2021-03-02T00:21:57.738106-08:00 CRITICAL     #11 /lib64/libpthread.so.0() [0x7f516953b000+0x7ea5]
      memcached.log.000016.txt:2021-03-02T00:21:57.738161-08:00 CRITICAL     #12 /lib64/libc.so.6(clone+0x6d) [0x7f516916d000+0xfe8dd]
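Several frames in the backtrace above have no symbol name, only a module and an offset. A minimal sketch that pulls the frame number, module, symbol (if any), and load-base/offset pair out of such lines, assuming the `#N module(symbol+0x..) [base+offset]` layout seen here holds for every frame (the names `FRAME_RE`, `Frame`, and `parse_frames` are hypothetical):

```python
import re
from typing import NamedTuple, Optional

# Matches e.g. "#4  /lib64/libc.so.6(gsignal+0x37) [0x7f516916d000+0x36387]"
FRAME_RE = re.compile(
    r"#(?P<index>\d+)\s+(?P<module>[^\s(]+)"
    r"\((?P<symbol>[^)+]*)(?:\+0x[0-9a-f]+)?\)\s+"
    r"\[(?P<base>0x[0-9a-f]+)\+(?P<offset>0x[0-9a-f]+)\]"
)


class Frame(NamedTuple):
    index: int
    module: str
    symbol: Optional[str]  # None when the logger could not resolve a name
    base: int
    offset: int


def parse_frames(lines):
    """Extract stack frames from memcached CRITICAL log lines."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m:
            frames.append(Frame(
                int(m["index"]), m["module"], m["symbol"] or None,
                int(m["base"], 16), int(m["offset"], 16),
            ))
    return frames


# Three lines copied from the grep output above.
sample = [
    "memcached.log.000016.txt:2021-03-02T00:21:57.737590-08:00 CRITICAL     #0  /opt/couchbase/bin/memcached() [0x400000+0x14cc4d]",
    "memcached.log.000016.txt:2021-03-02T00:21:57.737617-08:00 CRITICAL     #1  /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ea) [0x400000+0x16304a]",
    "memcached.log.000016.txt:2021-03-02T00:21:57.737786-08:00 CRITICAL     #4  /lib64/libc.so.6(gsignal+0x37) [0x7f516916d000+0x36387]",
]
frames = parse_frames(sample)
for frame in frames:
    print(frame.index, frame.module, frame.symbol, hex(frame.offset))
```

The resolved names are still mangled C++ symbols; a demangler such as `c++filt` turns frame #1 into `google_breakpad::ExceptionHandler::GenerateDump(CrashContext*)`. Frames #0 to #6 are the crash handler and `abort()` path, so the frames of interest are the unresolved ones below them, which need the module/offset pairs and matching symbols to decode.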
      

      cbcollect_info attached. This was not seen on 7.0.0-4554.

      Attachments

        1. bt_full.txt (4 kB)
        2. consoleText_new.txt (148 kB)
        3. info_threads.txt (5 kB)
        4. Screenshot 2021-03-02 at 16.14.16.png (137 kB)
        5. Screenshot 2021-03-02 at 16.30.03.png (137 kB)
        6. Screenshot 2021-03-02 at 16.34.02.png (45 kB)
        7. Screenshot 2021-03-02 at 16.34.32.png (42 kB)
        8. Screenshot 2021-03-09 at 12.21.21.png (149 kB)
        9. Screenshot 2021-03-09 at 12.23.14.png (157 kB)
        10. thread_apply_all_bt.txt (86 kB)

            People

              Balakumaran.Gopal Balakumaran Gopal
              Balakumaran.Gopal Balakumaran Gopal