Couchbase Server / MB-42275

[Collections] - Memcached minidumps seen during rebalance in - out + CRUD on collections


Details

    • Untriaged
    • Centos 64-bit
    • 1
    • Yes

    Description

      Script to Repro

      ./testrunner -i /tmp/win10-bucket-ops.ini rerun=False,quota_percent=95,crash_warning=True -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_rebalance_in_out,nodes_init=4,nodes_in=1,nodes_out=2,bucket_spec=multi_bucket.buckets_for_rebalance_tests_more_collections,data_load_spec=volume_test_load_with_CRUD_on_collections,data_load_stage=during,quota_percent=80,GROUP=rebalance_with_collection_crud
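The `-t` argument packs the test path and its parameters into one comma-separated string. A minimal Python sketch for splitting it is shown here, to make it easier to check the parameters against the steps below (`nodes_init=4`, `nodes_in=1`, `nodes_out=2`). The helper `split_test_spec` is hypothetical and is not testrunner's own parser, which may treat edge cases differently.

```python
def split_test_spec(spec):
    """Split 'module.Class.test,key=value,...' into (test_path, params).

    Hypothetical helper for reading the command; not testrunner's parser.
    """
    test_path, *pairs = spec.split(",")
    params = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        params[key] = value
    return test_path, params


# Subset of the -t argument from the command above (two *_spec params omitted).
spec = (
    "bucket_collections.collections_rebalance.CollectionsRebalance."
    "test_data_load_collections_with_rebalance_in_out,"
    "nodes_init=4,nodes_in=1,nodes_out=2,"
    "data_load_stage=during,quota_percent=80,"
    "GROUP=rebalance_with_collection_crud"
)

test_path, params = split_test_spec(spec)
print(test_path.rsplit(".", 1)[-1])
print(params["nodes_init"], params["nodes_in"], params["nodes_out"])
```

All values come back as strings, so any numeric comparison needs an explicit `int(...)`.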
      

      Steps to Repro
      1. Create a 4 node cluster.
      2020-10-25 05:43:42,042 | test | INFO | pool-9-thread-6 | [table_view:display:72] Rebalance Overview
      --------------------------------------
      Nodes           Services  Status
      --------------------------------------
      172.23.98.196   kv        Cluster node
      172.23.98.195   None      <--- IN ---
      172.23.120.206  None      <--- IN ---
      172.23.104.186  None      <--- IN ---
      --------------------------------------

      2. Create buckets/scopes/collections/data
      2020-10-25 05:55:00,158 | test | INFO | MainThread | [table_view:display:72] Bucket statistics
      ---------------------------------------------------------------------------------------
      Bucket   Type       Replicas  Durability  TTL  Items   RAM Quota   RAM Used   Disk Used
      ---------------------------------------------------------------------------------------
      bucket1  couchbase  3         none        0    3000    838860800   248462272  312784905
      bucket2  ephemeral  3         none        0    3000    838860800   361099936  136
      default  couchbase  3         none        0    500000  8388608000  793135744  590006946
      ---------------------------------------------------------------------------------------

      3. Start collections data load

      4. Add 1 node (172.23.121.10), remove 2 nodes (172.23.104.186, 172.23.120.206), and start the rebalance.

      2020-10-25 05:55:13,145 | test | INFO | pool-9-thread-25 | [table_view:display:72] Rebalance Overview
      --------------------------------------
      Nodes           Services  Status
      --------------------------------------
      172.23.98.196   kv        Cluster node
      172.23.98.195   kv        Cluster node
      172.23.104.186  [u'kv']   --- OUT --->
      172.23.120.206  [u'kv']   --- OUT --->
      172.23.121.10   kv        Cluster node
      --------------------------------------

      5. Rebalance completes fine but the cluster cleanup/bucket cleanup fails.

      172.23.121.10: Looking for CRITICAL messages in log

      2020-10-25 06:10:12,957 | test  | INFO    | MainThread | [basetestcase:check_coredump_exist:728] 172.23.121.10 : Looking for CRITICAL messages in log
      172.23.121.10 : Found message in /opt/couchbase/var/lib/couchbase/logs/memcached.log.000216.txt
      2020-10-25T05:41:50.470564-07:00 CRITICAL Breakpad caught a crash (Couchbase version 7.0.0-3515). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/434cc0e2-cbfa-4b86-4d22b7b0-9e996c6a.dmp before terminating.
      2020-10-25T05:41:50.470607-07:00 CRITICAL Stack backtrace of crashed thread:
      2020-10-25T05:41:50.490535-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x198a6d]
      2020-10-25T05:41:50.490594-07:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ea) [0x400000+0x1adaaa]
      2020-10-25T05:41:50.490626-07:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0xb8) [0x400000+0x1adde8]
      2020-10-25T05:41:50.490644-07:00 CRITICAL     /lib64/libpthread.so.0() [0x7f52feefe000+0xf630]
      2020-10-25T05:41:50.490659-07:00 CRITICAL     /lib64/libpthread.so.0(pthread_mutex_lock+0) [0x7f52feefe000+0x9d00]
      2020-10-25T05:41:50.490682-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f5302c1e000+0xc1eef]
      2020-10-25T05:41:50.490703-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f5302c1e000+0xed5a8]
      2020-10-25T05:41:50.490722-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f5302c1e000+0xd368b]
      2020-10-25T05:41:50.490742-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f5302c1e000+0x9fc7b]
      2020-10-25T05:41:50.490769-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f5302c1e000+0x1a02e1]
      2020-10-25T05:41:50.490790-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f5302c1e000+0x185853]
      2020-10-25T05:41:50.490810-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f5302c1e000+0x86edf]
      2020-10-25T05:41:50.490826-07:00 CRITICAL     /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7f53016b6000+0x10947]
      2020-10-25T05:41:50.490840-07:00 CRITICAL     /lib64/libpthread.so.0() [0x7f52feefe000+0x7ea5]
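Each Breakpad frame ends in a `[base+offset]` pair: the load address of the module plus the offset into it. A minimal Python sketch for turning those pairs into absolute addresses is shown here, so the unsymbolized `libep.so` frames can be matched against the frame addresses that gdb reports for the same dump. The helper name `absolute_address` is an assumption made for illustration, not part of any Couchbase tooling.

```python
import re

# Matches the trailing "[0xBASE+0xOFFSET]" that ends each backtrace line.
FRAME_RE = re.compile(r"\[0x([0-9a-fA-F]+)\+0x([0-9a-fA-F]+)\]\s*$")


def absolute_address(log_line):
    """Return base + offset for one backtrace line, or None if no frame is present."""
    match = FRAME_RE.search(log_line)
    if match is None:
        return None
    base, offset = (int(group, 16) for group in match.groups())
    return base + offset


# Two frames copied from the memcached.log excerpt above.
lines = [
    "2020-10-25T05:41:50.490659-07:00 CRITICAL     "
    "/lib64/libpthread.so.0(pthread_mutex_lock+0) [0x7f52feefe000+0x9d00]",
    "2020-10-25T05:41:50.490682-07:00 CRITICAL     "
    "/opt/couchbase/bin/../lib/libep.so() [0x7f5302c1e000+0xc1eef]",
]

for line in lines:
    print(hex(absolute_address(line)))
```

For these two lines the results are 0x7f52fef07d00 and 0x7f5302cdfeef, which are the addresses gdb shows for frames #0 and #1 of the dump.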
      

      bt full of 434cc0e2-cbfa-4b86-4d22b7b0-9e996c6a.dmp on 172.23.121.10

      (gdb) bt full
      #0  0x00007f52fef07d00 in pthread_mutex_lock () from /lib64/libpthread.so.0
      No symbol table info available.
      #1  0x00007f5302cdfeef in __gthread_mutex_lock (__mutex=0x58) at /usr/local/include/c++/7.3.0/x86_64-pc-linux-gnu/bits/gthr-default.h:748
      No locals.
      #2  lock (this=0x58) at /usr/local/include/c++/7.3.0/bits/std_mutex.h:103
      No locals.
      #3  lock_guard (__m=..., this=<synthetic pointer>) at /usr/local/include/c++/7.3.0/bits/std_mutex.h:162
      No locals.
      #4  BackfillManager::wakeUpTask (this=0x0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/dcp/backfill-manager.cc:419
              lh = {_M_device = @0x58}
      #5  0x00007f5302d0b5a8 in DcpProducer::notifyBackfillManager (this=<optimized out>) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/dcp/producer.cc:1402
      No locals.
      #6  0x00007f5302cf168b in DcpConnMap::notifyBackfillManagerTasks (this=<optimized out>) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/dcp/dcpconnmap.cc:463
              producer = <optimized out>
              handle = {<folly::LockedPtrBase<folly::Synchronized<ConnStore::CookieToConnMapHandle, folly::SharedMutexImpl<false, void, std::atomic, false, false> >, folly::SharedMutexImpl<false, void, std::atomic, false, false>, folly::LockPolicyExclusive>> = {parent_ = 0x7f52ec0db000}, static AllowsConcurrentAccess = false}
      #7  0x00007f5302cbdc7b in CheckpointVisitor::complete (this=0x7f52ec122c80) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/checkpoint_visitor.cc:72
              inverse = false
              this = 0x7f52ec122c80
              inverse = false
      #8  0x00007f5302dbe2e1 in VBCBAdaptor::run (this=0x7f52ec128750) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/kv_bucket.cc:2378
              id = 1024
      #9  0x00007f5302da3853 in GlobalTask::execute (this=0x7f52ec128750) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/globaltask.cc:73
              guard = {previous = 0x0}
      #10 0x00007f5302ca4edf in CB3ExecutorThread::run (this=0x7f52fd5df840) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/cb3_executorthread.cc:174
              curTaskDescr = {static npos = 18446744073709551615, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, 
                  _M_p = 0x7f52fd5ad4e0 <Address 0x7f52fd5ad4e0 out of bounds>}, _M_string_length = 38, {_M_local_buf = "&\000\000\000\000\000\000\000pressor", _M_allocated_capacity = 38}}
              woketime = <optimized out>
              scheduleOverhead = <optimized out>
              again = <optimized out>
              runtime = <optimized out>
              q = <optimized out>
              tick = 118 'v'
              guard = {engine = 0x0}
      #11 0x00007f53016c6947 in run (this=0x7f52f40dff00) at /home/couchbase/jenkins/workspace/couchbase-server-unix/platform/src/cb_pthreads.cc:58
      No locals.
      #12 platform_thread_wrap (arg=0x7f52f40dff00) at /home/couchbase/jenkins/workspace/couchbase-server-unix/platform/src/cb_pthreads.cc:71
              context = {_M_t = {
                  _M_t = {<std::_Tuple_impl<0, CouchbaseThread*, std::default_delete<CouchbaseThread> >> = {<std::_Tuple_impl<1, std::default_delete<CouchbaseThread> >> = {<std::_Head_base<1, std::default_delete<CouchbaseThread>, true>> = {<std::default_delete<CouchbaseThread>> = {<No data fields>}, <No data fields>}, <No data fields>}, <std::_Head_base<0, CouchbaseThread*, false>> = {_M_head_impl = 0x7f52f40dff00}, <No data fields>}, <No data fields>}}}
      #13 0x00007f52fef05ea5 in start_thread () from /lib64/libpthread.so.0
      No symbol table info available.
      #14 0x00007f52fec2e8dd in clone () from /lib64/libc.so.6
      No symbol table info available.
      (gdb) 
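The backtrace shows `BackfillManager::wakeUpTask` entered with `this=0x0`, and the mutex it then locks sits at address `0x58`. That is consistent with a member mutex at offset 0x58 being reached through a null `BackfillManager` pointer, although the source is needed to confirm the layout. A minimal Python sketch for picking such frames out of `bt full` output is shown here; the function name `null_this_frames` and the regular expression are assumptions made for illustration and only cover frames whose first argument is `this`.

```python
import re

# Matches frame lines such as
#   "#4  BackfillManager::wakeUpTask (this=0x0) at ..."
#   "#7  0x00007f5302cbdc7b in CheckpointVisitor::complete (this=0x7f52ec122c80) at ..."
FRAME_RE = re.compile(
    r"^#(?P<num>\d+)\s+(?:0x[0-9a-f]+ in )?(?P<func>\S+) "
    r"\(this=(?P<this>0x[0-9a-f]+)\)"
)


def null_this_frames(backtrace):
    """Return (frame number, function) for every frame called with this == 0."""
    hits = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and int(match.group("this"), 16) == 0:
            hits.append((int(match.group("num")), match.group("func")))
    return hits


# A few frames from the gdb output above, with source paths shortened.
sample = """
#2  lock (this=0x58) at std_mutex.h:103
#4  BackfillManager::wakeUpTask (this=0x0) at backfill-manager.cc:419
#5  0x00007f5302d0b5a8 in DcpProducer::notifyBackfillManager (this=<optimized out>) at producer.cc:1402
#7  0x00007f5302cbdc7b in CheckpointVisitor::complete (this=0x7f52ec122c80) at checkpoint_visitor.cc:72
"""

print(null_this_frames(sample))
```

Frames where gdb prints `this=<optimized out>` are skipped, so an empty result does not rule out a null object further up the stack.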
      

      cbcollect_info attached.
      This test last passed on 7.0.0-3435.

            People

              Balakumaran.Gopal Balakumaran Gopal