Couchbase Server / MB-41332

[Collections] BackfillManager::wakeUpTask (this=0x0) at ...engines/ep/src/dcp/backfill-manager.cc:419


Details

    Description

      Script to reproduce

      ./testrunner -i /tmp/win10-bucket-ops.ini rerun=False,crash_warning=True,quota_percent=95 -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_rebalance_out,nodes_init=5,nodes_out=2,update_replica=True,updated_num_replicas=3,bucket_spec=multi_bucket.buckets_all_membase_for_rebalance_tests_more_collections,data_load_stage=during,data_load_spec=volume_test_load_with_CRUD_on_collections,skip_validations=False,override_spec_params=replicas,replicas=0,GROUP=replica_update_with_collection_crud
      

      Steps to reproduce
      1. Create a 5 node cluster
      ----------------------------------------
      Nodes            Services   Status
      ----------------------------------------
      172.23.98.196    kv         Cluster node
      172.23.98.195    None       <--- IN ---
      172.23.120.206   None       <--- IN ---
      172.23.104.186   None       <--- IN ---
      172.23.121.10    None       <--- IN ---
      ----------------------------------------

      2. Create Buckets/Scopes/Collections/Data
      2020-09-07 15:11:31,572 | test | INFO | MainThread | [table_view:display:72] Bucket statistics
      ----------------------------------------------------------------------------------------------
      Bucket    Type        Replicas   Durability   TTL   Items    RAM Quota     RAM Used    Disk Used
      ----------------------------------------------------------------------------------------------
      bucket1   couchbase   0          none         0     3000     1048576000    67044512    62719186
      bucket2   couchbase   0          none         0     3000     1048576000    67046608    93422804
      default   couchbase   0          none         0     500000   10485760000   157943456   127608381
      ----------------------------------------------------------------------------------------------

      3. Start CRUD on collections

      4. Update bucket replicas to 3 and start rebalance out.
      2020-09-07 15:11:37,358 | test | INFO | MainThread | [collections_rebalance:rebalance_operation:178] Updating all the bucket replicas to 3
      2020-09-07 15:11:37,358 | test | INFO | MainThread | [collections_rebalance:rebalance_operation:157] Starting rebalance operation of type : rebalance_out
      2020-09-07 15:11:37,950 | test | INFO | pool-23-thread-8 | [table_view:display:72] Rebalance Overview
      ----------------------------------------
      Nodes            Services   Status
      ----------------------------------------
      172.23.98.196    kv         Cluster node
      172.23.98.195    kv         Cluster node
      172.23.104.186   [u'kv']    --- OUT --->
      172.23.120.206   kv         Cluster node
      172.23.121.10    [u'kv']    --- OUT --->
      ----------------------------------------

      5. The rebalance completes successfully. However, a core dump is seen after the rebalance completes.

      Output of grep for CRITICAL memcached messages:

      [user:info,2020-09-07T14:24:05.844-07:00,ns_1@172.23.121.10:<0.17939.1>:ns_log:crash_consumption_loop:69]Service 'memcached' exited with status 139. Restarting. Messages:
      2020-09-07T14:24:05.666235-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f9fa50b5000+0xba2af]
      2020-09-07T14:24:05.666254-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f9fa50b5000+0xe55e8]
      2020-09-07T14:24:05.666273-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f9fa50b5000+0xcac0b]
      2020-09-07T14:24:05.666294-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f9fa50b5000+0x1b0751]
      2020-09-07T14:24:05.666314-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f9fa50b5000+0x192d51]
      2020-09-07T14:24:05.666334-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f9fa50b5000+0x177fe3]
      2020-09-07T14:24:05.666358-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f9fa50b5000+0x7fb3f]
      2020-09-07T14:24:05.666372-07:00 CRITICAL     /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7f9fa3b90000+0x10777]
      2020-09-07T14:24:05.666386-07:00 CRITICAL     /lib64/libpthread.so.0() [0x7f9fa11d6000+0x7ea5]
      2020-09-07T14:24:05.666441-07:00 CRITICAL     /lib64/libc.so.6(clone+0x6d) [0x7f9fa0e08000+0xfe8dd]
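
      The "exited with status 139" in the ns_log line above can be decoded with a small sketch. It assumes the status follows the common shell convention of 128 + signal number; the helper name decode_exit_status is hypothetical and not part of any Couchbase tooling.

```python
import signal


def decode_exit_status(status: int) -> str:
    """Interpret an exit status, assuming the 128 + signal-number convention."""
    if status > 128:
        signum = status - 128
        try:
            # signal.Signals maps a signal number to its symbolic name.
            return signal.Signals(signum).name
        except ValueError:
            return f"unknown signal {signum}"
    return f"exit code {status}"


# 139 - 128 = 11, which is SIGSEGV on Linux.
print(decode_exit_status(139))
```

      Under that assumption, status 139 corresponds to a segmentation fault, which is consistent with a core dump being produced.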
      

      172.23.121.10: stack trace of the first crash (c25ef48b-2cfb-49ea-4fb7fe87-b0dc087f.dmp)

      (gdb) bt full
      #0  0x00007f9fa11dfd00 in pthread_mutex_lock () from /lib64/libpthread.so.0
      No symbol table info available.
      #1  0x00007f9fa516f2af in __gthread_mutex_lock (__mutex=0x58) at /usr/local/include/c++/7.3.0/x86_64-pc-linux-gnu/bits/gthr-default.h:748
      No locals.
      #2  lock (this=0x58) at /usr/local/include/c++/7.3.0/bits/std_mutex.h:103
      No locals.
      #3  lock_guard (__m=..., this=<synthetic pointer>) at /usr/local/include/c++/7.3.0/bits/std_mutex.h:162
      No locals.
      #4  BackfillManager::wakeUpTask (this=0x0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/dcp/backfill-manager.cc:419
              lh = {_M_device = @0x58}
      #5  0x00007f9fa519a5e8 in DcpProducer::notifyBackfillManager (this=<optimized out>) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/dcp/producer.cc:1399
      No locals.
      #6  0x00007f9fa517fc0b in DcpConnMap::notifyBackfillManagerTasks (this=<optimized out>) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/dcp/dcpconnmap.cc:465
              producer = <optimized out>
              handle = {<folly::LockedPtrBase<folly::Synchronized<ConnStore::CookieToConnMapHandle, folly::SharedMutexImpl<false, void, std::atomic, false, false> >, folly::SharedMutexImpl<false, void, std::atomic, false, false>, folly::LockPolicyExclusive>> = {parent_ = 0x7f9f96462000}, static AllowsConcurrentAccess = false}
      #7  0x00007f9fa5265751 in PagingVisitor::complete (this=0x7f9f615345c0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/paging_visitor.cc:328
              elapsed_time = <optimized out>
              inverse = false
      #8  0x00007f9fa5247d51 in VBCBAdaptor::run (this=0x7f9f95abb7f0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/kv_bucket.cc:2382
              id = 1024
      #9  0x00007f9fa522cfe3 in GlobalTask::execute (this=0x7f9f95abb7f0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/globaltask.cc:73
              guard = {previous = 0x0}
      #10 0x00007f9fa5134b3f in CB3ExecutorThread::run (this=0x7f9f9f9f36c0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/cb3_executorthread.cc:174
              curTaskDescr = {static npos = 18446744073709551615, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, 
                  _M_p = 0x7f9f968f7260 <Address 0x7f9f968f7260 out of bounds>}, _M_string_length = 30, {_M_local_buf = "\036\000\000\000\000\000\000\000pressor", _M_allocated_capacity = 30}}
              woketime = <optimized out>
              scheduleOverhead = <optimized out>
              again = <optimized out>
              runtime = <optimized out>
              q = <optimized out>
              tick = 101 'e'
              guard = {engine = 0x0}
      #11 0x00007f9fa3ba0777 in run (this=0x7f9f96ff9dd0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/platform/src/cb_pthreads.cc:58
      No locals.
      #12 platform_thread_wrap (arg=0x7f9f96ff9dd0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/platform/src/cb_pthreads.cc:71
              context = {_M_t = {
                  _M_t = {<std::_Tuple_impl<0, CouchbaseThread*, std::default_delete<CouchbaseThread> >> = {<std::_Tuple_impl<1, std::default_delete<CouchbaseThread> >> = {<std::_Head_base<1, std::default_delete<CouchbaseThread>, true>> = {<std::default_delete<CouchbaseThread>> = {<No data fields>}, <No data fields>}, <No data fields>}, <std::_Head_base<0, CouchbaseThread*, false>> = {_M_head_impl = 0x7f9f96ff9dd0}, <No data fields>}, <No data fields>}}}
      #13 0x00007f9fa11ddea5 in start_thread () from /lib64/libpthread.so.0
      No symbol table info available.
      #14 0x00007f9fa0f068dd in clone () from /lib64/libc.so.6
      No symbol table info available.
      (gdb) 
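
      The CRITICAL lines from memcached report each frame as [load base + offset], while gdb prints absolute addresses. The sketch below shows one way to correlate the two; the regular expression and the helper name parse_frames are assumptions made for illustration, and only three of the logged lines are reproduced as sample input.

```python
import re

# Matches lines such as:
#   ... CRITICAL     /path/lib.so(symbol) [0xBASE+0xOFFSET]
FRAME_RE = re.compile(
    r"CRITICAL\s+(?P<module>\S+?)\((?P<symbol>[^)]*)\)\s+"
    r"\[0x(?P<base>[0-9a-f]+)\+0x(?P<offset>[0-9a-f]+)\]"
)


def parse_frames(log_text):
    """Return (module, offset, absolute_address) for each backtrace line."""
    frames = []
    for match in FRAME_RE.finditer(log_text):
        base = int(match.group("base"), 16)
        offset = int(match.group("offset"), 16)
        frames.append((match.group("module"), offset, base + offset))
    return frames


# Three lines copied from the memcached log above.
SAMPLE = """\
2020-09-07T14:24:05.666235-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f9fa50b5000+0xba2af]
2020-09-07T14:24:05.666254-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f9fa50b5000+0xe55e8]
2020-09-07T14:24:05.666441-07:00 CRITICAL     /lib64/libc.so.6(clone+0x6d) [0x7f9fa0e08000+0xfe8dd]
"""

for module, offset, absolute in parse_frames(SAMPLE):
    # Zero-pad to 16 hex digits so the value lines up with gdb's format.
    print(f"{module} +{offset:#x} -> {absolute:#018x}")
```

      For these three lines the computed addresses are 0x00007f9fa516f2af, 0x00007f9fa519a5e8 and 0x00007f9fa0f068dd, which match the addresses gdb shows for frames #1, #5 and #14, so the logged backtrace and the minidump describe the same crash.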
      

      cbcollect_info attached


            People

              Balakumaran.Gopal Balakumaran Gopal