Couchbase Server / MB-44172

CouchStore: Swap rebalance failed because the mover crashed during dcp_takeover

Details

    • Bug
    • Resolution: Duplicate
    • Test Blocker
    • None
    • Cheshire-Cat
    • couchbase-bucket
    • 7.0.0-4401

    Description

      Steps:
      Step 1: Create a 15 node cluster
      Step 2: Create required buckets and collections.
      Step 3: Create 100000000 items sequentially
      Step 4.1: Rebalance IN with Loading of docs
      Step 5.1: Rebalance OUT with Loading of docs
      Step 6.1: Rebalance SWAP with Loading of docs
      Step 7.1: Rebalance IN/OUT with Loading of docs
      Step 8.1: Rebalance OUT/IN with Loading of docs
      Step 4.2: Rebalance IN with Loading of docs
      Step 5.2: Rebalance OUT with Loading of docs
      Step 6.2: Rebalance SWAP with Loading of docs
      Step 7.2: Rebalance IN/OUT with Loading of docs
      Step 8.2: Rebalance OUT/IN with Loading of docs
      Step 4.3: Rebalance IN with Loading of docs
      Step 5.3: Rebalance OUT with Loading of docs
      Step 6.3: Rebalance SWAP with Loading of docs
      Step 7.3: Rebalance IN/OUT with Loading of docs
      Step 8.3: Rebalance OUT/IN with Loading of docs
      Step 4.4: Rebalance IN with Loading of docs
      Step 5.4: Rebalance OUT with Loading of docs
      Step 6.4: Rebalance SWAP with Loading of docs. Rebalance Failed
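
The steps above repeat the same five rebalance types (IN, OUT, SWAP, IN/OUT, OUT/IN) once per iteration. A minimal sketch of that schedule follows, assuming the step labels follow the pattern "type number.iteration" and using iterations=10 from the test command. The name rebalance_schedule is hypothetical and is not part of the test framework.

```python
# Rebalance types in the order listed in the steps (step numbers 4 through 8).
REBALANCE_TYPES = ["IN", "OUT", "SWAP", "IN/OUT", "OUT/IN"]


def rebalance_schedule(iterations):
    """Yield (label, rebalance_type) pairs such as ("6.4", "SWAP").

    Hypothetical helper: it only reproduces the numbering used in the
    steps above, it does not drive any rebalance.
    """
    for iteration in range(1, iterations + 1):
        for offset, kind in enumerate(REBALANCE_TYPES):
            yield f"{4 + offset}.{iteration}", kind


schedule = list(rebalance_schedule(iterations=10))
labels = [label for label, _ in schedule]

# Position of the failing step within the whole run (1-based).
failed_position = labels.index("6.4") + 1
print(f"Step 6.4 is rebalance {failed_position} of {len(schedule)}")
```

Counting this way places the failure at the third rebalance of the fourth iteration, after the 17 rebalances listed before it in the steps.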

      {u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try again.', u'status': u'none'} - rebalance failed
      2021-02-08 01:05:39,464 | test  | INFO    | pool-1-thread-22 | [rest_client:print_UI_logs:2579] Latest logs from UI on 172.23.120.170:
      2021-02-08 01:05:39,464 | test  | ERROR   | pool-1-thread-22 | [rest_client:print_UI_logs:2581] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.120.170', u'tstamp': 1612775137791L, u'shortText': u'message', u'serverTime': u'2021-02-08T01:05:37.791Z', u'text': u"Rebalance exited with reason {mover_crashed,\n                              {unexpected_exit,\n                               {'EXIT',<0.26499.54>,\n                                {noproc,\n                                 {gen_server,call,\n                                  [{'janitor_agent-GleamBook',\n                                    'ns_1@172.23.121.123'},\n                                   {if_rebalance,<0.13406.52>,\n                                    {dcp_takeover,'ns_1@172.23.121.127',409}},\n                                   infinity]}}}}}.\nRebalance Operation Id = 2ec6011f31f2673e566036758ef5a8f1"}
      2021-02-08 01:05:39,466 | test  | ERROR   | pool-1-thread-22 | [rest_client:print_UI_logs:2581] {u'code': 0, u'module': u'ns_vbucket_mover', u'type': u'critical', u'node': u'ns_1@172.23.120.170', u'tstamp': 1612775137772L, u'shortText': u'message', u'serverTime': u'2021-02-08T01:05:37.772Z', u'text': u"Worker <0.22680.54> (for action {move,{409,\n                                       ['ns_1@172.23.121.127',\n                                        'ns_1@172.23.121.131'],\n                                       ['ns_1@172.23.121.123',\n                                        'ns_1@172.23.121.131'],\n                                       []}}) exited with reason {unexpected_exit,\n                                                                 {'EXIT',\n                                                                  <0.26499.54>,\n                                                                  {noproc,\n                                                                   {gen_server,\n                                                                    call,\n                                                                    [{'janitor_agent-GleamBook',\n                                                                      'ns_1@172.23.121.123'},\n                                                                     {if_rebalance,\n                                                                      <0.13406.52>,\n                                                                      {dcp_takeover,\n                                                                       'ns_1@172.23.121.127',\n                                                                       409}},\n                                                                     infinity]}}}}"}
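
For triage, the useful fields in the two entries above are the bucket, the node the gen_server call was sent to, the node named in the dcp_takeover tuple, and the vBucket ID. A rough sketch of pulling them out with regular expressions follows. The patterns are assumptions fitted to this one message, not a general parser for ns_server terms, and summarize_mover_crash is a hypothetical name.

```python
import re

# Excerpt of the "text" field from the ns_orchestrator entry above
# (indentation shortened, content unchanged).
LOG_TEXT = (
    "Rebalance exited with reason {mover_crashed,\n"
    "  {unexpected_exit,\n"
    "   {'EXIT',<0.26499.54>,\n"
    "    {noproc,\n"
    "     {gen_server,call,\n"
    "      [{'janitor_agent-GleamBook',\n"
    "        'ns_1@172.23.121.123'},\n"
    "       {if_rebalance,<0.13406.52>,\n"
    "        {dcp_takeover,'ns_1@172.23.121.127',409}},\n"
    "       infinity]}}}}}."
)

# Target of the gen_server call: {'janitor_agent-<bucket>', '<node>'}
CALL_TARGET = re.compile(
    r"\{'janitor_agent-(?P<bucket>[^']+)',\s*'(?P<node>[^']+)'\}"
)
# Request payload: {dcp_takeover, '<node>', <vbucket>}
TAKEOVER = re.compile(
    r"\{dcp_takeover,\s*'(?P<node>[^']+)',\s*(?P<vbucket>\d+)\}"
)


def summarize_mover_crash(text):
    """Return the key fields of a mover_crashed message, or None."""
    target = CALL_TARGET.search(text)
    takeover = TAKEOVER.search(text)
    if not (target and takeover):
        return None
    return {
        "bucket": target.group("bucket"),
        "called_node": target.group("node"),
        "takeover_node": takeover.group("node"),
        "vbucket": int(takeover.group("vbucket")),
        # In Erlang, a noproc exit from gen_server:call means the
        # called process did not exist when the call was made.
        "noproc": "noproc" in text,
    }


summary = summarize_mover_crash(LOG_TEXT)
print(summary)
```

In this message the call went to janitor_agent-GleamBook on 172.23.121.123, the first node of the new chain in the ns_vbucket_mover entry, and the dcp_takeover tuple names 172.23.121.127, the first node of the old chain, for vBucket 409.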
      

      172.23.120.170: Stack Trace of first crash - ce04acfa-dcba-4450-f36a8b96-0e910304.dmp

      Core was generated by `/opt/couchbase/bin/memcached -C /opt/couchbase/var/lib/couchbase/config/memcach'.
       #0  0x00007f6ddcb8c387 in raise () from /lib64/libc.so.6
       No symbol table info available.
       #1  0x00007f6ddcb8da78 in abort () from /lib64/libc.so.6
       No symbol table info available.
       #2  0x00007f6ddd6ea195 in __gnu_cxx::__verbose_terminate_handler () at /tmp/deploy/gcc-7.3.0/libstdc++-v3/libsupc++/vterminate.cc:95
               terminating = false
               t = <optimized out>
       #3  0x0000000000555bf2 in backtrace_terminate_handler () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/utilities/terminate_handler.cc:86
       No locals.
       #4  0x00007f6ddd6e7f86 in __cxxabiv1::__terminate (handler=<optimized out>) at /tmp/deploy/gcc-7.3.0/libstdc++-v3/libsupc++/eh_terminate.cc:47
       No locals.
       #5  0x00007f6ddd6e7fd1 in std::terminate () at /tmp/deploy/gcc-7.3.0/libstdc++-v3/libsupc++/eh_terminate.cc:57
       No locals.
       #6  0x00007f6de10f91d3 in GlobalTask::execute (this=<optimized out>) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/globaltask.cc:78
       No locals.
       #7  0x00007f6de10f2d02 in operator() (__closure=0x7f6d6cfe0040) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/folly_executorpool.cc:195
               executedAt = <optimized out>
               end = <optimized out>
               scheduleOverhead = <optimized out>
               start = {__d = {__r = 953509117181810}}
               runAgain = false
               proxy = @0x7f6dd37d3190: <error reading variable>
       #8  folly::detail::function::FunctionTraits<void ()>::callSmall<FollyExecutorPool::TaskProxy::scheduleViaCPUPool()::{lambda()#1}>(folly::detail::function::Data&) (p=...) at /home/couchbase/jenkins/workspace/couchbase-server-unix/server_build/tlm/deps/folly.exploded/include/folly/Function.h:387
               fn = @0x7f6d6cfe0040: {__proxy = @0x7f6dd37d3190}
       #9  0x00007f6de1273e76 in operator() (this=0x7f6d6cfe0040) at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/Function.h:416
               fn = @0x7f6d6cfe0040: {<folly::detail::function::FunctionTraits<void()>> = {<No data fields>}, data_ = {big = 0x7f6dd37d3190, tiny = {__data = "\220\061}\323m\177\000\000\000\000\000\000\000\000\000\000P\374>lm\177\000\000\200/\000\214m\177\000\000\001\000\000\000\000\000\000\000\220\000\376lm\177\000", __align = {<No data fields>}}}, call_ = 0x7f6de10f2c40 <folly::detail::function::FunctionTraits<void ()>::callSmall<FollyExecutorPool::TaskProxy::scheduleViaCPUPool()::{lambda()#1}>(folly::detail::function::Data&)>, exec_ = 0x7f6de10f0650 <folly::detail::function::execSmall<FollyExecutorPool::TaskProxy::scheduleViaCPUPool()::{lambda()#1}>(folly::detail::function::Op, folly::detail::function::Data*, folly::detail::function::Data)>}
       #10 folly::ThreadPoolExecutor::runTask(std::shared_ptr<folly::ThreadPoolExecutor::Thread> const&, folly::ThreadPoolExecutor::Task&&) (this=0x7f6dd37f2c00, thread=..., task=<unknown type in /usr/lib/debug/opt/couchbase/lib/libep.so.debug, CU 0x30bcc0a, DIE 0x3103026>) at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/executors/ThreadPoolExecutor.cpp:98
               rctx = {prev_ = {<std::__shared_ptr<folly::RequestContext, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<folly::RequestContext, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x0, _M_refcount = {_M_pi = 0x0}}, <No data fields>}}
               startTime = {__d = {__r = 953509117177063}}
               stats = {expired = false, waitTime = {__r = 26880}, runTime = {__r = 0}, enqueueTime = {__d = {__r = 953509117150183}}, requestId = 0}
       #11 0x00007f6de125c36a in folly::CPUThreadPoolExecutor::threadRun (this=0x7f6dd37f2c00, thread=...) at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/executors/CPUThreadPoolExecutor.cpp:265
               guard = {list_ = {forbid = true, prev = 0x0, curr = {name = {static npos = <optimized out>, b_ = 0x7f6de12dc05b "CPUThreadPoolExecutor", e_ = 0x7f6de12dc070 ""}}}}
       #12 0x00007f6de1276f99 in __invoke_impl<void, void (folly::ThreadPoolExecutor::*&)(std::shared_ptr<folly::ThreadPoolExecutor::Thread>), folly::ThreadPoolExecutor*&, std::shared_ptr<folly::ThreadPoolExecutor::Thread>&> (__t=<optimized out>, __f=<optimized out>) at /usr/local/include/c++/7.3.0/bits/invoke.h:73
       No locals.
       #13 __invoke<void (folly::ThreadPoolExecutor::*&)(std::shared_ptr<folly::ThreadPoolExecutor::Thread>), folly::ThreadPoolExecutor*&, std::shared_ptr<folly::ThreadPoolExecutor::Thread>&> (__fn=<optimized out>) at /usr/local/include/c++/7.3.0/bits/invoke.h:95
       No locals.
       #14 __call<void, 0, 1> (__args=<optimized out>, this=<optimized out>) at /usr/local/include/c++/7.3.0/functional:467
       No locals.
       #15 operator()<> (this=<optimized out>) at /usr/local/include/c++/7.3.0/functional:551
       No locals.
       #16 folly::detail::function::FunctionTraits<void ()>::callBig<std::_Bind<void (folly::ThreadPoolExecutor::*(folly::ThreadPoolExecutor*, std::shared_ptr<folly::ThreadPoolExecutor::Thread>))(std::shared_ptr<folly::ThreadPoolExecutor::Thread>)> >(folly::detail::function::Data&) (p=...) at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/Function.h:401
               fn = <optimized out>
       #17 0x00007f6de10f1488 in operator() (this=0x7f6dd37abcc0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/server_build/tlm/deps/folly.exploded/include/folly/Function.h:416
               fn = @0x7f6dd37abcc0: <error reading variable>
       #18 operator() (__closure=0x7f6dd37abcc0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/folly_executorpool.cc:54
               threadNameOpt = {storage_ = {{emptyState = -112 '\220', value = {static npos = 18446744073709551615, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x7f6d6cfe0190 "WriterPool3"}, _M_string_length = 11, {_M_local_buf = "WriterPool3\000\000\000\000", _M_allocated_capacity = 8021036716417184343}}}, hasValue = true}}
               func = <error reading variable func (Cannot access memory at address 0x7f6dd37abcc0)>
       #19 folly::detail::function::FunctionTraits<void ()>::callBig<CBRegisteredThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}>(folly::detail::function::Data&) (p=...) at /home/couchbase/jenkins/workspace/couchbase-server-unix/server_build/tlm/deps/folly.exploded/include/folly/Function.h:401
               fn = @0x7f6dd37abcc0: <error reading variable>
       #20 0x00007f6de10f1363 in operator() (this=0x7f6dd3595ee0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/server_build/tlm/deps/folly.exploded/include/folly/Function.h:416
               fn = @0x7f6dd3595ee0: <error reading variable>
       #21 operator() (__closure=0x7f6dd3595ed0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/server_build/tlm/deps/folly.exploded/include/folly/executors/thread_factory/PriorityThreadFactory.h:54
               func = <error reading variable func (Cannot access memory at address 0x7f6dd3595ee0)>
               priority = <error reading variable priority (Cannot access memory at address 0x7f6dd3595ed0)>
       #22 folly::detail::function::FunctionTraits<void ()>::callBig<folly::PriorityThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}>(folly::detail::function::Data&) (p=...) at /home/couchbase/jenkins/workspace/couchbase-server-unix/server_build/tlm/deps/folly.exploded/include/folly/Function.h:401
               fn = @0x7f6dd3595ed0: <error reading variable>
       #23 0x00007f6ddd712dcf in std::execute_native_thread_routine (__p=0x7f6dd37cfe60) at /tmp/deploy/gcc-7.3.0/libstdc++-v3/src/c++11/thread.cc:83
               __t = {_M_t = {_M_t = {<std::_Tuple_impl<0, std::thread::_State*, std::default_delete<std::thread::_State> >> = {<std::_Tuple_impl<1, std::default_delete<std::thread::_State> >> = {<std::_Head_base<1, std::default_delete<std::thread::_State>, true>> = {<std::default_delete<std::thread::_State>> = {<No data fields>}, <No data fields>}, <No data fields>}, <std::_Head_base<0, std::thread::_State*, false>> = {_M_head_impl = 0x7f6dd37cfe60}, <No data fields>}, <No data fields>}}}
       #24 0x00007f6ddcf2bea5 in start_thread () from /lib64/libpthread.so.0
       No symbol table info available.
       #25 0x00007f6ddcc548dd in clone () from /lib64/libc.so.6
       No symbol table info available.
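
When reading a trace like the one above, the first thing to find is the frame that called std::terminate. A small sketch of locating it automatically follows, assuming gdb-style frame lines of the form "#N 0xADDR in FUNC (ARGS) at|from WHERE". The regular expression is an assumption and only handles simple function names such as those in frames #0 to #6, and frame_after_terminate is a hypothetical helper.

```python
import re

# Frames #0 to #6 from the backtrace above, with locals omitted.
BACKTRACE = """\
#0  0x00007f6ddcb8c387 in raise () from /lib64/libc.so.6
#1  0x00007f6ddcb8da78 in abort () from /lib64/libc.so.6
#2  0x00007f6ddd6ea195 in __gnu_cxx::__verbose_terminate_handler () at /tmp/deploy/gcc-7.3.0/libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x0000000000555bf2 in backtrace_terminate_handler () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/utilities/terminate_handler.cc:86
#4  0x00007f6ddd6e7f86 in __cxxabiv1::__terminate (handler=<optimized out>) at /tmp/deploy/gcc-7.3.0/libstdc++-v3/libsupc++/eh_terminate.cc:47
#5  0x00007f6ddd6e7fd1 in std::terminate () at /tmp/deploy/gcc-7.3.0/libstdc++-v3/libsupc++/eh_terminate.cc:57
#6  0x00007f6de10f91d3 in GlobalTask::execute (this=<optimized out>) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/globaltask.cc:78
"""

# Assumed frame layout: "#N  [0xADDR in ]FUNC (ARGS) at|from WHERE".
FRAME = re.compile(
    r"^\s*#(?P<num>\d+)\s+(?:0x[0-9a-f]+ in )?"
    r"(?P<func>.+?) \(.*?\) (?:at|from) (?P<where>\S+)\s*$"
)


def frame_after_terminate(backtrace):
    """Return the frame directly below std::terminate, or None."""
    frames = [
        match.groupdict()
        for match in map(FRAME.match, backtrace.splitlines())
        if match
    ]
    for index, frame in enumerate(frames):
        if frame["func"] == "std::terminate" and index + 1 < len(frames):
            return frames[index + 1]
    return None


caller = frame_after_terminate(BACKTRACE)
print(caller["func"], caller["where"].rsplit("/", 1)[-1])
```

For this dump the caller is GlobalTask::execute at globaltask.cc:78, running on the thread that frame #18 names as WriterPool3.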
      

      Test:

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/test_job_magma.ini -p bucket_storage=couchstore,bucket_eviction_policy=fullEviction,rerun=False -t volumetests.Magma.volume.test_long_rebalance,nodes_init=15,replicas=1,skip_cleanup=True,num_items=100000000,num_buckets=1,bucket_names=GleamBook,doc_size=256,bucket_type=membase,compression_mode=off,iterations=10,batch_size=1000,sdk_timeout=60,log_level=debug,infra_log_level=info,rerun=False,skip_cleanup=True,key_size=18,randomize_doc_size=False,randomize_value=True,assert_crashes_on_load=True,maxttl=300,num_collections=50,doc_ops=expiry,durability=MAJORITY,pc=1,sdk_client_pool=True -m rest'
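
The -t argument above packs the test name and its parameters into one comma-separated string. A short sketch of splitting it for inspection follows. This is an illustrative parse, not how testrunner itself reads the argument; parse_test_spec is a hypothetical name, and SPEC is a shortened copy of the original value.

```python
def parse_test_spec(spec):
    """Split 'module.test,key=value,...' into (test_name, params).

    Hypothetical helper. If a key is repeated (rerun and skip_cleanup
    appear twice in the original command), this sketch keeps the last
    value; that choice is an assumption of the sketch.
    """
    name, *pairs = spec.split(",")
    params = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        params[key] = value
    return name, params


# Shortened copy of the -t value from the command above.
SPEC = (
    "volumetests.Magma.volume.test_long_rebalance,nodes_init=15,replicas=1,"
    "num_items=100000000,num_buckets=1,bucket_names=GleamBook,doc_size=256,"
    "iterations=10,maxttl=300,num_collections=50,doc_ops=expiry,"
    "durability=MAJORITY"
)

name, params = parse_test_spec(SPEC)
print(name)
for key in ("nodes_init", "num_items", "doc_size", "durability", "maxttl"):
    print(f"  {key} = {params[key]}")
```

Laid out this way, the parameters most relevant to the failure are easier to see: 15 initial nodes, one bucket named GleamBook with 50 collections, MAJORITY durability, and expiry as the document operation with maxttl=300.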
      

            People

              ritesh.agarwal Ritesh Agarwal