Couchbase Server / MB-47486

Rebalance failed due to memcached crash during EPBucket::flushVBucket on one of the nodes.

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Critical
    • None
    • 6.6.3
    • couchbase-bucket
    • 6.6.3-9796
    • Untriaged
    • 1
    • Unknown

    Description

      QE test

      ./testrunner -i /tmp/testexec.1950.ini -p get-cbcollect-info=False,GROUP=P0,get-cbcollect-info=True,bucket_storage=couchstore -t rebalance.rebalance_start_stop.RebalanceStartStopTests.test_start_stop_rebalance_with_mutations,nodes_init=1,nodes_in=2,nodes_out=0,extra_nodes_in=1,extra_nodes_out=0,items=100000,max_verify=10000,value_size=1024,GROUP=IN_OUT;P0
      

      Node .10 (172.23.106.10):

      {u'node': u'ns_1@172.23.104.80', u'code': 0, u'text': u"Rebalance exited with reason {mover_crashed,\n                              {unexpected_exit,\n                               {'EXIT',<0.3348.11>,\n                                {{{{socket_closed,\n                                    {gen_server,call,\n                                     [<0.20554.10>,get_partitions,infinity]}},\n                                   {gen_server,call,\n                                    ['dcp_replication_manager-default',\n                                     get_actual_replications,infinity]}},\n                                  {gen_server,call,\n                                   ['replication_manager-default',\n                                    {change_vbucket_replication,707,undefined},\n                                    infinity]}},\n                                 {gen_server,call,\n                                  [{'janitor_agent-default',\n                                    'ns_1@172.23.104.80'},\n                                   {if_rebalance,<0.19504.10>,\n                                    {update_vbucket_state,706,active,paused,\n                                     undefined,\n                                     [['ns_1@172.23.104.80',undefined],\n                                      ['ns_1@172.23.106.47',\n                                       'ns_1@172.23.104.80']]}},\n                                   infinity]}}}}}.\nRebalance Operation Id = bf68a92eea98016ed437f78414325b45", u'shortText': u'message', u'serverTime': u'2021-07-19T00:26:04.369Z', u'module': u'ns_orchestrator', u'tstamp': 1626679564369, u'type': u'critical'}
      [2021-07-19 00:26:08,241] - [rest_client:3471] ERROR - {u'node': u'ns_1@172.23.106.10', u'code': 0, u'text': u"Control connection to memcached on 'ns_1@172.23.106.10' disconnected. Check logs for details.", u'shortText': u'message', u'serverTime': u'2021-07-19T00:26:04.359Z', u'module': u'ns_memcached', u'tstamp': 1626679564359, u'type': u'info'}
      [2021-07-19 00:26:08,241] - [rest_client:3471] ERROR - {u'node': u'ns_1@172.23.106.10', u'code': 0, u'text': u"Service 'memcached' exited with status 139. Restarting. Messages:\n2021-07-19T00:26:04.334743-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f278ba9b000+0x1505af]\n2021-07-19T00:26:04.334754-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f278ba9b000+0x150749]\n2021-07-19T00:26:04.334765-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f278ba9b000+0x14c961]\n2021-07-19T00:26:04.334775-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f278ba9b000+0xe05bd]\n2021-07-19T00:26:04.334787-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f278ba9b000+0x133f7c]\n2021-07-19T00:26:04.334798-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f278ba9b000+0x134eb9]\n2021-07-19T00:26:04.334809-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f278ba9b000+0x12e854]\n2021-07-19T00:26:04.334815-07:00 CRITICAL     /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7f2793085000+0x8f17]\n2021-07-19T00:26:04.334824-07:00 CRITICAL     /lib64/libpthread.so.0() [0x7f279089a000+0x7dc5]\n2021-07-19T00:26:04.334873-07:00 CRITICAL     /lib64/libc.so.6(clone+0x6d) [0x7f27904d9000+0xf621d]", u'shortText': u'message', u'serverTime': u'2021-07-19T00:26:04.356Z', u'module': u'ns_log', u'tstamp': 1626679564356, u'type': u'info'}
      

      Stack trace of the crash on node .10:

      Core was generated by `/opt/couchbase/bin/memcached -C /opt/couchbase/var/lib/couchbase/config/memcach'.
      #0  0x00007f2790636f45 in __memcmp_sse4_1 () from /usr/lib64/libc-2.17.so
      #1  0x00007f278bbde846 in compare (__n=<optimized out>, __s2=<optimized out>, __s1=<optimized out>) at /usr/local/include/c++/7.3.0/bits/char_traits.h:310
      #2  compare (__str=..., this=0x7f2755e24580) at /usr/local/include/c++/7.3.0/bits/basic_string.h:2830
      #3  compare (rhs=..., this=<optimized out>) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/storeddockey.h:121
      #4  OrderItemsForDeDuplication::operator() (this=this@entry=0x7f275cff6ea0, i1=..., i2=...) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/item.cc:584
      #5  0x00007f278bbeb5af in operator()<SingleThreadedRCPtr<Item, Item*, std::default_delete<Item> >, __gnu_cxx::__normal_iterator<SingleThreadedRCPtr<Item, Item*, std::default_delete<Item> >*, std::vector<SingleThreadedRCPtr<Item, Item*, std::default_delete<Item> > > > > (__it=..., __val=..., this=0x7f275cff6ea0) at /usr/local/include/c++/7.3.0/bits/predefined_ops.h:215
      #6  std::__unguarded_linear_insert<__gnu_cxx::__normal_iterator<SingleThreadedRCPtr<Item, Item*, std::default_delete<Item> >*, std::vector<SingleThreadedRCPtr<Item, Item*, std::default_delete<Item> >, std::allocator<SingleThreadedRCPtr<Item, Item*, std::default_delete<Item> > > > >, __gnu_cxx::__ops::_Val_comp_iter<OrderItemsForDeDuplication> > (__last=..., __last@entry=..., __comp=...) at /usr/local/include/c++/7.3.0/bits/stl_algo.h:1828
      #7  0x00007f278bbeb749 in std::__insertion_sort<__gnu_cxx::__normal_iterator<SingleThreadedRCPtr<Item, Item*, std::default_delete<Item> >*, std::vector<SingleThreadedRCPtr<Item, Item*, std::default_delete<Item> >, std::allocator<SingleThreadedRCPtr<Item, Item*, std::default_delete<Item> > > > >, __gnu_cxx::__ops::_Iter_comp_iter<OrderItemsForDeDuplication> > (__first=__first@entry=..., __last=..., __last@entry=..., __comp=...) at /usr/local/include/c++/7.3.0/bits/stl_algo.h:1855
      #8  0x00007f278bbe7961 in __final_insertion_sort<__gnu_cxx::__normal_iterator<SingleThreadedRCPtr<Item, Item*, std::default_delete<Item> >*, std::vector<SingleThreadedRCPtr<Item, Item*, std::default_delete<Item> > > >, __gnu_cxx::__ops::_Iter_comp_iter<OrderItemsForDeDuplication> > (__comp=..., __last=..., __first=...) at /usr/local/include/c++/7.3.0/bits/stl_algo.h:1890
      #9  __sort<__gnu_cxx::__normal_iterator<SingleThreadedRCPtr<Item, Item*, std::default_delete<Item> >*, std::vector<SingleThreadedRCPtr<Item, Item*, std::default_delete<Item> > > >, __gnu_cxx::__ops::_Iter_comp_iter<OrderItemsForDeDuplication> > (__comp=..., __last=..., __first=...) at /usr/local/include/c++/7.3.0/bits/stl_algo.h:1971
      #10 sort<__gnu_cxx::__normal_iterator<SingleThreadedRCPtr<Item, Item*, std::default_delete<Item> >*, std::vector<SingleThreadedRCPtr<Item, Item*, std::default_delete<Item> > > >, OrderItemsForDeDuplication> (__last=..., __first=..., __comp=...) at /usr/local/include/c++/7.3.0/bits/stl_algo.h:4868
      #11 KVStore::optimizeWrites (this=this@entry=0x7f2785734e00, items=...) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/kvstore.cc:591
      #12 0x00007f278bb7b5bd in EPBucket::flushVBucket (this=0x7f278ef7c000, vbid=...) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/ep_bucket.cc:385
      #13 0x00007f278bbcef7c in Flusher::flushVB (this=this@entry=0x7f27858e61c0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/flusher.cc:306
      #14 0x00007f278bbcfeb9 in Flusher::step (this=0x7f27858e61c0, task=0x7f278f174b30) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/flusher.cc:207
      #15 0x00007f278bbc9854 in ExecutorThread::run (this=0x7f278f1807a0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/executorthread.cc:190
      #16 0x00007f279308df17 in run (this=0x7f278f06dab0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/platform/src/cb_pthreads.cc:58
      #17 platform_thread_wrap (arg=0x7f278f06dab0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/platform/src/cb_pthreads.cc:71
      #18 0x00007f27908a1dc5 in start_thread () from /usr/lib64/libpthread-2.17.so
      #19 0x00007f27905cf21d in getspnam () from /usr/lib64/libc-2.17.so
      #20 0x0000000000000000 in ?? ()
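
      The unsymbolized frames in the "Service 'memcached' exited with status 139" message are printed as [load base + offset], while gdb prints absolute addresses. The sketch below adds the two to line the log entries up with the gdb frames above. It is only an illustration: the frame text is copied from that message, and the names LOG_FRAMES, FRAME_RE and resolve_frames are hypothetical helpers, not part of testrunner or Couchbase tooling.

```python
import re

# Frame entries copied from the ns_log crash message in this report.
LOG_FRAMES = """\
/opt/couchbase/bin/../lib/../lib/ep.so() [0x7f278ba9b000+0x1505af]
/opt/couchbase/bin/../lib/../lib/ep.so() [0x7f278ba9b000+0x150749]
/opt/couchbase/bin/../lib/../lib/ep.so() [0x7f278ba9b000+0x14c961]
/opt/couchbase/bin/../lib/../lib/ep.so() [0x7f278ba9b000+0xe05bd]
/opt/couchbase/bin/../lib/../lib/ep.so() [0x7f278ba9b000+0x133f7c]
/opt/couchbase/bin/../lib/../lib/ep.so() [0x7f278ba9b000+0x134eb9]
/opt/couchbase/bin/../lib/../lib/ep.so() [0x7f278ba9b000+0x12e854]
/opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7f2793085000+0x8f17]
/lib64/libpthread.so.0() [0x7f279089a000+0x7dc5]
/lib64/libc.so.6(clone+0x6d) [0x7f27904d9000+0xf621d]
"""

# Matches "module(symbol) [0xBASE+0xOFFSET]"; the symbol part may be empty.
FRAME_RE = re.compile(
    r"(?P<module>\S+?)\((?P<symbol>[^)]*)\) "
    r"\[(?P<base>0x[0-9a-f]+)\+(?P<offset>0x[0-9a-f]+)\]"
)


def resolve_frames(text):
    """Return (module, symbol, absolute_address) for every frame entry in text."""
    frames = []
    for match in FRAME_RE.finditer(text):
        # Absolute address = load base of the shared object + offset within it.
        address = int(match["base"], 16) + int(match["offset"], 16)
        frames.append((match["module"], match["symbol"], address))
    return frames


for module, symbol, address in resolve_frames(LOG_FRAMES):
    # Print in gdb's zero-padded style so the values can be compared by eye.
    print(f"{address:#018x}  {module.rsplit('/', 1)[-1]}  {symbol}")
```

      Under these assumptions the first ep.so entry resolves to 0x00007f278bbeb5af, the address gdb shows for frame #5, and the last entry resolves to 0x00007f27905cf21d, the address of frame #19. The log names that address clone+0x6d, so the getspnam label in the gdb output is probably a symbolization artifact.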
      

      Jenkins Job(test_4): http://qa.sc.couchbase.com/job/test_suite_executor/369490/consoleFull

          People

            ritesh.agarwal Ritesh Agarwal
            ritesh.agarwal Ritesh Agarwal