Couchbase Server / MB-14825

Memcached crashes due to Segmentation fault, during swap rebalance

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 3.1.2, 4.0.0
    • 4.0.0
    • couchbase-bucket
    • Security Level: Public
    • CentOS 6.x
    • Untriaged
    • Unknown
    • Mar 9 - Mar 27

    Description

      Build


      4.0.0-2020

      Testcase
      --------
      ./testrunner -i -p items=500000,item_count_timeout=600 -t xdcr.rebalanceXDCR.Rebalance.swap_rebalance_out_master,rdirection=unidirection,ctopology=chain,update=C1,delete=C1,rebalance=C2-C1
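The `-t` argument above appears to be a test path followed by comma-separated `key=value` parameters. The sketch below splits it that way so the parameters are easier to read. `parse_test_spec` is a hypothetical helper written for this note; it is not testrunner's own parser and may not match its exact rules.

```python
def parse_test_spec(spec):
    """Split a '-t' style spec into (test_path, {param: value}).

    Assumption: the first comma-separated field is the test path and every
    remaining field has the form key=value.
    """
    name, *pairs = spec.split(",")
    params = dict(pair.split("=", 1) for pair in pairs)
    return name, params


# The -t value from the command above, copied verbatim.
SPEC = ("xdcr.rebalanceXDCR.Rebalance.swap_rebalance_out_master,"
        "rdirection=unidirection,ctopology=chain,update=C1,delete=C1,rebalance=C2-C1")

test_path, params = parse_test_spec(SPEC)
print(test_path)
for key, value in params.items():
    print(f"  {key} = {value}")
```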

      Steps


      1. Set up XDCR from C1 [.157, .158] to C2 [.159, .160] on the default bucket.
      2. Load 500K keys into the default bucket.
      3. Swap rebalance .164 with .157.

      The rebalance failed with the following stack trace:

      [2015-05-05 00:31:18,165] - [rest_client:1270] ERROR - {u'status': u'none', u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try rebalance again.'} - rebalance failed
      [2015-05-05 00:31:19,008] - [rest_client:2184] INFO - Latest logs from UI on 172.23.121.157:
      [2015-05-05 00:31:19,008] - [rest_client:2185] ERROR - {u'node': u'ns_1@172.23.121.157', u'code': 2, u'text': u'Rebalance exited with reason {unexpected_exit,\n                              {\'EXIT\',<0.19639.43>,\n                               {bulk_set_vbucket_state_failed,\n                                [{\'ns_1@172.23.121.157\',\n                                  {\'EXIT\',\n                                   {{{{badmatch,{error,closed}},\n                                      [{mc_client_binary,cmd_vocal_recv,5,\n                                        [{file,"src/mc_client_binary.erl"},\n                                         {line,156}]},\n                                       {mc_client_binary,select_bucket,2,\n                                        [{file,"src/mc_client_binary.erl"},\n                                         {line,351}]},\n                                       {ns_memcached,ensure_bucket,2,\n                                        [{file,"src/ns_memcached.erl"},\n                                         {line,1291}]},\n                                       {ns_memcached,handle_info,2,\n                                        [{file,"src/ns_memcached.erl"},\n                                         {line,745}]},\n                                       {gen_server,handle_msg,5,\n                                        [{file,"gen_server.erl"},{line,604}]},\n                                       {ns_memcached,init,1,\n                                        [{file,"src/ns_memcached.erl"},\n                                         {line,174}]},\n                                       {gen_server,init_it,6,\n                                        [{file,"gen_server.erl"},{line,304}]},\n                                       {proc_lib,init_p_do_apply,3,\n                                        [{file,"proc_lib.erl"},{line,239}]}]},\n                                     {gen_server,call,\n                                      
[\'ns_memcached-default\',\n                                       {delete_vbucket,271},\n                                       360000]}},\n                                    {gen_server,call,\n                                     [{\'janitor_agent-default\',\n                                       \'ns_1@172.23.121.157\'},\n                                      {if_rebalance,<0.1668.43>,\n                                       {update_vbucket_state,683,replica,\n                                        undefined,undefined}},\n                                      infinity]}}}}]}}}\n', u'shortText': u'message', u'serverTime': u'2015-05-05T00:31:05.755Z', u'module': u'ns_orchestrator', u'tstamp': 1430811065755, u'type': u'info'}
      [2015-05-05 00:31:19,009] - [rest_client:2185] ERROR - {u'node': u'ns_1@172.23.121.157', u'code': 0, u'text': u'<0.19579.43> exited with {unexpected_exit,\n                          {\'EXIT\',<0.19639.43>,\n                           {bulk_set_vbucket_state_failed,\n                            [{\'ns_1@172.23.121.157\',\n                              {\'EXIT\',\n                               {{{{badmatch,{error,closed}},\n                                  [{mc_client_binary,cmd_vocal_recv,5,\n                                    [{file,"src/mc_client_binary.erl"},\n                                     {line,156}]},\n                                   {mc_client_binary,select_bucket,2,\n                                    [{file,"src/mc_client_binary.erl"},\n                                     {line,351}]},\n                                   {ns_memcached,ensure_bucket,2,\n                                    [{file,"src/ns_memcached.erl"},\n                                     {line,1291}]},\n                                   {ns_memcached,handle_info,2,\n                                    [{file,"src/ns_memcached.erl"},\n                                     {line,745}]},\n                                   {gen_server,handle_msg,5,\n                                    [{file,"gen_server.erl"},{line,604}]},\n                                   {ns_memcached,init,1,\n                                    [{file,"src/ns_memcached.erl"},\n                                     {line,174}]},\n                                   {gen_server,init_it,6,\n                                    [{file,"gen_server.erl"},{line,304}]},\n                                   {proc_lib,init_p_do_apply,3,\n                                    [{file,"proc_lib.erl"},{line,239}]}]},\n                                 {gen_server,call,\n                                  [\'ns_memcached-default\',\n                                   {delete_vbucket,271},\n                                  
 360000]}},\n                                {gen_server,call,\n                                 [{\'janitor_agent-default\',\n                                   \'ns_1@172.23.121.157\'},\n                                  {if_rebalance,<0.1668.43>,\n                                   {update_vbucket_state,683,replica,\n                                    undefined,undefined}},\n                                  infinity]}}}}]}}}', u'shortText': u'message', u'serverTime': u'2015-05-05T00:31:05.745Z', u'module': u'ns_vbucket_mover', u'tstamp': 1430811065745, u'type': u'critical'}
       
      [2015-05-05 00:31:19,009] - [rest_client:2185] ERROR - {u'node': u'ns_1@172.23.121.157', u'code': 0, u'text': u"Port server memcached on node 'babysitter_of_ns_1@127.0.0.1' exited with status 139. Restarting. Messages: 2015-05-05T00:31:04.340610-07:00 WARNING     /opt/couchbase/bin/memcached() [0x424cef]\n2015-05-05T00:31:04.340647-07:00 WARNING 
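The `u'text'` fields in the log lines above hold multi-line Erlang terms with escaped `\n` sequences, which makes them hard to read. Since each payload looks like a Python dict repr, one way to make it readable is to evaluate it with `ast.literal_eval` and print the text. This is a minimal sketch: `parse_ui_log_line` is a hypothetical helper, the sample line is a shortened copy (the `...` marks text cut for brevity), and a record that has been truncated or wrapped across lines would need to be rejoined first.

```python
import ast

# Shortened copy of one log line from above; "..." marks text cut for brevity.
LINE = r"""[2015-05-05 00:31:19,008] - [rest_client:2185] ERROR - {u'node': u'ns_1@172.23.121.157', u'code': 2, u'text': u'Rebalance exited with reason {unexpected_exit,\n    {\'EXIT\',<0.19639.43>, ...}}', u'module': u'ns_orchestrator', u'type': u'info'}"""


def parse_ui_log_line(line):
    """Drop the logger prefix and evaluate the dict repr that follows it."""
    _, payload = line.split(" ERROR - ", 1)
    return ast.literal_eval(payload)


entry = parse_ui_log_line(LINE)
print(entry["module"], entry["type"])
print(entry["text"])  # escaped newlines now print as real line breaks
```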
      

      Note: This cannot be consistently reproduced.
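The babysitter message above reports that memcached exited with status 139, and the core file reports signal 11. The two agree under the common convention that a process killed by a signal is reported as 128 plus the signal number. The sketch below shows that arithmetic; `decode_exit_status` is a hypothetical helper, and it assumes the babysitter follows that convention.

```python
import signal


def decode_exit_status(status):
    """Return the signal name encoded in a 128+N exit status, or None."""
    if status <= 128:
        return None  # ordinary exit code, no signal encoded
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        return None  # no signal with that number on this platform


print(decode_exit_status(139))  # 139 - 128 = 11
```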

      Memcached core backtrace (from .157)
      -------------------------------------

      Program terminated with signal 11, Segmentation fault.
      #0  0x00007fb0c5c8c3a0 in pthread_mutex_lock () from /lib64/libpthread.so.0
       
      Thread 1 (Thread 0x7fb0c05f0700 (LWP 3255)):
      #0  0x00007fb0c5c8c3a0 in pthread_mutex_lock () from /lib64/libpthread.so.0
      #1  0x00007fb0c736b9e6 in cb_mutex_enter (mutex=Unhandled dwarf expression opcode 0xf3
      ) at /home/couchbase/jenkins/workspace/sherlock-unix/platform/src/cb_pthreads.c:85
      #2  0x00007fb0bc74232d in Mutex::acquire (this=0xa8) at /home/couchbase/jenkins/workspace/sherlock-unix/ep-engine/src/mutex.cc:31
       
      #3  0x00007fb0bc6d715a in lock (this=0x98, name="eq_dcpq:xdcr:dcp_ab17bf94ea68e369fe2ddf01c34dd61f/default/default_172.23.121.157:11210_1", items=std::vector of length 0, capacity 0) at /home/couchbase/jenkins/workspace/sherlock-unix/ep-engine/src/locks.h:70
      #4  LockHolder (this=0x98, name="eq_dcpq:xdcr:dcp_ab17bf94ea68e369fe2ddf01c34dd61f/default/default_172.23.121.157:11210_1", items=std::vector of length 0, capacity 0) at /home/couchbase/jenkins/workspace/sherlock-unix/ep-engine/src/locks.h:47
      #5  CheckpointManager::getAllItemsForCursor (this=0x98, name="eq_dcpq:xdcr:dcp_ab17bf94ea68e369fe2ddf01c34dd61f/default/default_172.23.121.157:11210_1", items=std::vector of length 0, capacity 0) at /home/couchbase/jenkins/workspace/sherlock-unix/ep-engine/src/checkpoint.cc:870
      #6  0x00007fb0bc6f540c in ActiveStream::nextCheckpointItem (this=0x7fb09bc77c80) at /home/couchbase/jenkins/workspace/sherlock-unix/ep-engine/src/dcp-stream.cc:469
      #7  0x00007fb0bc6f6b18 in inMemoryPhase (this=0x7fb09bc77c80) at /home/couchbase/jenkins/workspace/sherlock-unix/ep-engine/src/dcp-stream.cc:343
      #8  ActiveStream::next (this=0x7fb09bc77c80) at /home/couchbase/jenkins/workspace/sherlock-unix/ep-engine/src/dcp-stream.cc:148
      #9  0x00007fb0bc6f6acf in ActiveStream::next (this=0x7fb09bc77c80) at /home/couchbase/jenkins/workspace/sherlock-unix/ep-engine/src/dcp-stream.cc:167
      #10 0x00007fb0bc6eed78 in DcpProducer::getNextItem (this=0x7fb09f993e00) at /home/couchbase/jenkins/workspace/sherlock-unix/ep-engine/src/dcp-producer.cc:648
      #11 0x00007fb0bc6ef2ba in DcpProducer::step (this=0x7fb09f993e00, producers=0x653f60) at /home/couchbase/jenkins/workspace/sherlock-unix/ep-engine/src/dcp-producer.cc:245
      #12 0x00007fb0bc713dab in EvpDcpStep (handle=Unhandled dwarf expression opcode 0xf3
      ) at /home/couchbase/jenkins/workspace/sherlock-unix/ep-engine/src/ep_engine.cc:1458
      #13 0x00007fb0c05f4708 in dcp_step (handle=Unhandled dwarf expression opcode 0xf3
      ) at /home/couchbase/jenkins/workspace/sherlock-unix/memcached/engines/bucket_engine/bucket_engine.c:2230
      #14 0x000000000041fb1e in ship_dcp_log (c=0x7fb0a0da4500) at /home/couchbase/jenkins/workspace/sherlock-unix/memcached/daemon/memcached.c:2823
      #15 conn_ship_log (c=0x7fb0a0da4500) at /home/couchbase/jenkins/workspace/sherlock-unix/memcached/daemon/memcached.c:6776
      #16 0x000000000041050e in run_event_loop (c=0x7fb0a0da4500) at /home/couchbase/jenkins/workspace/sherlock-unix/memcached/daemon/connections.c:109
      #17 0x0000000000424cef in thread_libevent_process (fd=Unhandled dwarf expression opcode 0xf3
      ) at /home/couchbase/jenkins/workspace/sherlock-unix/memcached/daemon/thread.c:360
      #18 0x00007fb0c68fb488 in event_persist_closure (base=0x7fb0c3023280, flags=0) at /home/couchbase/jenkins/workspace/cbdeps-build/label/centos6/release/sherlock/deps/packages/build/libevent/libevent-prefix/src/libevent/event.c:1319
      #19 event_process_active_single_queue (base=0x7fb0c3023280, flags=0) at /home/couchbase/jenkins/workspace/cbdeps-build/label/centos6/release/sherlock/deps/packages/build/libevent/libevent-prefix/src/libevent/event.c:1363
      #20 event_process_active (base=0x7fb0c3023280, flags=0) at /home/couchbase/jenkins/workspace/cbdeps-build/label/centos6/release/sherlock/deps/packages/build/libevent/libevent-prefix/src/libevent/event.c:1438
      #21 event_base_loop (base=0x7fb0c3023280, flags=0) at /home/couchbase/jenkins/workspace/cbdeps-build/label/centos6/release/sherlock/deps/packages/build/libevent/libevent-prefix/src/libevent/event.c:1639
      #22 0x00007fb0c736b8aa in platform_thread_wrap (arg=0x7fb0c32b2680) at /home/couchbase/jenkins/workspace/sherlock-unix/platform/src/cb_pthreads.c:19
      #23 0x00007fb0c5c8a9d1 in start_thread () from /lib64/libpthread.so.0
      #24 0x00007fb0c4e2b8fd in clone () from /lib64/libc.so.6
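Frames #2 and #5 show very small `this` values (`0xa8` and `0x98`), while the `ActiveStream` and `DcpProducer` frames carry ordinary heap addresses. Values this small cannot be valid objects. One plausible reading, not confirmed in this ticket, is that a member was reached through a null object pointer, so `this` is only a member offset. The sketch below flags such frames; the sample frames are copied from the backtrace with paths and long arguments shortened to `...`, and the one-page threshold is an assumption.

```python
import re

# Frames copied from the backtrace above; "..." marks shortened paths/arguments.
FRAMES = """\
#2  0x00007fb0bc74232d in Mutex::acquire (this=0xa8) at .../ep-engine/src/mutex.cc:31
#5  CheckpointManager::getAllItemsForCursor (this=0x98, name="...") at .../ep-engine/src/checkpoint.cc:870
#6  0x00007fb0bc6f540c in ActiveStream::nextCheckpointItem (this=0x7fb09bc77c80) at .../ep-engine/src/dcp-stream.cc:469
"""

PAGE_SIZE = 4096  # assumption: nothing valid lives in the first page

# Frame number, optional "0x... in " prefix, function name, then the this= value.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?([\w:~]+) \(this=(0x[0-9a-f]+)")


def suspicious_frames(backtrace):
    """Return (frame_no, function, this_addr) for frames whose `this` is below one page."""
    hits = []
    for line in backtrace.splitlines():
        m = FRAME_RE.match(line)
        if m and int(m.group(3), 16) < PAGE_SIZE:
            hits.append((int(m.group(1)), m.group(2), int(m.group(3), 16)))
    return hits


hits = suspicious_frames(FRAMES)
for frame_no, func, addr in hits:
    print(f"frame #{frame_no}: {func} this={addr:#x}")

# Distance between the two bogus pointers: where the mutex would sit
# inside the CheckpointManager if both are offsets from the same null base.
print(hex(0xa8 - 0x98))
```

If that reading is right, the mutex sits 0x10 bytes into `CheckpointManager`, and the `CheckpointManager` itself was addressed 0x98 bytes from a null base. The actual root cause would need to be confirmed from the fix or the full core.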
      

      Complete Backtrace --> https://friendpaste.com/7SVPS6pCfNoQd79ICe1Lo7

      Core file can be found @ 172.23.121.157/tmp/backup_crash/05_05_2015_00_48/core.memcached.3251

            People

              arunkumar Arunkumar Senthilnathan (Inactive)
              apiravi Aruna Piravi (Inactive)