Couchbase Server / MB-28559

[Upgrade] Swap rebalance failed when upgrading from 4.6.1 to vulcan 2036

Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • 5.5.0
    • 5.5.0
    • couchbase-bucket
    • None
    • centos 6.x 64-bit
    • Untriaged
    • Unknown

    Description

      A swap rebalance used to upgrade from 4.6.1 to vulcan 5.5.0-2036 failed with "bulk_set_vbucket_state_failed".

      Create a 3-node 4.6.1 cluster.
      Create a 5.5.0-2036 node.
      Add the 5.5.0 node to the 4.6.1 cluster and remove a 4.6.1 node (swap rebalance).
      On the third swap, the rebalance failed with the error "bulk_set_vbucket_state_failed".
      memcached was also killed at the same time (exit status 134).
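
      Exit status 134 fits the common convention of reporting 128 plus a signal number when a process is killed by a signal, which would point at signal 6 (SIGABRT on Linux). The sketch below decodes a status under that assumption; the helper name `decode_exit_status` is illustrative and not part of the test framework or of Couchbase Server.

```python
import signal


def decode_exit_status(status):
    """Return the signal name if status looks like 128 + signal number, else None.

    Assumption: the reported status follows the 128+N convention.
    """
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        # Not a signal number known on this platform.
        return None


# Status reported for memcached in this ticket.
print(decode_exit_status(134))
```

      An abort would be consistent with the libstdc++ frames at the top of the memcached backtrace logged during the failure, but the backtrace is unsymbolized, so this is only a hint.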
      Details of the last rebalance:

      rebalance params :

      {'password': 'password', 'ejectedNodes': u'ns_1@172.23.105.110', 'user': 'Administrator', 'knownNodes': u'ns_1@172.23.121.181,ns_1@172.23.122.11,ns_1@172.23.106.126,ns_1@172.23.105.110'}
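
      For reference, the parameters above can be turned into a form-encoded request body as sketched below. This assumes they are sent to the cluster's `/controller/rebalance` REST endpoint as `knownNodes` and `ejectedNodes`; the helper name `build_rebalance_form` is hypothetical, and the sketch only builds the body without contacting a cluster.

```python
from urllib.parse import urlencode, parse_qs


def build_rebalance_form(known_nodes, ejected_nodes):
    """Form-encode the node lists as comma-separated otpNode names."""
    return urlencode({
        "knownNodes": ",".join(known_nodes),
        "ejectedNodes": ",".join(ejected_nodes),
    })


# Node names taken from the rebalance params in this ticket.
known = [
    "ns_1@172.23.121.181",
    "ns_1@172.23.122.11",
    "ns_1@172.23.106.126",
    "ns_1@172.23.105.110",
]
ejected = ["ns_1@172.23.105.110"]

form = build_rebalance_form(known, ejected)
decoded = parse_qs(form)  # round-trip to check the encoding
print(form)
print(decoded["ejectedNodes"])
```

      Note that the ejected node also appears in `knownNodes`: the list names every node currently in the cluster, and `ejectedNodes` marks the one to remove.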

       {u'node': u'ns_1@172.23.122.11', u'code': 0, u'text': u'Rebalance exited with reason {mover_crashed,\n                              {unexpected_exit,\n                               {\'EXIT\',<0.22984.4>,\n                                {bulk_set_vbucket_state_failed,\n                                 [{\'ns_1@172.23.106.126\',\n                                   {\'EXIT\',\n                                    {{{{{child_interrupted,\n                                         {\'EXIT\',<21048.19736.0>,\n                                          socket_closed}},\n                                        [{dcp_replicator,spawn_and_wait,1,\n                                          [{file,"src/dcp_replicator.erl"},\n                                           {line,233}]},\n                                         {dcp_replicator,handle_call,3,\n                                          [{file,"src/dcp_replicator.erl"},\n                                           {line,111}]},\n                                         {gen_server,handle_msg,5,\n                                          [{file,"gen_server.erl"},\n                                           {line,585}]},\n                                         {proc_lib,init_p_do_apply,3,\n                                          [{file,"proc_lib.erl"},\n                                           {line,239}]}]},\n                                       {gen_server,call,\n                                        [\'dcp_replicator-default-ns_1@172.23.122.11\',\n                                         {setup_replication,\n                                          [719,720,721,722,723,724,725,726,\n                                           727,728,729,730,731,732,733,734,\n                                           735,736,737,738,739,740,741,742,\n                                           743,744,745,746,747,748,749,750,\n                                           751,752,753,754,755,756,757,758,\n                  
                         759,760,761,762,763,764,765,766,\n                                           767,768,769,770,771,772,773,774,\n                                           775,776,777,778,779,780,781,782,\n                                           783,784,785,786,787,788,789,790,\n                                           791,792,793,794,795,796,797,798,\n                                           799,800,801,802,803,804,805,806,\n                                           807,808,809,810,811,812,813,814,\n                                           815,816,817,818,819,820,821,822,\n                                           823,824,825,826,827,828,829,830,\n                                           831,832,833,834,835,836,837,838,\n                                           839,840,841,842,843,844,845,846,\n                                           847,848,849,850,851,852,853]},\n                                         infinity]}},\n                                      {gen_server,call,\n                                       [\'replication_manager-default\',\n                                        {change_vbucket_replication,719,\n                                         \'ns_1@172.23.122.11\'},\n                                        infinity]}},\n                                     {gen_server,call,\n                                      [{\'janitor_agent-default\',\n                                        \'ns_1@172.23.106.126\'},\n                                       {if_rebalance,<0.27583.3>,\n                                        {update_vbucket_state,719,replica,\n                                         undefined,\'ns_1@172.23.122.11\'}},\n                                       infinity]}}}}]}}}}', u'shortText': u'message', u'serverTime': u'2018-03-03T17:09:21.176Z', u'module': u'ns_orchestrator', u'tstamp': 1520125761176, u'type': u'critical'}
      [2018-03-03 17:09:24,827] - [rest_client:3082] ERROR - {u'node': u'ns_1@172.23.106.126', u'code': 0, u'text': u"Control connection to memcached on 'ns_1@172.23.106.126' disconnected: lost_connection", u'shortText': u'message', u'serverTime': u'2018-03-03T17:09:21.149Z', u'module': u'ns_memcached', u'tstamp': 1520125761149, u'type': u'info'}
      [2018-03-03 17:09:24,828] - [rest_client:3082] ERROR - {u'node': u'ns_1@172.23.122.11', u'code': 0, u'text': u'<0.22967.4> exited with {unexpected_exit,\n                         {\'EXIT\',<0.22984.4>,\n                          {bulk_set_vbucket_state_failed,\n                           [{\'ns_1@172.23.106.126\',\n                             {\'EXIT\',\n                              {{{{{child_interrupted,\n                                   {\'EXIT\',<21048.19736.0>,socket_closed}},\n                                  [{dcp_replicator,spawn_and_wait,1,\n                                    [{file,"src/dcp_replicator.erl"},\n                                     {line,233}]},\n                                   {dcp_replicator,handle_call,3,\n                                    [{file,"src/dcp_replicator.erl"},\n                                     {line,111}]},\n                                   {gen_server,handle_msg,5,\n                                    [{file,"gen_server.erl"},{line,585}]},\n                                   {proc_lib,init_p_do_apply,3,\n                                    [{file,"proc_lib.erl"},{line,239}]}]},\n                                 {gen_server,call,\n                                  [\'dcp_replicator-default-ns_1@172.23.122.11\',\n                                   {setup_replication,\n                                    [719,720,721,722,723,724,725,726,727,728,\n                                     729,730,731,732,733,734,735,736,737,738,\n                                     739,740,741,742,743,744,745,746,747,748,\n                                     749,750,751,752,753,754,755,756,757,758,\n                                     759,760,761,762,763,764,765,766,767,768,\n                                     769,770,771,772,773,774,775,776,777,778,\n                                     779,780,781,782,783,784,785,786,787,788,\n                                     789,790,791,792,793,794,795,796,797,798,\n                 
                    799,800,801,802,803,804,805,806,807,808,\n                                     809,810,811,812,813,814,815,816,817,818,\n                                     819,820,821,822,823,824,825,826,827,828,\n                                     829,830,831,832,833,834,835,836,837,838,\n                                     839,840,841,842,843,844,845,846,847,848,\n                                     849,850,851,852,853]},\n                                   infinity]}},\n                                {gen_server,call,\n                                 [\'replication_manager-default\',\n                                  {change_vbucket_replication,719,\n                                   \'ns_1@172.23.122.11\'},\n                                  infinity]}},\n                               {gen_server,call,\n                                [{\'janitor_agent-default\',\n                                  \'ns_1@172.23.106.126\'},\n                                 {if_rebalance,<0.27583.3>,\n                                  {update_vbucket_state,719,replica,\n                                   undefined,\'ns_1@172.23.122.11\'}},\n                                 infinity]}}}}]}}}', u'shortText': u'message', u'serverTime': u'2018-03-03T17:09:21.145Z', u'module': u'ns_vbucket_mover', u'tstamp': 1520125761145, u'type': u'critical'}
      [2018-03-03 17:09:24,828] - [rest_client:3082] ERROR - {u'node': u'ns_1@172.23.106.126', u'code': 0, u'text': u"Service 'memcached' exited with status 134. Restarting. Messages:\n2018-03-03T17:09:21.084094Z CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7fc199ab0000+0x8f221]\n2018-03-03T17:09:21.084109Z CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7fc199ab0000+0x8ff6f]\n2018-03-03T17:09:21.084115Z CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x5869e]\n2018-03-03T17:09:21.084124Z CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x38e91]\n2018-03-03T17:09:21.084131Z CRITICAL     /opt/couchbase/bin/../lib/libevent_core.so.2.1.8() [0x7fc19a8f8000+0x1937c]\n2018-03-03T17:09:21.084135Z CRITICAL     /opt/couchbase/bin/../lib/libevent_core.so.2.1.8(event_base_loop+0x46f) [0x7fc19a8f8000+0x1c7cf]\n2018-03-03T17:09:21.084143Z CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x374f4]\n2018-03-03T17:09:21.084148Z CRITICAL     /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7fc19b7f0000+0x8387]\n2018-03-03T17:09:21.084153Z CRITICAL     /lib64/libpthread.so.0() [0x7fc19b1af000+0x7aa1]\n2018-03-03T17:09:21.084188Z CRITICAL     /lib64/libc.so.6(clone+0x6d) [0x7fc199282000+0xe8bcd]", u'shortText': u'message', u'serverTime': u'2018-03-03T17:09:21.138Z', u'module': u'ns_log', u'tstamp': 1520125761138, u'type': u'info'}
      [2018-03-03 17:09:24,829] - [rest_client:3082] ERROR - {u'node': u'ns_1@172.23.122.11', u'code': 0, u'text': u'Bucket "default" rebalance appears to be swap rebalance', u'shortText': u'message', u'serverTime': u'2018-03-03T17:08:56.325Z', u'module': u'ns_vbucket_mover', u'tstamp': 1520125736325, u'type': u'info'}
      [2018-03-03 17:09:24,829] - [rest_client:3082] ERROR - {u'node': u'ns_1@172.23.106.126', u'code': 0, u'text': u'Bucket "default" loaded on node \'ns_1@172.23.106.126\' in 0 seconds.', u'shortText': u'message', u'serverTime': u'2018-03-03T17:08:54.939Z', u'module': u'ns_memcached', u'tstamp': 1520125734939, u'type': u'info'}
      [2018-03-03 17:09:24,830] - [rest_client:3082] ERROR - {u'node': u'ns_1@172.23.106.126', u'code': 1, u'text': u'Couchbase Server has started on web port 8091 on node \'ns_1@172.23.106.126\'. Version: "5.5.0-2036-enterprise".', u'shortText': u'web start ok', u'serverTime': u'2018-03-03T17:08:54.675Z', u'module': u'menelaus_sup', u'tstamp': 1520125734675, u'type': u'info'}
      [2018-03-03 17:09:24,830] - [rest_client:3082] ERROR - {u'node': u'ns_1@172.23.106.126', u'code': 0, u'text': u'Shutting down bucket "default" on \'ns_1@172.23.106.126\' for server shutdown', u'shortText': u'message', u'serverTime': u'2018-03-03T17:08:54.317Z', u'module': u'ns_memcached', u'tstamp': 1520125734317, u'type': u'info'}
      [2018-03-03 17:09:24,830] - [rest_client:3082] ERROR - {u'node': u'ns_1@172.23.122.11', u'code': 0, u'text': u'Started rebalancing bucket default', u'shortText': u'message', u'serverTime': u'2018-03-03T17:08:53.688Z', u'module': u'ns_rebalancer', u'tstamp': 1520125733688, u'type': u'info'}
      [2018-03-03 17:09:24,831] - [rest_client:3082] ERROR - {u'node': u'ns_1@172.23.122.11', u'code': 4, u'text': u"Starting rebalance, KeepNodes = ['ns_1@172.23.121.181','ns_1@172.23.122.11',\n                                 'ns_1@172.23.106.126'], EjectNodes = ['ns_1@172.23.105.110'], Failed over and being ejected nodes = []; no delta recovery nodes\n", u'shortText': u'message', u'serverTime': u'2018-03-03T17:08:53.450Z', u'module': u'ns_orchestrator', u'tstamp': 1520125733450, u'type': u'info'}
      [('/usr/lib64/python2.7/threading.py', 785, '__bootstrap', 'self.__bootstrap_inner()'), ('/usr/lib64/python2.7/threading.py', 812, [('/usr/lib64/python2.7/threading.py', 785, '__bootstrap''__bootstrap_inner', 'self.run()'), , ('self'lib/tasks/taskmanager.py'., __bootst31, 'run', 'task.step(self)'), ('lib/tasks/task.pyr'ap_inner()', ), ('/usr/lib64/77p, y'tstep', 'selfh.ocheck(task_manang2.7/threading.pye'r, 812, '__bootstrap_inner', 'self.run()'), ('/usr/lib64/python2.7/threading.py', 765, 'run', 'self.__target(*self.__args, **self.__kwargs)'), ('pytests/upgrade/upgrade_tests.py', 663, 'online_upgrade', 'self.online_upgrade_swap_rebalance()'), ('pytests/)u')p, grade/upgrade_tests.py'(, '709, l'ib/tasks/task.opny', 508l, i'nceheck', 'se_lf.set_exceptupigon(ex)')r, a('lib/taskdse_swap_rebalance/'f, 'servers_out.values())'), ('lib/couchbase_helper/cluster.py', 327, 'rebalance', 'return _task.result(timeout)'), ('lib/tasks/future.py', 160, 'result', 'return self.__get_result()'), ('lib/tasks/future.py'uture.py', 111, , '__get_result', 264, 'set_exception', 'print traceback.extract_stack()')]'print traceback.extract_stack()'
      Sat Mar  3 17:09:24 2018
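
      The events above are printed newest first. A minimal sketch for putting them back in chronological order is shown below, assuming each line carries a Python dict literal after `ERROR - `. The sample lines are abridged copies of three events from this ticket (fields dropped and the memcached message shortened), and `parse_event` is an illustrative helper, not part of the test framework.

```python
import ast

# Abridged copies of three log lines from this ticket.
LOG = """\
[2018-03-03 17:09:24,827] - [rest_client:3082] ERROR - {u'node': u'ns_1@172.23.106.126', u'text': u"Control connection to memcached on 'ns_1@172.23.106.126' disconnected: lost_connection", u'tstamp': 1520125761149}
[2018-03-03 17:09:24,828] - [rest_client:3082] ERROR - {u'node': u'ns_1@172.23.106.126', u'text': u"Service 'memcached' exited with status 134. Restarting.", u'tstamp': 1520125761138}
[2018-03-03 17:09:24,830] - [rest_client:3082] ERROR - {u'node': u'ns_1@172.23.122.11', u'text': u'Started rebalancing bucket default', u'tstamp': 1520125733688}
"""


def parse_event(line):
    """Extract the dict literal that follows ' ERROR - ' in a log line."""
    _, _, payload = line.partition(" ERROR - ")
    return ast.literal_eval(payload)


# Sort by the server-side millisecond timestamp, oldest first.
events = sorted((parse_event(l) for l in LOG.splitlines()),
                key=lambda e: e["tstamp"])

for e in events:
    print(e["tstamp"], e["node"], e["text"])
```

      Ordered this way, the memcached exit on 172.23.106.126 (17:09:21.138) precedes the mover crash (17:09:21.145) and the lost control connection (17:09:21.149) by a few milliseconds.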
      

            People

              thuan Thuan Nguyen