Details
-
Bug
-
Resolution: Duplicate
-
Critical
-
5.5.0
-
None
-
centos 6.x 64-bit
-
Untriaged
-
Unknown
Description
Swap rebalance to upgrade from 4.6.1 to vulcan 5.5.0-2036 failed due to "bulk_set_vbucket_state_failed"
Create a 3-node 4.6.1 cluster.
Create a 5.5.0-2036 node.
Add the 5.5.0 node to the 4.6.1 cluster and remove a 4.6.1 node (swap rebalance).
On the third swap, the rebalance failed with the error "bulk_set_vbucket_state_failed",
and memcached was also killed (exit status 134).
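For reference, exit status 134 can be read with the common shell convention that a status above 128 means "terminated by signal (status - 128)"; 134 - 128 = 6, which is SIGABRT on Linux. The sketch below assumes that convention applies to the status reported here, and `describe_exit_status` is a hypothetical helper, not part of the test framework.

```python
import signal


def describe_exit_status(status):
    """Interpret an exit status using the 128 + N convention for signal deaths.

    Assumption: the reported status follows the usual shell convention.
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = "unknown signal"
        return "killed by signal %d (%s)" % (signum, name)
    return "exited normally with code %d" % status


if __name__ == "__main__":
    print(describe_exit_status(134))
```

An abort of this kind would be consistent with the libstdc++ frames in the memcached backtrace further down, but the backtrace alone does not confirm the cause.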
Details of the last rebalance
Rebalance params:
{'password': 'password', 'ejectedNodes': u'ns_1@172.23.105.110', 'user': 'Administrator', 'knownNodes': u'ns_1@172.23.121.181,ns_1@172.23.122.11,ns_1@172.23.106.126,ns_1@172.23.105.110'}
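The params above list every node in `knownNodes` (including the incoming one) and only the outgoing node in `ejectedNodes`. A minimal sketch of how such a dict could be assembled for one swap follows; `swap_rebalance_params` is a hypothetical helper written for illustration, not the test framework's code, and the node order in `knownNodes` is assumed not to matter.

```python
def swap_rebalance_params(cluster_nodes, node_in, node_out, user, password):
    """Build rebalance parameters for swapping node_out with node_in.

    cluster_nodes: otp node names currently in the cluster.
    """
    known = list(cluster_nodes)
    if node_in not in known:
        # The incoming node must be known to the cluster before rebalancing.
        known.append(node_in)
    if node_out not in known:
        raise ValueError("node to eject is not part of the cluster: %s" % node_out)
    return {
        "knownNodes": ",".join(known),
        "ejectedNodes": node_out,
        "user": user,
        "password": password,
    }


if __name__ == "__main__":
    # Node names taken from the report; credentials as shown in the params.
    params = swap_rebalance_params(
        ["ns_1@172.23.121.181", "ns_1@172.23.122.11", "ns_1@172.23.105.110"],
        node_in="ns_1@172.23.106.126",
        node_out="ns_1@172.23.105.110",
        user="Administrator",
        password="password",
    )
    print(params)
```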
{u'node': u'ns_1@172.23.122.11', u'code': 0, u'text': u'Rebalance exited with reason {mover_crashed,\n {unexpected_exit,\n {\'EXIT\',<0.22984.4>,\n {bulk_set_vbucket_state_failed,\n [{\'ns_1@172.23.106.126\',\n {\'EXIT\',\n {{{{{child_interrupted,\n {\'EXIT\',<21048.19736.0>,\n socket_closed}},\n [{dcp_replicator,spawn_and_wait,1,\n [{file,"src/dcp_replicator.erl"},\n {line,233}]},\n {dcp_replicator,handle_call,3,\n [{file,"src/dcp_replicator.erl"},\n {line,111}]},\n {gen_server,handle_msg,5,\n [{file,"gen_server.erl"},\n {line,585}]},\n {proc_lib,init_p_do_apply,3,\n [{file,"proc_lib.erl"},\n {line,239}]}]},\n {gen_server,call,\n [\'dcp_replicator-default-ns_1@172.23.122.11\',\n {setup_replication,\n [719,720,721,722,723,724,725,726,\n 727,728,729,730,731,732,733,734,\n 735,736,737,738,739,740,741,742,\n 743,744,745,746,747,748,749,750,\n 751,752,753,754,755,756,757,758,\n 759,760,761,762,763,764,765,766,\n 767,768,769,770,771,772,773,774,\n 775,776,777,778,779,780,781,782,\n 783,784,785,786,787,788,789,790,\n 791,792,793,794,795,796,797,798,\n 799,800,801,802,803,804,805,806,\n 807,808,809,810,811,812,813,814,\n 815,816,817,818,819,820,821,822,\n 823,824,825,826,827,828,829,830,\n 831,832,833,834,835,836,837,838,\n 839,840,841,842,843,844,845,846,\n 847,848,849,850,851,852,853]},\n infinity]}},\n {gen_server,call,\n [\'replication_manager-default\',\n {change_vbucket_replication,719,\n \'ns_1@172.23.122.11\'},\n infinity]}},\n {gen_server,call,\n [{\'janitor_agent-default\',\n \'ns_1@172.23.106.126\'},\n {if_rebalance,<0.27583.3>,\n {update_vbucket_state,719,replica,\n undefined,\'ns_1@172.23.122.11\'}},\n infinity]}}}}]}}}}', u'shortText': u'message', u'serverTime': u'2018-03-03T17:09:21.176Z', u'module': u'ns_orchestrator', u'tstamp': 1520125761176, u'type': u'critical'}
|
[2018-03-03 17:09:24,827] - [rest_client:3082] ERROR - {u'node': u'ns_1@172.23.106.126', u'code': 0, u'text': u"Control connection to memcached on 'ns_1@172.23.106.126' disconnected: lost_connection", u'shortText': u'message', u'serverTime': u'2018-03-03T17:09:21.149Z', u'module': u'ns_memcached', u'tstamp': 1520125761149, u'type': u'info'}
|
[2018-03-03 17:09:24,828] - [rest_client:3082] ERROR - {u'node': u'ns_1@172.23.122.11', u'code': 0, u'text': u'<0.22967.4> exited with {unexpected_exit,\n {\'EXIT\',<0.22984.4>,\n {bulk_set_vbucket_state_failed,\n [{\'ns_1@172.23.106.126\',\n {\'EXIT\',\n {{{{{child_interrupted,\n {\'EXIT\',<21048.19736.0>,socket_closed}},\n [{dcp_replicator,spawn_and_wait,1,\n [{file,"src/dcp_replicator.erl"},\n {line,233}]},\n {dcp_replicator,handle_call,3,\n [{file,"src/dcp_replicator.erl"},\n {line,111}]},\n {gen_server,handle_msg,5,\n [{file,"gen_server.erl"},{line,585}]},\n {proc_lib,init_p_do_apply,3,\n [{file,"proc_lib.erl"},{line,239}]}]},\n {gen_server,call,\n [\'dcp_replicator-default-ns_1@172.23.122.11\',\n {setup_replication,\n [719,720,721,722,723,724,725,726,727,728,\n 729,730,731,732,733,734,735,736,737,738,\n 739,740,741,742,743,744,745,746,747,748,\n 749,750,751,752,753,754,755,756,757,758,\n 759,760,761,762,763,764,765,766,767,768,\n 769,770,771,772,773,774,775,776,777,778,\n 779,780,781,782,783,784,785,786,787,788,\n 789,790,791,792,793,794,795,796,797,798,\n 799,800,801,802,803,804,805,806,807,808,\n 809,810,811,812,813,814,815,816,817,818,\n 819,820,821,822,823,824,825,826,827,828,\n 829,830,831,832,833,834,835,836,837,838,\n 839,840,841,842,843,844,845,846,847,848,\n 849,850,851,852,853]},\n infinity]}},\n {gen_server,call,\n [\'replication_manager-default\',\n {change_vbucket_replication,719,\n \'ns_1@172.23.122.11\'},\n infinity]}},\n {gen_server,call,\n [{\'janitor_agent-default\',\n \'ns_1@172.23.106.126\'},\n {if_rebalance,<0.27583.3>,\n {update_vbucket_state,719,replica,\n undefined,\'ns_1@172.23.122.11\'}},\n infinity]}}}}]}}}', u'shortText': u'message', u'serverTime': u'2018-03-03T17:09:21.145Z', u'module': u'ns_vbucket_mover', u'tstamp': 1520125761145, u'type': u'critical'}
|
[2018-03-03 17:09:24,828] - [rest_client:3082] ERROR - {u'node': u'ns_1@172.23.106.126', u'code': 0, u'text': u"Service 'memcached' exited with status 134. Restarting. Messages:\n2018-03-03T17:09:21.084094Z CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7fc199ab0000+0x8f221]\n2018-03-03T17:09:21.084109Z CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7fc199ab0000+0x8ff6f]\n2018-03-03T17:09:21.084115Z CRITICAL /opt/couchbase/bin/memcached() [0x400000+0x5869e]\n2018-03-03T17:09:21.084124Z CRITICAL /opt/couchbase/bin/memcached() [0x400000+0x38e91]\n2018-03-03T17:09:21.084131Z CRITICAL /opt/couchbase/bin/../lib/libevent_core.so.2.1.8() [0x7fc19a8f8000+0x1937c]\n2018-03-03T17:09:21.084135Z CRITICAL /opt/couchbase/bin/../lib/libevent_core.so.2.1.8(event_base_loop+0x46f) [0x7fc19a8f8000+0x1c7cf]\n2018-03-03T17:09:21.084143Z CRITICAL /opt/couchbase/bin/memcached() [0x400000+0x374f4]\n2018-03-03T17:09:21.084148Z CRITICAL /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7fc19b7f0000+0x8387]\n2018-03-03T17:09:21.084153Z CRITICAL /lib64/libpthread.so.0() [0x7fc19b1af000+0x7aa1]\n2018-03-03T17:09:21.084188Z CRITICAL /lib64/libc.so.6(clone+0x6d) [0x7fc199282000+0xe8bcd]", u'shortText': u'message', u'serverTime': u'2018-03-03T17:09:21.138Z', u'module': u'ns_log', u'tstamp': 1520125761138, u'type': u'info'}
|
[2018-03-03 17:09:24,829] - [rest_client:3082] ERROR - {u'node': u'ns_1@172.23.122.11', u'code': 0, u'text': u'Bucket "default" rebalance appears to be swap rebalance', u'shortText': u'message', u'serverTime': u'2018-03-03T17:08:56.325Z', u'module': u'ns_vbucket_mover', u'tstamp': 1520125736325, u'type': u'info'}
|
[2018-03-03 17:09:24,829] - [rest_client:3082] ERROR - {u'node': u'ns_1@172.23.106.126', u'code': 0, u'text': u'Bucket "default" loaded on node \'ns_1@172.23.106.126\' in 0 seconds.', u'shortText': u'message', u'serverTime': u'2018-03-03T17:08:54.939Z', u'module': u'ns_memcached', u'tstamp': 1520125734939, u'type': u'info'}
|
[2018-03-03 17:09:24,830] - [rest_client:3082] ERROR - {u'node': u'ns_1@172.23.106.126', u'code': 1, u'text': u'Couchbase Server has started on web port 8091 on node \'ns_1@172.23.106.126\'. Version: "5.5.0-2036-enterprise".', u'shortText': u'web start ok', u'serverTime': u'2018-03-03T17:08:54.675Z', u'module': u'menelaus_sup', u'tstamp': 1520125734675, u'type': u'info'}
|
[2018-03-03 17:09:24,830] - [rest_client:3082] ERROR - {u'node': u'ns_1@172.23.106.126', u'code': 0, u'text': u'Shutting down bucket "default" on \'ns_1@172.23.106.126\' for server shutdown', u'shortText': u'message', u'serverTime': u'2018-03-03T17:08:54.317Z', u'module': u'ns_memcached', u'tstamp': 1520125734317, u'type': u'info'}
|
[2018-03-03 17:09:24,830] - [rest_client:3082] ERROR - {u'node': u'ns_1@172.23.122.11', u'code': 0, u'text': u'Started rebalancing bucket default', u'shortText': u'message', u'serverTime': u'2018-03-03T17:08:53.688Z', u'module': u'ns_rebalancer', u'tstamp': 1520125733688, u'type': u'info'}
|
[2018-03-03 17:09:24,831] - [rest_client:3082] ERROR - {u'node': u'ns_1@172.23.122.11', u'code': 4, u'text': u"Starting rebalance, KeepNodes = ['ns_1@172.23.121.181','ns_1@172.23.122.11',\n 'ns_1@172.23.106.126'], EjectNodes = ['ns_1@172.23.105.110'], Failed over and being ejected nodes = []; no delta recovery nodes\n", u'shortText': u'message', u'serverTime': u'2018-03-03T17:08:53.450Z', u'module': u'ns_orchestrator', u'tstamp': 1520125733450, u'type': u'info'}
|
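The entries above are printed newest first, and the bracketed timestamp is when the test client printed them, not when the server logged them. To read the sequence in server order, the entries can be sorted by `tstamp`. The following is a sketch in Python 3 that assumes each line carries a Python dict literal after the `ERROR - ` prefix, as in this report; `parse_event` and `chronological` are hypothetical helper names.

```python
import ast

# Two entries copied from the log above, in the order they were printed.
SAMPLE = [
    r"""[2018-03-03 17:09:24,829] - [rest_client:3082] ERROR - {u'node': u'ns_1@172.23.122.11', u'code': 0, u'text': u'Bucket "default" rebalance appears to be swap rebalance', u'shortText': u'message', u'serverTime': u'2018-03-03T17:08:56.325Z', u'module': u'ns_vbucket_mover', u'tstamp': 1520125736325, u'type': u'info'}""",
    r"""[2018-03-03 17:09:24,830] - [rest_client:3082] ERROR - {u'node': u'ns_1@172.23.122.11', u'code': 0, u'text': u'Started rebalancing bucket default', u'shortText': u'message', u'serverTime': u'2018-03-03T17:08:53.688Z', u'module': u'ns_rebalancer', u'tstamp': 1520125733688, u'type': u'info'}""",
]


def parse_event(line):
    """Return the dict payload of a log line, or None if the line has none."""
    start = line.find("{")
    if start == -1:
        return None
    # literal_eval accepts the u'...' string prefix in Python 3.3 and later.
    return ast.literal_eval(line[start:])


def chronological(lines):
    """Parse all lines and order the events by server timestamp (ms)."""
    events = [e for e in map(parse_event, lines) if e is not None]
    return sorted(events, key=lambda e: e["tstamp"])


if __name__ == "__main__":
    for event in chronological(SAMPLE):
        print(event["serverTime"], event["node"], event["module"], event["text"])
```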
[('/usr/lib64/python2.7/threading.py', 785, '__bootstrap', 'self.__bootstrap_inner()'), ('/usr/lib64/python2.7/threading.py', 812, '__bootstrap_inner', 'self.run()'), ('/usr/lib64/python2.7/threading.py', 765, 'run', 'self.__target(*self.__args, **self.__kwargs)'), ('pytests/upgrade/upgrade_tests.py', 663, 'online_upgrade', 'self.online_upgrade_swap_rebalance()'), ('pytests/upgrade/upgrade_tests.py', 709, 'online_upgrade_swap_rebalance', 'servers_out.values())'), ('lib/couchbase_helper/cluster.py', 327, 'rebalance', 'return _task.result(timeout)'), ('lib/tasks/future.py', 160, 'result', 'return self.__get_result()'), ('lib/tasks/future.py', 111, '__get_result', 'print traceback.extract_stack()')]
[('/usr/lib64/python2.7/threading.py', 785, '__bootstrap', 'self.__bootstrap_inner()'), ('/usr/lib64/python2.7/threading.py', 812, '__bootstrap_inner', 'self.run()'), ('lib/tasks/taskmanager.py', 31, 'run', 'task.step(self)'), ('lib/tasks/task.py', 77, 'step', 'self.check(task_manager)'), ('lib/tasks/task.py', 508, 'check', 'self.set_exception(ex)'), ('lib/tasks/future.py', 264, 'set_exception', 'print traceback.extract_stack()')]
|
Sat Mar 3 17:09:24 2018
|
Issue Links
- duplicates MB-28453 memcached exits with status 134 and rebalance failures in centos longevity (Closed)