Details
- Type: Bug
- Resolution: Duplicate
- Priority: Major
- Fix Version: Cheshire-Cat
- Affects Version: 7.0.0-4390
- Triage: Untriaged
- Environment: CentOS 64-bit
- 1
- Yes
Description
Build: 7.0.0-4390

Scenario:
- 5-node cluster
- Ephemeral bucket with 0 replicas and an initial load of 50K docs
- Update replicas from 0 -> 3 and back from 3 -> 0, one step at a time

Observation:
During the replica update from 2 -> 3, rebalance failed due to a memcached crash:

{'errorMessage': 'Rebalance failed. See logs for detailed reason. You can try again.', 'status': 'none'} - rebalance failed
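The sequential replica updates in the scenario above can be sketched against the Couchbase REST API (POST /pools/default/buckets/<bucket> with replicaNumber, followed by POST /controller/rebalance after each step). The host, bucket name, and helper functions below are illustrative, not taken from the ticket:

```python
# Hypothetical sketch of the test's sequential replica-update loop.
# Only builds the requests; sending them would also need HTTP Basic auth,
# which is omitted here.
import urllib.parse
import urllib.request

def build_replica_update(host: str, bucket: str, replicas: int) -> urllib.request.Request:
    """Build the REST request that changes a bucket's replica count."""
    url = f"http://{host}:8091/pools/default/buckets/{bucket}"
    data = urllib.parse.urlencode({"replicaNumber": replicas}).encode()
    return urllib.request.Request(url, data=data, method="POST")

def replica_steps(start: int, end: int) -> list[int]:
    """One-step-at-a-time updates, e.g. 0 -> 3 yields [1, 2, 3]."""
    step = 1 if end > start else -1
    return list(range(start + step, end + step, step))

# The test walked replicas 0 -> 3 and back 3 -> 0, rebalancing after each
# step; per the ticket, the crash occurred on the 2 -> 3 update.
```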
|
Latest logs from UI on 172.23.122.0:
|
{'code': 0, 'module': 'ns_memcached', 'type': 'info', 'node': 'ns_1@172.23.122.0', 'tstamp': 1612861326343L, 'shortText': 'message', 'serverTime': '2021-02-09T01:02:06.343Z', 'text': 'Bucket "default" loaded on node 'ns_1@172.23.122.0' in 0 seconds.'}
|
{'code': 0, 'module': 'ns_orchestrator', 'type': 'critical', 'node': 'ns_1@172.23.122.0', 'tstamp': 1612861325543L, 'shortText': 'message', 'serverTime': '2021-02-09T01:02:05.543Z',
'text': 'Rebalance exited with reason
{mover_crashed, {unexpected_exit, {'EXIT',<0.26741.5>,
{{{{{child_interrupted,{'EXIT',<21750.15290.0>,socket_closed}},
[{dcp_replicator,spawn_and_wait,1, [{file,"src/dcp_replicator.erl"}, {line,265}]},
{dcp_replicator,handle_call,3, [{file,"src/dcp_replicator.erl"}, {line,121}]},
{gen_server,try_handle_call,4, [{file,"gen_server.erl"},{line,661}]},
{gen_server,handle_msg,6, [{file,"gen_server.erl"},{line,690}]},
{proc_lib,init_p_do_apply,3, [{file,"proc_lib.erl"},{line,249}]}]},
{gen_server,call, [<21750.15288.0>, {setup_replication, [103,104,105,106,107,108,109,110,111,112,113,114,115,143,144,145,146,147,149,150,151,152,153,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,409,526,527,528,529,530,531,532,533,534,535,536,537,538,539,540,541,542,564,918,919,920,972,989,990,991,992,993,997,998,999,1000,1001,1002,1003,1004,1005,1012]},infinity]}},{gen_server,call,['replication_manager-default',{change_vbucket_replication,1012,'ns_1@172.23.122.0'},infinity]}},{gen_server,call,[{'janitor_agent-default','ns_1@172.23.121.93'},{if_rebalance,<0.22841.5>,{update_vbucket_state,478,active,paused,undefined,[['ns_1@172.23.121.93','ns_1@172.23.121.94','ns_1@172.23.122.0'],['ns_1@172.23.122.0','ns_1@172.23.121.94','ns_1@172.23.121.93','ns_1@172.23.121.89']]}},infinity]}}}}}.
Rebalance Operation Id = 516c27e651d6752ba09189b3a77c09d5'}

{'code': 0, 'module': 'ns_vbucket_mover', 'type': 'critical', 'node': 'ns_1@172.23.122.0', 'tstamp': 1612861325327L, 'shortText': 'message', 'serverTime': '2021-02-09T01:02:05.327Z',
'text': 'Worker <0.26180.5>
(for action {move,{478,['ns_1@172.23.121.93','ns_1@172.23.121.94','ns_1@172.23.122.0'],['ns_1@172.23.122.0','ns_1@172.23.121.94','ns_1@172.23.121.93','ns_1@172.23.121.89'],[]}})
exited with reason
{unexpected_exit,
{'EXIT',<0.26741.5>,
{{{{{child_interrupted,{'EXIT',<21750.15290.0>,socket_closed}},
[{dcp_replicator,spawn_and_wait,1,[{file,"src/dcp_replicator.erl"},{line,265}]},
{dcp_replicator,handle_call,3,[{file,"src/dcp_replicator.erl"},{line,121}]},
{gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,661}]},
{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,690}]},
{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]},
{gen_server,call,[<21750.15288.0>,{setup_replication,[103,104,105,106,107,108,109,110,111,112,113,114,115,143,144,145,146,147,149,150,151,152,153,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,409,526,527,528,529,530,531,532,533,534,535,536,537,538,539,540,541,542,564,918,919,920,972,989,990,991,992,993,997,998,999,1000,1001,1002,1003,1004,1005,1012]},infinity]}},
{gen_server,call,['replication_manager-default',{change_vbucket_replication,1012,'ns_1@172.23.122.0'},infinity]}},
{gen_server,call,[{'janitor_agent-default','ns_1@172.23.121.93'},{if_rebalance,<0.22841.5>,{update_vbucket_state,478,active,paused,undefined,[['ns_1@172.23.121.93','ns_1@172.23.121.94','ns_1@172.23.122.0'],['ns_1@172.23.122.0','ns_1@172.23.121.94','ns_1@172.23.121.93','ns_1@172.23.121.89']]}},infinity]}}}}'}

{'code': 0, 'module': 'ns_log', 'type': 'info', 'node': 'ns_1@172.23.122.0', 'tstamp': 1612861325310L, 'shortText': 'message', 'serverTime': '2021-02-09T01:02:05.310Z', 'text': u"Service 'memcached' exited with status 134.
Restarting. Messages:
2021-02-09T01:02:05.244758-08:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f8d7d09a000+0x8efd1]
2021-02-09T01:02:05.244770-08:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f8d809cb000+0x16ee03]
2021-02-09T01:02:05.244781-08:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f8d809cb000+0x168e62]
2021-02-09T01:02:05.244793-08:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f8d809cb000+0x2e97f6]
2021-02-09T01:02:05.244821-08:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f8d809cb000+0x2d1cea]
2021-02-09T01:02:05.244831-08:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f8d809cb000+0x2ec919]
2021-02-09T01:02:05.244840-08:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f8d809cb000+0x1672a3]
2021-02-09T01:02:05.244867-08:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f8d7d09a000+0xb9dcf]
2021-02-09T01:02:05.244879-08:00 CRITICAL /lib64/libpthread.so.0() [0x7f8d7c965000+0x7e65]
2021-02-09T01:02:05.244909-08:00 CRITICAL /lib64/libc.so.6(clone+0x6d) [0x7f8d7c597000+0xfe88d]"}

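The backtrace frames above are printed as "<module>() [<load base>+<offset>]"; the per-module offset is what a symbolizer such as addr2line (e.g. `addr2line -e libep.so -f 0x16ee03`, against a build with matching debug symbols) needs. A minimal parsing sketch, assuming that exact frame format (the clone frame uses a different shape and is not matched):

```python
# Extract (module path, load base, in-module offset) from a memcached
# backtrace line of the form ".../libep.so() [0x7f8d809cb000+0x16ee03]".
import re

FRAME_RE = re.compile(r"(\S+)\(\) \[(0x[0-9a-f]+)\+(0x[0-9a-f]+)\]")

def parse_frame(line: str):
    """Return (module, base, offset) for a matching frame, else None."""
    m = FRAME_RE.search(line)
    if not m:
        return None
    path, base, offset = m.groups()
    return path, int(base, 16), int(offset, 16)

frame = parse_frame(
    "2021-02-09T01:02:05.244770-08:00 CRITICAL "
    "/opt/couchbase/bin/../lib/libep.so() [0x7f8d809cb000+0x16ee03]"
)
# frame -> ('/opt/couchbase/bin/../lib/libep.so', 0x7f8d809cb000, 0x16ee03)
```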
{'code': 0, 'module': 'ns_memcached', 'type': 'info', 'node': 'ns_1@172.23.122.0', 'tstamp': 1612861325299L, 'shortText': 'message', 'serverTime': '2021-02-09T01:02:05.299Z', 'text': u"Control connection to memcached on 'ns_1@172.23.122.0' disconnected. Check logs for details."}

{'code': 0, 'module': 'ns_vbucket_mover', 'type': 'info', 'node': 'ns_1@172.23.122.0', 'tstamp': 1612861322977L, 'shortText': 'message', 'serverTime': '2021-02-09T01:02:02.977Z', 'text': 'Bucket "default" rebalance does not seem to be swap rebalance'}

{'code': 0, 'module': 'ns_rebalancer', 'type': 'info', 'node': 'ns_1@172.23.122.0', 'tstamp': 1612861322331L, 'shortText': 'message', 'serverTime': '2021-02-09T01:02:02.331Z', 'text': 'Started rebalancing bucket default'}

{'code': 0, 'module': 'ns_orchestrator', 'type': 'info', 'node': 'ns_1@172.23.122.0', 'tstamp': 1612861321935L, 'shortText': 'message', 'serverTime': '2021-02-09T01:02:01.935Z', 'text': u"Starting rebalance, KeepNodes = ['ns_1@172.23.121.94','ns_1@172.23.121.89', 'ns_1@172.23.121.93','ns_1@172.23.122.157','ns_1@172.23.122.0'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 516c27e651d6752ba09189b3a77c09d5"}

{'code': 0, 'module': 'menelaus_web_buckets', 'type': 'info', 'node': 'ns_1@172.23.122.0', 'tstamp': 1612861308637L, 'shortText': 'message', 'serverTime': '2021-02-09T01:01:48.637Z', 'text': 'Updated bucket "default" (of type ephemeral) properties: [{num_replicas,3},{ram_quota,2660237312},{storage_mode,ephemeral}]'}

{'code': 0, 'module': 'ns_orchestrator', 'type': 'info', 'node': 'ns_1@172.23.122.0', 'tstamp': 1612861162239L, 'shortText': 'message', 'serverTime': '2021-02-09T00:59:22.239Z', 'text': 'Rebalance completed successfully. Rebalance Operation Id = 6c277267a1792518345685f9cf925dc6'}

Rebalance Failed: {'errorMessage': 'Rebalance failed. See logs for detailed reason. You can try again.', 'status': 'none'} - rebalance failed

Attached the memcached backtrace to the ticket.

TAF test:
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/testexec.21850.ini bucket_type=ephemeral,num_items=50000,durability=MAJORITY,sdk_timeout=60,num_items=500000,GROUP=P0,upgrade_version=7.0.0-4390 -t bucket_param_tests.bucket_param_update.BucketParamTest.test_replica_update,bucket_type=ephemeral,upgrade_version=7.0.0-4390,rerun=True,GROUP=P0,doc_ops=update;create;delete,sdk_timeout=60,get-cbcollect-info=False,replicas=0,durability=MAJORITY,log_level=debug,nodes_init=5,num_items=500000,infra_log_level=critical'

Attachments
Issue Links
- duplicates MB-44079: Ephemeral out of order purging can cause prepares to be recommitted and DurabilityMonitor monotonicity exceptions to throw (Closed)