Details
-
Bug
-
Resolution: Fixed
-
Critical
-
7.1.0
-
Enterprise Edition 7.1.0 build 1416
-
Untriaged
-
Centos 64-bit
-
-
1
-
Unknown
Description
Test -
-test tests/integration/neo/test_neo_magma_wo_gsi_n1ql.yml -scope tests/integration/neo/scope_neo_magma_wo_gsi_n1ql.yml
|
Day - 2
Scale - 3
Test step -
Rebalance in data node.
On 172.23.120.74
ns_server.debug.log
[ns_server:debug,2021-10-05T17:58:13.542-07:00,ns_1@172.23.120.74:leader_activities<0.11706.0>:leader_activities:handle_activity_down:505]Activity terminated with reason {shutdown, |
{async_died,
|
{raised,
|
{exit,
|
{not_all_nodes_are_ready_yet,
|
['ns_1@172.23.121.77']}, |
[{ns_rebalancer,rebalance_membase_bucket,
|
6, |
[{file,"src/ns_rebalancer.erl"}, |
{line,654}]}, |
{lists,foreach,2, |
[{file,"lists.erl"},{line,1338}]}, |
{ns_rebalancer,rebalance_kv,4, |
[{file,"src/ns_rebalancer.erl"}, |
{line,605}]}, |
{ns_rebalancer,rebalance_body,5, |
[{file,"src/ns_rebalancer.erl"}, |
{line,556}]}, |
{async,'-async_init/4-fun-1-',3, |
[{file,"src/async.erl"}, |
{line,191}]}]}}}}. Activity: |
{activity,<0.466.216>,#Ref<0.4050249434.2986344455.178469>,default, |
<<"acb85b4b67c10af2ffb786ffc74ab13b">>, |
[rebalance],
|
majority,[]}
|
[error_logger:error,2021-10-05T17:58:13.543-07:00,ns_1@172.23.120.74:<0.196.216>:ale_error_logger_handler:do_log:101] |
=========================CRASH REPORT=========================
|
crasher:
|
initial call: erlang:apply/2 |
pid: <0.196.216> |
registered_name: []
|
exception exit: {not_all_nodes_are_ready_yet,['ns_1@172.23.121.77']} |
in function ns_rebalancer:rebalance_membase_bucket/6 (src/ns_rebalancer.erl, line 654) |
in call from lists:foreach/2 (lists.erl, line 1338) |
in call from ns_rebalancer:rebalance_kv/4 (src/ns_rebalancer.erl, line 605) |
in call from ns_rebalancer:rebalance_body/5 (src/ns_rebalancer.erl, line 556) |
in call from async:'-async_init/4-fun-1-'/3 (src/async.erl, line 191) |
ancestors: [<0.18226.206>,ns_orchestrator_child_sup,ns_orchestrator_sup, |
mb_master_sup,mb_master,leader_registry_sup,
|
leader_services_sup,<0.11703.0>,ns_server_sup, |
ns_server_nodes_sup,<0.11292.0>,ns_server_cluster_sup, |
root_sup,<0.139.0>] |
message_queue_len: 0 |
messages: []
|
links: [<0.18226.206>] |
dictionary: []
|
trap_exit: false |
status: running
|
heap_size: 46422 |
stack_size: 27 |
reductions: 7521 |
neighbours:
|
|
[user:error,2021-10-05T17:58:13.543-07:00,ns_1@172.23.120.74:<0.18226.206>:ns_orchestrator:log_rebalance_completion:1412]Rebalance exited with reason {not_all_nodes_are_ready_yet, |
['ns_1@172.23.121.77']}. |
Rebalance Operation Id = e68aebfa6a4ca37bc1b2696743d8f161
|
Timeouts are seen in a window of 1 minute prior to rebalance failure.
[ns_server:error,2021-10-05T17:58:03.538-07:00,ns_1@172.23.120.74:<0.22804.208>:stats_interface:from_nodes:122]Failed to get buckets_interesting[] stats from ['ns_1@172.23.121.77'] with reason: |
[{'ns_1@172.23.121.77',timeout}] |
[ns_server:error,2021-10-05T17:58:03.759-07:00,ns_1@172.23.120.74:<0.16280.221>:stats_interface:from_nodes:122]Failed to get buckets_interesting[] stats from ['ns_1@172.23.121.77'] with reason: |
[{'ns_1@172.23.121.77',timeout}] |
[ns_server:error,2021-10-05T17:58:07.074-07:00,ns_1@172.23.120.74:<0.21207.220>:stats_interface:from_nodes:122]Failed to get buckets_interesting[] stats from ['ns_1@172.23.121.77'] with reason: |
[{'ns_1@172.23.121.77',timeout}] |
[ns_server:error,2021-10-05T17:58:07.431-07:00,ns_1@172.23.120.74:<0.16532.221>:stats_interface:from_nodes:122]Failed to get buckets_interesting[] stats from ['ns_1@172.23.121.77'] with reason: |
[{'ns_1@172.23.121.77',timeout}] |
[ns_server:info,2021-10-05T17:58:07.715-07:00,ns_1@172.23.120.74:<0.18226.206>:ns_orchestrator:handle_event:486]Skipping janitor in state rebalancing |
[ns_server:error,2021-10-05T17:58:10.645-07:00,ns_1@172.23.120.74:<0.16280.221>:stats_interface:from_nodes:122]Failed to get buckets_interesting[] stats from ['ns_1@172.23.121.77'] with reason: |
[{'ns_1@172.23.121.77',timeout}] |
On 172.23.121.77
memcached.log
Lots of slow operations around the time of rebalance failure.
2021-10-05T17:58:05.847607-07:00 WARNING 21046: Slow operation: {"bucket":"bucket9","cid":"127.0.0.1:60273/1000000","command":"DCP_SNAPSHOT_MARKER","duration":"558 ms","packet":{"bodylen":37,"cas":0,"datatype":"raw","extlen":1,"key":"<ud></ud>","keylen":0,"magic":"ClientRequest","opaque":1,"opcode":"DCP_SNAPSHOT_MARKER","vbucket":991},"peer":{"ip":"127.0.0.1","port":60273},"trace":"request=24021805724386876:558026 execute=24021805724386876:558026","worker_tid":140208107382528} |
2021-10-05T17:58:05.918539-07:00 INFO (bucket8) vb:561 drop collection:id:0x144 from scope:0x4e, seq:788, manifest:0x2a4, replica |
2021-10-05T17:58:05.918652-07:00 WARNING 21122: Slow operation: {"bucket":"bucket8","cid":"127.0.0.1:42608/14000000","command":"DCP_SYSTEM_EVENT","duration":"834 ms","packet":{"bodylen":29,"cas":0,"datatype":"raw","extlen":13,"key":"<ud></ud>","keylen":0,"magic":"ClientRequest","opaque":20,"opcode":"DCP_SYSTEM_EVENT","vbucket":561},"peer":{"ip":"127.0.0.1","port":42608},"trace":"request=24021805519770220:833772 execute=24021805519770220:833772","worker_tid":140208090597120} |
2021-10-05T17:58:06.015138-07:00 WARNING (No Engine) Slow scheduling for NON_IO task 'Hashtable resizer on vb:571' on thread NonIoPool0. Schedule overhead: 29 s |
2021-10-05T17:58:06.058360-07:00 WARNING (No Engine) Slow scheduling for NON_IO task 'Hashtable resizer on vb:580' on thread NonIoPool0. Schedule overhead: 28 s |
2021-10-05T17:58:06.066642-07:00 INFO (bucket9) vb:559 create collection:id:0x139, name:3dLYhuOeNKKR7Zp6Mmv5Y5kX in scope:0x52, seq:737, manifest:0x29a, replica |
2021-10-05T17:58:06.105501-07:00 WARNING (No Engine) Slow scheduling for NON_IO task 'Hashtable resizer no vbucket assigned' on thread NonIoPool0. Schedule overhead: 28 s |
2021-10-05T17:58:06.105854-07:00 WARNING (No Engine) Slow scheduling for NON_IO task 'Hashtable resizer on vb:608' on thread NonIoPool0. Schedule overhead: 28 s |
2021-10-05T17:58:06.120761-07:00 INFO (bucket9) vb:560 create collection:id:0x139, name:3dLYhuOeNKKR7Zp6Mmv5Y5kX in scope:0x52, seq:737, manifest:0x29a, replica |
2021-10-05T17:58:06.137936-07:00 INFO (bucket9) vb:479 create collection:id:0x139, name:3dLYhuOeNKKR7Zp6Mmv5Y5kX in scope:0x52, seq:737, manifest:0x29a, replica |
2021-10-05T17:58:06.231609-07:00 WARNING (No Engine) Slow runtime for 'Hashtable resizer on vb:675' on thread NonIoPool0: 126 ms |
2021-10-05T17:58:06.231705-07:00 WARNING (No Engine) Slow scheduling for NON_IO task 'Hashtable resizer on vb:620' on thread NonIoPool0. Schedule overhead: 28 s |
2021-10-05T17:58:06.247893-07:00 INFO (bucket8) vb:229 drop collection:id:0x144 from scope:0x4e, seq:788, manifest:0x2a4, replica |
2021-10-05T17:58:06.248019-07:00 WARNING 19960: Slow operation: {"bucket":"bucket8","cid":"127.0.0.1:44523/12000000","command":"DCP_SYSTEM_EVENT","duration":"620 ms","packet":{"bodylen":29,"cas":0,"datatype":"raw","extlen":13,"key":"<ud></ud>","keylen":0,"magic":"ClientRequest","opaque":18,"opcode":"DCP_SYSTEM_EVENT","vbucket":229},"peer":{"ip":"127.0.0.1","port":44523},"trace":"request=24021806062466558:620437 execute=24021806062466558:620437","worker_tid":140207999416064} |
2021-10-05T17:58:06.410734-07:00 WARNING (No Engine) Slow runtime for 'Hashtable resizer on vb:676' on thread NonIoPool0: 179 ms |
2021-10-05T17:58:06.410797-07:00 WARNING (No Engine) Slow scheduling for NON_IO task 'Hashtable resizer no vbucket assigned' on thread NonIoPool0. Schedule overhead: 28 s |
2021-10-05T17:58:06.433453-07:00 INFO (bucket9) vb:777 create collection:id:0x139, name:3dLYhuOeNKKR7Zp6Mmv5Y5kX in scope:0x52, seq:657, manifest:0x29a, replica |
2021-10-05T17:58:06.517620-07:00 WARNING (No Engine) Slow scheduling for NON_IO task 'Hashtable resizer no vbucket assigned' on thread NonIoPool0. Schedule overhead: 28 s |
2021-10-05T17:58:06.518047-07:00 WARNING (No Engine) Slow scheduling for NON_IO task 'Adjusting hash table sizes.' on thread NonIoPool0. Schedule overhead: 28 s |
2021-10-05T17:58:06.518281-07:00 WARNING (No Engine) Slow scheduling for NON_IO task 'Connection Manager' on thread NonIoPool0. Schedule overhead: 27 s |
2021-10-05T17:58:06.588008-07:00 INFO (bucket8) vb:230 drop collection:id:0x144 from scope:0x4e, seq:788, manifest:0x2a4, replica |
2021-10-05T17:58:06.716590-07:00 INFO (bucket9) vb:815 create collection:id:0x139, name:3dLYhuOeNKKR7Zp6Mmv5Y5kX in scope:0x52, seq:737, manifest:0x29a, replica |
2021-10-05T17:58:06.885982-07:00 INFO (bucket9) vb:854 create collection:id:0x139, name:3dLYhuOeNKKR7Zp6Mmv5Y5kX in scope:0x52, seq:737, manifest:0x29a, replica |
2021-10-05T17:58:06.889197-07:00 INFO (bucket8) vb:231 drop collection:id:0x144 from scope:0x4e, seq:788, manifest:0x2a4, replica |
2021-10-05T17:58:06.945336-07:00 INFO (bucket8) vb:200 drop collection:id:0x141 from scope:0x4e, seq:789, manifest:0x2a4, replica |
2021-10-05T17:58:07.234323-07:00 WARNING 41181: Slow operation: {"bucket":"bucket4","cid":"127.0.0.1:38690/6000000","command":"DCP_MUTATION","duration":"638 ms","packet":{"bodylen":681,"cas":1633481838182268928,"datatype":["JSON","Snappy"],"extlen":31,"key":"<ud>.doc_2</ud>","keylen":6,"magic":"ClientRequest","opaque":6,"opcode":"DCP_MUTATION","vbucket":688},"peer":{"ip":"127.0.0.1","port":38690},"trace":"request=24021807031347798:637801 execute=24021807031347798:637801","worker_tid":140208082204416} |
2021-10-05T17:58:07.238722-07:00 INFO (bucket8) vb:201 drop collection:id:0x141 from scope:0x4e, seq:789, manifest:0x2a4, replica |
2021-10-05T17:58:07.275932-07:00 INFO (bucket9) vb:935 create collection:id:0x139, name:3dLYhuOeNKKR7Zp6Mmv5Y5kX in scope:0x52, seq:737, manifest:0x29a, replica |
2021-10-05T17:58:07.276164-07:00 WARNING 21046: Slow operation: {"bucket":"bucket9","cid":"127.0.0.1:60273/15000000","command":"DCP_SYSTEM_EVENT","duration":"1138 ms","packet":{"bodylen":53,"cas":0,"datatype":"raw","extlen":13,"key":"<ud>3dLYhuOeNKKR7Zp6Mmv5Y5kX</ud>","keylen":24,"magic":"ClientRequest","opaque":21,"opcode":"DCP_SYSTEM_EVENT","vbucket":935},"peer":{"ip":"127.0.0.1","port":60273},"trace":"request=24021806572966708:1137970 execute=24021806572966708:1137970","worker_tid":140208107382528} |
2021-10-05T17:58:07.311655-07:00 INFO (bucket8) vb:235 drop collection:id:0x144 from scope:0x4e, seq:788, manifest:0x2a4, replica |
2021-10-05T17:58:07.343788-07:00 WARNING (No Engine) Slow scheduling for NON_IO task 'Adjusting hash table sizes.' on thread NonIoPool0. Schedule overhead: 28 s |
2021-10-05T17:58:07.375992-07:00 INFO (bucket8) vb:205 drop collection:id:0x141 from scope:0x4e, seq:789, manifest:0x2a4, replica |
2021-10-05T17:58:07.415508-07:00 WARNING (No Engine) Slow runtime for 'Adjusting hash table sizes.' on thread NonIoPool0: 72 ms |
2021-10-05T17:58:07.415562-07:00 WARNING (No Engine) Slow scheduling for NON_IO task 'Adjusting hash table sizes.' on thread NonIoPool0. Schedule overhead: 28 s |
2021-10-05T17:58:07.454872-07:00 WARNING (No Engine) Slow runtime for 'Adjusting hash table sizes.' on thread NonIoPool0: 39 ms |
2021-10-05T17:58:07.458753-07:00 INFO (bucket8) vb:206 drop collection:id:0x141 from scope:0x4e, seq:789, manifest:0x2a4, replica |
2021-10-05T17:58:07.485579-07:00 INFO (bucket9) vb:1007 create collection:id:0x139, name:3dLYhuOeNKKR7Zp6Mmv5Y5kX in scope:0x52, seq:737, manifest:0x29a, replica |
2021-10-05T17:58:07.579159-07:00 INFO (bucket8) vb:313 drop collection:id:0x141 from scope:0x4e, seq:789, manifest:0x2a4, replica |
2021-10-05T17:58:07.710379-07:00 INFO (bucket8) vb:229 drop collection:id:0x141 from scope:0x4e, seq:789, manifest:0x2a4, replica |
2021-10-05T17:58:07.710508-07:00 INFO (bucket8) vb:230 drop collection:id:0x141 from scope:0x4e, seq:789, manifest:0x2a4, replica |
2021-10-05T17:58:07.710553-07:00 INFO (bucket8) vb:231 drop collection:id:0x141 from scope:0x4e, seq:789, manifest:0x2a4, replica |
2021-10-05T17:58:07.710598-07:00 INFO (bucket8) vb:235 drop collection:id:0x141 from scope:0x4e, seq:789, manifest:0x2a4, replica |
2021-10-05T17:58:07.710639-07:00 INFO (bucket8) vb:229 drop scope:id:0x4e seq:790, manifest:0x2a5, replica |
2021-10-05T17:58:07.732668-07:00 INFO (bucket8) vb:230 drop scope:id:0x4e seq:790, manifest:0x2a5, replica |
2021-10-05T17:58:07.732775-07:00 INFO (bucket8) vb:231 drop scope:id:0x4e seq:790, manifest:0x2a5, replica |
2021-10-05T17:58:07.773654-07:00 WARNING (No Engine) Slow scheduling for NON_IO task 'Process checkpoint(s) for DCP producer eq_dcpq:replication:ns_1@172.23.121.77->ns_1@172.23.120.75:bucket7' on thread NonIoPool0. Schedule overhead: 23 s |
2021-10-05T17:58:07.817744-07:00 INFO (bucket8) vb:235 drop scope:id:0x4e seq:790, manifest:0x2a5, replica |
2021-10-05T17:58:07.849485-07:00 INFO (bucket8) vb:495 drop collection:id:0x141 from scope:0x4e, seq:789, manifest:0x2a4, replica |
2021-10-05T17:58:07.849591-07:00 INFO (bucket8) vb:561 drop collection:id:0x141 from scope:0x4e, seq:789, manifest:0x2a4, replica |
2021-10-05T17:58:07.880215-07:00 INFO (bucket8) vb:200 drop scope:id:0x4e seq:790, manifest:0x2a5, replica |
2021-10-05T17:58:07.880315-07:00 INFO (bucket8) vb:201 drop scope:id:0x4e seq:790, manifest:0x2a5, replica |
2021-10-05T17:58:07.942126-07:00 INFO (bucket9) vb:936 create collection:id:0x139, name:3dLYhuOeNKKR7Zp6Mmv5Y5kX in scope:0x52, seq:737, manifest:0x29a, replica |
2021-10-05T17:58:07.942256-07:00 WARNING 21046: Slow operation: {"bucket":"bucket9","cid":"127.0.0.1:60273/14000000","command":"DCP_SYSTEM_EVENT","duration":"666 ms","packet":{"bodylen":53,"cas":0,"datatype":"raw","extlen":13,"key":"<ud>3dLYhuOeNKKR7Zp6Mmv5Y5kX</ud>","keylen":24,"magic":"ClientRequest","opaque":20,"opcode":"DCP_SYSTEM_EVENT","vbucket":936},"peer":{"ip":"127.0.0.1","port":60273},"trace":"request=24021807711163705:665976 execute=24021807711163705:665976","worker_tid":140208107382528} |
2021-10-05T17:58:07.952356-07:00 INFO (bucket8) vb:205 drop scope:id:0x4e seq:790, manifest:0x2a5, replica |
2021-10-05T17:58:07.992900-07:00 WARNING (No Engine) Slow scheduling for NON_IO task 'Process checkpoint(s) for DCP producer eq_dcpq:xdcr:dcp_5f4b32de7a13b85221a2eb842ed656ea/default/remote_172.23.121.77:11210_0:yCXGC3D_SvvO1B1x3jXD_Q==' on thread NonIoPool0. Schedule overhead: 22 s |
2021-10-05T17:58:08.083106-07:00 WARNING (No Engine) Slow scheduling for NON_IO task 'Process checkpoint(s) for DCP producer eq_dcpq:replication:ns_1@172.23.121.77->ns_1@172.23.123.26:bucket5' on thread NonIoPool0. Schedule overhead: 23 s |
2021-10-05T17:58:08.099749-07:00 INFO (bucket9) vb:954 create collection:id:0x139, name:3dLYhuOeNKKR7Zp6Mmv5Y5kX in scope:0x52, seq:737, manifest:0x29a, replica |
2021-10-05T17:58:08.112350-07:00 INFO (bucket8) vb:206 drop scope:id:0x4e seq:790, manifest:0x2a5, replica |
2021-10-05T17:58:08.270341-07:00 WARNING (No Engine) Slow scheduling for NON_IO task 'Process checkpoint(s) for DCP producer eq_dcpq:replication:ns_1@172.23.121.77->ns_1@172.23.123.24:bucket7' on thread NonIoPool0. Schedule overhead: 22 s |