Details
- Bug
- Resolution: Not a Bug
- Critical
- 7.6.0
- Operating System: Debian GNU/Linux 12 (bookworm), Couchbase Enterprise Edition 7.6.0-1851
- Untriaged
- Linux x86_64
- 0
- Unknown
- KV 2023-4
Description
Steps to reproduce
- Created a 4-node KV cluster
- Created 10 buckets with different configurations
- Created 5 scopes per bucket and 20 collections per scope
- Loaded data into each collection (around 4000 docs per collection)
- Added another KV node and started a rebalance
- Stopped the rebalance
- Started the rebalance again; the rebalance failed at this point
- Retried the rebalance; this time it succeeded
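To give a sense of the scale these steps imply, and of the order of the rebalance calls, here is a minimal Python sketch. It only builds the request descriptions and sends nothing. The report does not say how the steps were driven, so the use of the REST endpoints below is an assumption, and the bucket, scope, and collection names are hypothetical placeholders.

```python
from itertools import product

# Figures taken from the steps to reproduce.
BUCKETS = 10
SCOPES_PER_BUCKET = 5
COLLECTIONS_PER_SCOPE = 20
DOCS_PER_COLLECTION = 4000  # the report says "around 4000"

total_collections = BUCKETS * SCOPES_PER_BUCKET * COLLECTIONS_PER_SCOPE
total_docs = total_collections * DOCS_PER_COLLECTION


def collection_requests():
    """Yield (method, path, form) for each collection-creation call.

    Names such as bucket0/scope0/coll0 are hypothetical; the report
    does not list the actual names used.
    """
    for b, s, c in product(range(BUCKETS),
                           range(SCOPES_PER_BUCKET),
                           range(COLLECTIONS_PER_SCOPE)):
        path = f"/pools/default/buckets/bucket{b}/scopes/scope{s}/collections"
        yield ("POST", path, {"name": f"coll{c}"})


def rebalance_requests(known_nodes, new_host):
    """Return the add-node / rebalance / stop / rebalance / retry sequence.

    known_nodes are otpNode names (e.g. 'ns_1@10.0.0.1'), including the
    newly added node. Credentials are omitted here.
    """
    rebalance = ("POST", "/controller/rebalance",
                 {"knownNodes": ",".join(known_nodes), "ejectedNodes": ""})
    return [
        ("POST", "/controller/addNode", {"hostname": new_host, "services": "kv"}),
        rebalance,                                # first rebalance
        ("POST", "/controller/stopRebalance", {}),
        rebalance,                                # failed in this report
        rebalance,                                # retry, succeeded in this report
    ]
```

With the numbers above, the cluster holds 1000 collections and roughly 4 million documents in total before the new node is added.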
The rebalance failed with the following reason:
2023-11-26T00:10:10.905-08:00, ns_orchestrator:0:critical:message(ns_1@172.23.121.71) - Rebalance exited with reason {pre_rebalance_janitor_run_failed,"bucket6", {error,wait_for_memcached_failed, ['ns_1@172.23.96.168', 'ns_1@172.23.96.196', 'ns_1@172.23.96.220']}}.
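When triaging several failures like this one, it can help to pull the bucket, the error atom, and the node list out of the reason term. The following is a rough Python sketch that does this with a regular expression. The function name and the pattern are illustrative assumptions, not part of ns_server, and the pattern only targets the `pre_rebalance_janitor_run_failed` shape shown above.

```python
import re

# Matches: {pre_rebalance_janitor_run_failed,"<bucket>", {error,<atom>, [<nodes>]}}
# Whitespace after commas is optional because the logs format it inconsistently.
REASON_RE = re.compile(
    r'\{pre_rebalance_janitor_run_failed,\s*"(?P<bucket>[^"]+)",\s*'
    r"\{error,\s*(?P<error>\w+),\s*\[(?P<nodes>[^\]]*)\]"
)


def parse_janitor_failure(line):
    """Return bucket, error atom and node list, or None if no match."""
    match = REASON_RE.search(line)
    if match is None:
        return None
    nodes = re.findall(r"'([^']+)'", match.group("nodes"))
    return {
        "bucket": match.group("bucket"),
        "error": match.group("error"),
        "nodes": nodes,
    }


# Sample taken from the message quoted in this report.
SAMPLE = (
    'Rebalance exited with reason {pre_rebalance_janitor_run_failed,"bucket6", '
    "{error,wait_for_memcached_failed, ['ns_1@172.23.96.168', "
    "'ns_1@172.23.96.196', 'ns_1@172.23.96.220']}}."
)
result = parse_janitor_failure(SAMPLE)
```

For the message above, this yields bucket `bucket6`, error `wait_for_memcached_failed`, and three nodes, none of which is the node that logged the message (172.23.121.71).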
The following crash report appears in ns_server.debug.log:
[ns_server:info,2023-11-26T00:10:10.904-08:00,ns_1@172.23.121.71:rebalance_agent<0.13933.0>:rebalance_agent:handle_down:290]Rebalancer process <0.17257.166> died (reason {pre_rebalance_janitor_run_failed, "bucket6", {error, wait_for_memcached_failed, ['ns_1@172.23.96.168', 'ns_1@172.23.96.196', 'ns_1@172.23.96.220']}}).
[ns_server:debug,2023-11-26T00:10:10.905-08:00,ns_1@172.23.121.71:leader_activities<0.13866.0>:leader_activities:handle_activity_down:457]Activity terminated with reason {shutdown, {async_died, {raised, {exit, {pre_rebalance_janitor_run_failed, "bucket6", {error,wait_for_memcached_failed, ['ns_1@172.23.96.168', 'ns_1@172.23.96.196', 'ns_1@172.23.96.220']}}, [{ns_rebalancer, run_janitor_pre_rebalance,1, [{file,"src/ns_rebalancer.erl"}, {line,699}]}, {lists,foreach_1,2, [{file,"lists.erl"},{line,1442}]}, {ns_rebalancer,rebalance_body,7, [{file,"src/ns_rebalancer.erl"}, {line,482}]}, {async,'-async_init/4-fun-1-',3, [{file,"src/async.erl"}, {line,199}]}]}}}}.
Activity:{activity,<0.17171.166>,#Ref<0.850806963.699138052.15902>,default, <<"dad76b2b4817f7b78cb2685e3aa20d76">>, [rebalance], majority,[]}
[error_logger:error,2023-11-26T00:10:10.905-08:00,ns_1@172.23.121.71:<0.17080.166>:ale_error_logger_handler:do_log:101]
=========================CRASH REPORT=========================
  crasher:
    initial call: erlang:apply/2
    pid: <0.17080.166>
    registered_name: []
    exception exit: {pre_rebalance_janitor_run_failed,"bucket6", {error,wait_for_memcached_failed, ['ns_1@172.23.96.168','ns_1@172.23.96.196', 'ns_1@172.23.96.220']}}
      in function ns_rebalancer:run_janitor_pre_rebalance/1 (src/ns_rebalancer.erl, line 699)
      in call from lists:foreach_1/2 (lists.erl, line 1442)
      in call from ns_rebalancer:rebalance_body/7 (src/ns_rebalancer.erl, line 482)
      in call from async:'-async_init/4-fun-1-'/3 (src/async.erl, line 199)
    ancestors: [<0.13893.0>,ns_orchestrator_child_sup,ns_orchestrator_sup, mb_master_sup,mb_master,leader_registry_sup, leader_services_sup,<0.13863.0>,ns_server_sup, ns_server_nodes_sup,<0.10331.0>,ns_server_cluster_sup, root_sup,<0.155.0>]
    message_queue_len: 0
    messages: []
    links: [<0.13893.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 196650
    stack_size: 28
    reductions: 11050
  neighbours:
[user:error,2023-11-26T00:10:10.905-08:00,ns_1@172.23.121.71:<0.13893.0>:ns_orchestrator:log_rebalance_completion:1660]Rebalance exited with reason {pre_rebalance_janitor_run_failed,"bucket6", {error,wait_for_memcached_failed, ['ns_1@172.23.96.168', 'ns_1@172.23.96.196', 'ns_1@172.23.96.220']}}.
Rebalance Operation Id = e76b052c6ae2e4f24108ad3c9aa363a4
The rebalance succeeded when it was retried.
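Since a plain retry was enough here, a test harness could wrap the rebalance call in a bounded retry. The sketch below is one way to do that in Python. `start_rebalance` is a hypothetical callable supplied by the caller that returns True on success. Nothing here reflects how the original test framework handles retries.

```python
import time


def rebalance_with_retry(start_rebalance, attempts=3, delay_s=0.0):
    """Call start_rebalance() until it returns True or attempts run out.

    Returns the 1-based number of the attempt that succeeded, or None
    if every attempt failed. start_rebalance is a caller-supplied
    (hypothetical) function wrapping whatever triggers the rebalance.
    """
    for attempt in range(1, attempts + 1):
        if start_rebalance():
            return attempt
        if attempt < attempts:
            time.sleep(delay_s)  # give memcached time to become ready
    return None
```

In the scenario reported here, where the first restart failed and the next one succeeded, such a helper would return 2.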
Issue Links
- relates to
  - MB-60277 Investigate improving the performance of get_mass_dcp_docs_estimate (Open)