Details
-
Bug
-
Resolution: Not a Bug
-
Critical
-
None
-
7.2.3
-
Operating System : Microsoft Windows Server 2019 Base
AMI ID : ami-093693792d26e4373
Couchbase Enterprise Edition build 7.2.3-6710
-
Untriaged
-
Windows 64-bit
-
-
0
-
Unknown
Description
Steps to reproduce
- Launched Microsoft Windows Server 2019 Base EC2 instances from AMI ami-093693792d26e4373
- Installed Couchbase Enterprise Edition build 7.2.0-5325 on the nodes
- Created a 2-node cluster with all services
- Loaded all sample buckets
- Created 3 buckets: magma, couchstore and ephemeral
- Loaded documents using cbc-pillowfight
- Started the upgrade with a swap rebalance: rebalanced in a node running 7.2.3-6710 while removing an existing cluster node
The rebalance fails:
2023-12-05T11:41:38.731Z, ns_orchestrator:0:critical:message(ns_1@172.31.90.195) - Rebalance exited with reason {mover_crashed,timeout}.Rebalance Operation Id = 72f22ca59a3aef9671551c40ab0ac8e0
Crash reports observed in ns_server.debug.log:
[ns_server:debug,2023-12-05T11:41:38.700Z,ns_1@172.31.90.195:<0.9326.7>:ns_vbucket_mover:terminate:226]ns_vbucket_mover terminating when some workers are still running:[{<0.31699.16>, {move,{24, ['ns_1@172.31.90.195', 'ns_1@ec2-3-209-56-251.compute-1.amazonaws.com'], ['ns_1@172.31.90.195', 'ns_1@ec2-34-232-63-164.compute-1.amazonaws.com'], []}}}, {<0.8214.17>, {move,{548, ['ns_1@ec2-3-209-56-251.compute-1.amazonaws.com', 'ns_1@172.31.90.195'], ['ns_1@ec2-34-232-63-164.compute-1.amazonaws.com', 'ns_1@172.31.90.195'], []}}}, {<0.9996.17>, {move,{21, ['ns_1@172.31.90.195', 'ns_1@ec2-3-209-56-251.compute-1.amazonaws.com'], ['ns_1@172.31.90.195', 'ns_1@ec2-34-232-63-164.compute-1.amazonaws.com'], []}}}, {<0.1931.17>, {move,{22, ['ns_1@172.31.90.195', 'ns_1@ec2-3-209-56-251.compute-1.amazonaws.com'], ['ns_1@172.31.90.195', 'ns_1@ec2-34-232-63-164.compute-1.amazonaws.com'], []}}}, {<0.9470.17>, {move,{547, ['ns_1@ec2-3-209-56-251.compute-1.amazonaws.com', 'ns_1@172.31.90.195'], ['ns_1@ec2-34-232-63-164.compute-1.amazonaws.com', 'ns_1@172.31.90.195'], []}}}]
[ns_server:error,2023-12-05T11:41:38.700Z,ns_1@172.31.90.195:<0.31699.16>:ns_single_vbucket_mover:spawn_and_wait:79]Got unexpected exit signal {'EXIT',<0.9326.7>,timeout}
[ns_server:error,2023-12-05T11:41:38.700Z,ns_1@172.31.90.195:<0.9996.17>:ns_single_vbucket_mover:spawn_and_wait:79]Got unexpected exit signal {'EXIT',<0.9326.7>,timeout}
[ns_server:error,2023-12-05T11:41:38.700Z,ns_1@172.31.90.195:<0.1931.17>:ns_single_vbucket_mover:spawn_and_wait:79]Got unexpected exit signal {'EXIT',<0.9326.7>,timeout}
[ns_server:error,2023-12-05T11:41:38.700Z,ns_1@172.31.90.195:<0.9470.17>:ns_single_vbucket_mover:spawn_and_wait:79]Got unexpected exit signal {'EXIT',<0.9326.7>,timeout}
[ns_server:error,2023-12-05T11:41:38.700Z,ns_1@172.31.90.195:<0.8214.17>:ns_single_vbucket_mover:spawn_and_wait:79]Got unexpected exit signal {'EXIT',<0.9326.7>,timeout}
[error_logger:error,2023-12-05T11:41:38.700Z,ns_1@172.31.90.195:<0.1931.17>:ale_error_logger_handler:do_log:101]=========================CRASH REPORT========================= crasher: initial call: ns_single_vbucket_mover:mover/6 pid: <0.1931.17> registered_name: [] exception exit: {unexpected_exit,{'EXIT',<0.9326.7>,timeout}} in function ns_single_vbucket_mover:spawn_and_wait/1 (src/ns_single_vbucket_mover.erl, line 80) in call from misc:try_with_maybe_ignorant_after/2 (src/misc.erl, line 1487) in call from ns_single_vbucket_mover:mover/6 (src/ns_single_vbucket_mover.erl, line 49) ancestors: [<0.9326.7>,<0.26423.1>] message_queue_len: 1 messages: [{'EXIT',<0.9326.7>,timeout}] links: [<0.9326.7>] dictionary: [{cleanup_list,[<0.10209.17>]}] trap_exit: true status: running heap_size: 2586 stack_size: 29 reductions: 8905 neighbours:
[error_logger:error,2023-12-05T11:41:38.700Z,ns_1@172.31.90.195:<0.8214.17>:ale_error_logger_handler:do_log:101]=========================CRASH REPORT========================= crasher: initial call: ns_single_vbucket_mover:mover/6 pid: <0.8214.17> registered_name: [] exception exit: {unexpected_exit,{'EXIT',<0.9326.7>,timeout}} in function ns_single_vbucket_mover:spawn_and_wait/1 (src/ns_single_vbucket_mover.erl, line 80) in call from ns_single_vbucket_mover:wait_master_seqno_persisted_on_replicas/5 (src/ns_single_vbucket_mover.erl, line 459) in call from ns_single_vbucket_mover:mover_inner/6 (src/ns_single_vbucket_mover.erl, line 156) in call from ns_single_vbucket_mover:'-mover/6-fun-1-'/6 (src/ns_single_vbucket_mover.erl, line 52) in call from misc:try_with_maybe_ignorant_after/2 (src/misc.erl, line 1487) in call from ns_single_vbucket_mover:mover/6 (src/ns_single_vbucket_mover.erl, line 49) ancestors: [<0.9326.7>,<0.26423.1>] message_queue_len: 1 messages: [{'EXIT',<0.9326.7>,timeout}] links: [<0.9326.7>] dictionary: [{cleanup_list,[<0.8096.17>]}] trap_exit: true status: running heap_size: 2586 stack_size: 29 reductions: 6769 neighbours:
[error_logger:error,2023-12-05T11:41:38.700Z,ns_1@172.31.90.195:<0.31699.16>:ale_error_logger_handler:do_log:101]=========================CRASH REPORT========================= crasher: initial call: ns_single_vbucket_mover:mover/6 pid: <0.31699.16> registered_name: [] exception exit: {unexpected_exit,{'EXIT',<0.9326.7>,timeout}} in function ns_single_vbucket_mover:spawn_and_wait/1 (src/ns_single_vbucket_mover.erl, line 80) in call from ns_single_vbucket_mover:wait_master_seqno_persisted_on_replicas/5 (src/ns_single_vbucket_mover.erl, line 459) in call from ns_single_vbucket_mover:mover_inner/6 (src/ns_single_vbucket_mover.erl, line 156) in call from ns_single_vbucket_mover:'-mover/6-fun-1-'/6 (src/ns_single_vbucket_mover.erl, line 52) in call from misc:try_with_maybe_ignorant_after/2 (src/misc.erl, line 1487) in call from ns_single_vbucket_mover:mover/6 (src/ns_single_vbucket_mover.erl, line 49) ancestors: [<0.9326.7>,<0.26423.1>] message_queue_len: 1 messages: [{'EXIT',<0.9326.7>,timeout}] links: [<0.9326.7>] dictionary: [{cleanup_list,[<0.773.17>]}] trap_exit: true status: running heap_size: 1598 stack_size: 29 reductions: 4583 neighbours:
[error_logger:error,2023-12-05T11:41:38.700Z,ns_1@172.31.90.195:<0.9996.17>:ale_error_logger_handler:do_log:101]=========================CRASH REPORT========================= crasher: initial call: ns_single_vbucket_mover:mover/6 pid: <0.9996.17> registered_name: [] exception exit: {unexpected_exit,{'EXIT',<0.9326.7>,timeout}} in function ns_single_vbucket_mover:spawn_and_wait/1 (src/ns_single_vbucket_mover.erl, line 80) in call from ns_single_vbucket_mover:mover_inner/6 (src/ns_single_vbucket_mover.erl, line 152) in call from ns_single_vbucket_mover:'-mover/6-fun-1-'/6 (src/ns_single_vbucket_mover.erl, line 52) in call from misc:try_with_maybe_ignorant_after/2 (src/misc.erl, line 1487) in call from ns_single_vbucket_mover:mover/6 (src/ns_single_vbucket_mover.erl, line 49) ancestors: [<0.9326.7>,<0.26423.1>] message_queue_len: 1 messages: [{'EXIT',<0.9326.7>,timeout}] links: [<0.9326.7>] dictionary: [{cleanup_list,[<0.10210.17>]}] trap_exit: true status: running heap_size: 987 stack_size: 29 reductions: 2309 neighbours:
[error_logger:error,2023-12-05T11:41:38.700Z,ns_1@172.31.90.195:<0.9470.17>:ale_error_logger_handler:do_log:101]=========================CRASH REPORT========================= crasher: initial call: ns_single_vbucket_mover:mover/6 pid: <0.9470.17> registered_name: [] exception exit: {unexpected_exit,{'EXIT',<0.9326.7>,timeout}} in function ns_single_vbucket_mover:spawn_and_wait/1 (src/ns_single_vbucket_mover.erl, line 80) in call from ns_single_vbucket_mover:wait_master_seqno_persisted_on_replicas/5 (src/ns_single_vbucket_mover.erl, line 459) in call from ns_single_vbucket_mover:mover_inner/6 (src/ns_single_vbucket_mover.erl, line 156) in call from ns_single_vbucket_mover:'-mover/6-fun-1-'/6 (src/ns_single_vbucket_mover.erl, line 52) in call from misc:try_with_maybe_ignorant_after/2 (src/misc.erl, line 1487) in call from ns_single_vbucket_mover:mover/6 (src/ns_single_vbucket_mover.erl, line 49) ancestors: [<0.9326.7>,<0.26423.1>] message_queue_len: 1 messages: [{'EXIT',<0.9326.7>,timeout}] links: [<0.9326.7>] dictionary: [{cleanup_list,[<0.9403.17>]}] trap_exit: true status: running heap_size: 2586 stack_size: 29 reductions: 6771 neighbours:
Retried the rebalance multiple times on this cluster; it always fails with the same errors.
Tried to reproduce the failure on fresh EC2 instances, and the rebalance passed there. This looks like an intermittent issue.
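For triage, it can help to see where each mover process was waiting when the parent ns_vbucket_mover exited with timeout. The following is a minimal Python sketch, not a Couchbase tool. It assumes the entry header layout seen in the ns_server.debug.log excerpt above ([logger:level,timestamp,node:<pid>:module:function:line]), which is inferred from this ticket rather than from a documented format. The helper names split_entries, first_caller and tally_callers are hypothetical, and SAMPLE is an abridged copy of two entries from the log above.

```python
import re
from collections import Counter

# Entry header as it appears in the excerpt above (layout inferred, not documented):
# [logger:level,timestamp,node:<pid>:module:function:line]
ENTRY_HEADER = re.compile(
    r"\[(?P<logger>\w+):(?P<level>\w+),"
    r"(?P<ts>\d{4}-\d{2}-\d{2}T[\d:.]+Z),"
    r"(?P<node>[^:]+):(?P<pid><[\d.]+>):"
    r"(?P<module>\w+):(?P<function>\w+):(?P<line>\d+)\]"
)

# First "in call from M:F/A" frame, i.e. the caller of the crashing function.
FIRST_CALLER = re.compile(r"in call from (\S+)")


def split_entries(text):
    """Split a blob of concatenated log entries into dicts, one per entry."""
    matches = list(ENTRY_HEADER.finditer(text))
    entries = []
    for i, match in enumerate(matches):
        # The body runs until the next header, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        entry = match.groupdict()
        entry["body"] = text[match.end():end].strip()
        entries.append(entry)
    return entries


def first_caller(body):
    """Return the frame that called the crashing function, or None."""
    match = FIRST_CALLER.search(body)
    return match.group(1) if match else None


def tally_callers(text):
    """Count crash reports by the frame that called the crashing function."""
    counts = Counter()
    for entry in split_entries(text):
        if "CRASH REPORT" not in entry["body"]:
            continue
        caller = first_caller(entry["body"])
        if caller:
            counts[caller] += 1
    return counts


# Abridged from the log above: one error entry followed by one crash report.
SAMPLE = (
    "[ns_server:error,2023-12-05T11:41:38.700Z,ns_1@172.31.90.195:<0.8214.17>:"
    "ns_single_vbucket_mover:spawn_and_wait:79]Got unexpected exit signal "
    "{'EXIT',<0.9326.7>,timeout}"
    "[error_logger:error,2023-12-05T11:41:38.700Z,ns_1@172.31.90.195:<0.8214.17>:"
    "ale_error_logger_handler:do_log:101]=========================CRASH REPORT"
    "========================= crasher: initial call: "
    "ns_single_vbucket_mover:mover/6 pid: <0.8214.17> "
    "exception exit: {unexpected_exit,{'EXIT',<0.9326.7>,timeout}} "
    "in function ns_single_vbucket_mover:spawn_and_wait/1 "
    "(src/ns_single_vbucket_mover.erl, line 80) "
    "in call from ns_single_vbucket_mover:wait_master_seqno_persisted_on_replicas/5 "
    "(src/ns_single_vbucket_mover.erl, line 459)"
)

if __name__ == "__main__":
    for caller, count in tally_callers(SAMPLE).most_common():
        print(count, caller)
```

Read against the five crash reports in this ticket, the first caller frame is wait_master_seqno_persisted_on_replicas/5 for three movers (<0.8214.17>, <0.31699.16>, <0.9470.17>), mover_inner/6 for <0.9996.17>, and misc:try_with_maybe_ignorant_after/2 for <0.1931.17>.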