Couchbase Server / MB-59972

[Upgrade][Windows] : Upgrade swap rebalance fails with reason {mover_crashed,timeout}.

Details

    Description

      Steps to repro

      1. Launched Microsoft Windows Server 2019 Base EC2 instances from AMI ami-093693792d26e4373.
      2. Installed Couchbase Enterprise Edition build 7.2.0-5325 on the nodes.
      3. Created a 2-node cluster with all services.
      4. Loaded all sample buckets.
      5. Created 3 buckets: magma, couchstore and ephemeral.
      6. Loaded documents using cbc-pillowfight.
      7. Started the upgrade with a swap rebalance: rebalanced in a node running 7.2.3-6710 while removing an existing cluster node.
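For anyone scripting step 7, the swap rebalance can be sketched as two REST calls: `/controller/addNode` for the incoming node, then `/controller/rebalance` with the outgoing node listed in `ejectedNodes`. The snippet below only builds the form bodies and sends nothing. The credentials and the service list are placeholders, the `ns_1@<host>` naming is an assumption, and the roles of the two EC2 hostnames (which node comes in, which goes out) are inferred from the vBucket move chains in the log, not stated in the ticket.

```python
from urllib.parse import urlencode


def swap_rebalance_requests(known_nodes, incoming_host, outgoing_node, services):
    """Build (path, form_body) pairs for a swap rebalance; nothing is sent."""
    add_node = ("/controller/addNode", urlencode({
        "hostname": incoming_host,
        "user": "Administrator",   # placeholder credentials
        "password": "password",    # placeholder credentials
        "services": ",".join(services),
    }))
    # Assumption: the new node registers under the otpNode name ns_1@<host>.
    incoming_node = "ns_1@" + incoming_host
    rebalance = ("/controller/rebalance", urlencode({
        "knownNodes": ",".join(list(known_nodes) + [incoming_node]),
        "ejectedNodes": outgoing_node,
    }))
    return [add_node, rebalance]


requests_to_send = swap_rebalance_requests(
    known_nodes=[
        "ns_1@172.31.90.195",
        "ns_1@ec2-3-209-56-251.compute-1.amazonaws.com",
    ],
    incoming_host="ec2-34-232-63-164.compute-1.amazonaws.com",
    outgoing_node="ns_1@ec2-3-209-56-251.compute-1.amazonaws.com",
    services=["kv", "index", "n1ql"],  # illustrative subset, not "all services"
)
for path, body in requests_to_send:
    print(path, body)
```

Adding the new node and ejecting the old one in the same rebalance is what makes this a swap: the node count stays the same while vBuckets move from the outgoing node to the incoming one.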

      The rebalance fails with the following error:

      2023-12-05T11:41:38.731Z, ns_orchestrator:0:critical:message(ns_1@172.31.90.195) - Rebalance exited with reason {mover_crashed,timeout}.
      Rebalance Operation Id = 72f22ca59a3aef9671551c40ab0ac8e0
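When comparing retries, it can help to pull the failure reason and the operation id out of the orchestrator message. The following is a minimal sketch that assumes the message has the shape shown above; the pattern and the function name `parse_rebalance_exit` are illustrative and not part of any Couchbase tool.

```python
import re

LINE = (
    "2023-12-05T11:41:38.731Z, ns_orchestrator:0:critical:message(ns_1@172.31.90.195) - "
    "Rebalance exited with reason {mover_crashed,timeout}."
    "Rebalance Operation Id = 72f22ca59a3aef9671551c40ab0ac8e0"
)

# Assumed layout: "<ts>, <module>:<code>:<level>:message(<node>) - <text>"
PATTERN = re.compile(
    r"^(?P<ts>\S+), (?P<module>\w+):\d+:(?P<level>\w+):message\((?P<node>[^)]+)\) - "
    r"Rebalance exited with reason (?P<reason>\{.*?\})\."
    r"\s*Rebalance Operation Id = (?P<op_id>[0-9a-f]+)"
)


def parse_rebalance_exit(line):
    """Return the named fields of a rebalance-exit message, or None if it does not match."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None


fields = parse_rebalance_exit(LINE)
print(fields["reason"], fields["op_id"])
```

The `\s*` between the reason and the operation id lets the same pattern match whether the two sentences are fused on one line or separated by a line break.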

      Crash reports observed in ns_server.debug.log:

      [ns_server:debug,2023-12-05T11:41:38.700Z,ns_1@172.31.90.195:<0.9326.7>:ns_vbucket_mover:terminate:226]ns_vbucket_mover terminating when some workers are still running:
      [{<0.31699.16>,
        {move,{24,
               ['ns_1@172.31.90.195',
                'ns_1@ec2-3-209-56-251.compute-1.amazonaws.com'],
               ['ns_1@172.31.90.195',
                'ns_1@ec2-34-232-63-164.compute-1.amazonaws.com'],
               []}}},
       {<0.8214.17>,
        {move,{548,
               ['ns_1@ec2-3-209-56-251.compute-1.amazonaws.com',
                'ns_1@172.31.90.195'],
               ['ns_1@ec2-34-232-63-164.compute-1.amazonaws.com',
                'ns_1@172.31.90.195'],
               []}}},
       {<0.9996.17>,
        {move,{21,
               ['ns_1@172.31.90.195',
                'ns_1@ec2-3-209-56-251.compute-1.amazonaws.com'],
               ['ns_1@172.31.90.195',
                'ns_1@ec2-34-232-63-164.compute-1.amazonaws.com'],
               []}}},
       {<0.1931.17>,
        {move,{22,
               ['ns_1@172.31.90.195',
                'ns_1@ec2-3-209-56-251.compute-1.amazonaws.com'],
               ['ns_1@172.31.90.195',
                'ns_1@ec2-34-232-63-164.compute-1.amazonaws.com'],
               []}}},
       {<0.9470.17>,
        {move,{547,
               ['ns_1@ec2-3-209-56-251.compute-1.amazonaws.com',
                'ns_1@172.31.90.195'],
               ['ns_1@ec2-34-232-63-164.compute-1.amazonaws.com',
                'ns_1@172.31.90.195'],
               []}}}]
      [ns_server:error,2023-12-05T11:41:38.700Z,ns_1@172.31.90.195:<0.31699.16>:ns_single_vbucket_mover:spawn_and_wait:79]Got unexpected exit signal {'EXIT',<0.9326.7>,timeout}
      [ns_server:error,2023-12-05T11:41:38.700Z,ns_1@172.31.90.195:<0.9996.17>:ns_single_vbucket_mover:spawn_and_wait:79]Got unexpected exit signal {'EXIT',<0.9326.7>,timeout}
      [ns_server:error,2023-12-05T11:41:38.700Z,ns_1@172.31.90.195:<0.1931.17>:ns_single_vbucket_mover:spawn_and_wait:79]Got unexpected exit signal {'EXIT',<0.9326.7>,timeout}
      [ns_server:error,2023-12-05T11:41:38.700Z,ns_1@172.31.90.195:<0.9470.17>:ns_single_vbucket_mover:spawn_and_wait:79]Got unexpected exit signal {'EXIT',<0.9326.7>,timeout}
      [ns_server:error,2023-12-05T11:41:38.700Z,ns_1@172.31.90.195:<0.8214.17>:ns_single_vbucket_mover:spawn_and_wait:79]Got unexpected exit signal {'EXIT',<0.9326.7>,timeout}
      [error_logger:error,2023-12-05T11:41:38.700Z,ns_1@172.31.90.195:<0.1931.17>:ale_error_logger_handler:do_log:101]
      =========================CRASH REPORT=========================
        crasher:
          initial call: ns_single_vbucket_mover:mover/6
          pid: <0.1931.17>
          registered_name: []
          exception exit: {unexpected_exit,{'EXIT',<0.9326.7>,timeout}}
            in function  ns_single_vbucket_mover:spawn_and_wait/1 (src/ns_single_vbucket_mover.erl, line 80)
            in call from misc:try_with_maybe_ignorant_after/2 (src/misc.erl, line 1487)
            in call from ns_single_vbucket_mover:mover/6 (src/ns_single_vbucket_mover.erl, line 49)
          ancestors: [<0.9326.7>,<0.26423.1>]
          message_queue_len: 1
          messages: [{'EXIT',<0.9326.7>,timeout}]
          links: [<0.9326.7>]
          dictionary: [{cleanup_list,[<0.10209.17>]}]
          trap_exit: true
          status: running
          heap_size: 2586
          stack_size: 29
          reductions: 8905
        neighbours:
      [error_logger:error,2023-12-05T11:41:38.700Z,ns_1@172.31.90.195:<0.8214.17>:ale_error_logger_handler:do_log:101]
      =========================CRASH REPORT=========================
        crasher:
          initial call: ns_single_vbucket_mover:mover/6
          pid: <0.8214.17>
          registered_name: []
          exception exit: {unexpected_exit,{'EXIT',<0.9326.7>,timeout}}
            in function  ns_single_vbucket_mover:spawn_and_wait/1 (src/ns_single_vbucket_mover.erl, line 80)
            in call from ns_single_vbucket_mover:wait_master_seqno_persisted_on_replicas/5 (src/ns_single_vbucket_mover.erl, line 459)
            in call from ns_single_vbucket_mover:mover_inner/6 (src/ns_single_vbucket_mover.erl, line 156)
            in call from ns_single_vbucket_mover:'-mover/6-fun-1-'/6 (src/ns_single_vbucket_mover.erl, line 52)
            in call from misc:try_with_maybe_ignorant_after/2 (src/misc.erl, line 1487)
            in call from ns_single_vbucket_mover:mover/6 (src/ns_single_vbucket_mover.erl, line 49)
          ancestors: [<0.9326.7>,<0.26423.1>]
          message_queue_len: 1
          messages: [{'EXIT',<0.9326.7>,timeout}]
          links: [<0.9326.7>]
          dictionary: [{cleanup_list,[<0.8096.17>]}]
          trap_exit: true
          status: running
          heap_size: 2586
          stack_size: 29
          reductions: 6769
        neighbours:
      [error_logger:error,2023-12-05T11:41:38.700Z,ns_1@172.31.90.195:<0.31699.16>:ale_error_logger_handler:do_log:101]
      =========================CRASH REPORT=========================
        crasher:
          initial call: ns_single_vbucket_mover:mover/6
          pid: <0.31699.16>
          registered_name: []
          exception exit: {unexpected_exit,{'EXIT',<0.9326.7>,timeout}}
            in function  ns_single_vbucket_mover:spawn_and_wait/1 (src/ns_single_vbucket_mover.erl, line 80)
            in call from ns_single_vbucket_mover:wait_master_seqno_persisted_on_replicas/5 (src/ns_single_vbucket_mover.erl, line 459)
            in call from ns_single_vbucket_mover:mover_inner/6 (src/ns_single_vbucket_mover.erl, line 156)
            in call from ns_single_vbucket_mover:'-mover/6-fun-1-'/6 (src/ns_single_vbucket_mover.erl, line 52)
            in call from misc:try_with_maybe_ignorant_after/2 (src/misc.erl, line 1487)
            in call from ns_single_vbucket_mover:mover/6 (src/ns_single_vbucket_mover.erl, line 49)
          ancestors: [<0.9326.7>,<0.26423.1>]
          message_queue_len: 1
          messages: [{'EXIT',<0.9326.7>,timeout}]
          links: [<0.9326.7>]
          dictionary: [{cleanup_list,[<0.773.17>]}]
          trap_exit: true
          status: running
          heap_size: 1598
          stack_size: 29
          reductions: 4583
        neighbours:
      [error_logger:error,2023-12-05T11:41:38.700Z,ns_1@172.31.90.195:<0.9996.17>:ale_error_logger_handler:do_log:101]
      =========================CRASH REPORT=========================
        crasher:
          initial call: ns_single_vbucket_mover:mover/6
          pid: <0.9996.17>
          registered_name: []
          exception exit: {unexpected_exit,{'EXIT',<0.9326.7>,timeout}}
            in function  ns_single_vbucket_mover:spawn_and_wait/1 (src/ns_single_vbucket_mover.erl, line 80)
            in call from ns_single_vbucket_mover:mover_inner/6 (src/ns_single_vbucket_mover.erl, line 152)
            in call from ns_single_vbucket_mover:'-mover/6-fun-1-'/6 (src/ns_single_vbucket_mover.erl, line 52)
            in call from misc:try_with_maybe_ignorant_after/2 (src/misc.erl, line 1487)
            in call from ns_single_vbucket_mover:mover/6 (src/ns_single_vbucket_mover.erl, line 49)
          ancestors: [<0.9326.7>,<0.26423.1>]
          message_queue_len: 1
          messages: [{'EXIT',<0.9326.7>,timeout}]
          links: [<0.9326.7>]
          dictionary: [{cleanup_list,[<0.10210.17>]}]
          trap_exit: true
          status: running
          heap_size: 987
          stack_size: 29
          reductions: 2309
        neighbours:
      [error_logger:error,2023-12-05T11:41:38.700Z,ns_1@172.31.90.195:<0.9470.17>:ale_error_logger_handler:do_log:101]
      =========================CRASH REPORT=========================
        crasher:
          initial call: ns_single_vbucket_mover:mover/6
          pid: <0.9470.17>
          registered_name: []
          exception exit: {unexpected_exit,{'EXIT',<0.9326.7>,timeout}}
            in function  ns_single_vbucket_mover:spawn_and_wait/1 (src/ns_single_vbucket_mover.erl, line 80)
            in call from ns_single_vbucket_mover:wait_master_seqno_persisted_on_replicas/5 (src/ns_single_vbucket_mover.erl, line 459)
            in call from ns_single_vbucket_mover:mover_inner/6 (src/ns_single_vbucket_mover.erl, line 156)
            in call from ns_single_vbucket_mover:'-mover/6-fun-1-'/6 (src/ns_single_vbucket_mover.erl, line 52)
            in call from misc:try_with_maybe_ignorant_after/2 (src/misc.erl, line 1487)
            in call from ns_single_vbucket_mover:mover/6 (src/ns_single_vbucket_mover.erl, line 49)
          ancestors: [<0.9326.7>,<0.26423.1>]
          message_queue_len: 1
          messages: [{'EXIT',<0.9326.7>,timeout}]
          links: [<0.9326.7>]
          dictionary: [{cleanup_list,[<0.9403.17>]}]
          trap_exit: true
          status: running
          heap_size: 2586
          stack_size: 29
          reductions: 6771
        neighbours:
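All five movers exit with the same `{'EXIT',<0.9326.7>,timeout}` signal from their parent, so the more useful detail is where each one was waiting when the signal arrived. A rough way to tabulate that is to take, for every crash report, the stack frame that called `spawn_and_wait/1`. The sketch below does this over an abridged excerpt of three of the reports above (only the `pid` and stack lines are kept); `summarize_crash_reports` is a hypothetical helper name, and the regular expressions assume the frame format seen in this log.

```python
import re

# Matches "in function X (file, line N)" and "in call from X (file, line N)".
FRAME = re.compile(r"in (?:function|call from)\s+(\S+) \((\S+), line (\d+)\)")
PID = re.compile(r"pid: (<[\d.]+>)")

# Abridged excerpt of three crash reports from this ticket.
SAMPLE = """
=====CRASH REPORT=====
    pid: <0.1931.17>
      in function  ns_single_vbucket_mover:spawn_and_wait/1 (src/ns_single_vbucket_mover.erl, line 80)
      in call from misc:try_with_maybe_ignorant_after/2 (src/misc.erl, line 1487)
      in call from ns_single_vbucket_mover:mover/6 (src/ns_single_vbucket_mover.erl, line 49)
=====CRASH REPORT=====
    pid: <0.8214.17>
      in function  ns_single_vbucket_mover:spawn_and_wait/1 (src/ns_single_vbucket_mover.erl, line 80)
      in call from ns_single_vbucket_mover:wait_master_seqno_persisted_on_replicas/5 (src/ns_single_vbucket_mover.erl, line 459)
      in call from ns_single_vbucket_mover:mover_inner/6 (src/ns_single_vbucket_mover.erl, line 156)
=====CRASH REPORT=====
    pid: <0.9996.17>
      in function  ns_single_vbucket_mover:spawn_and_wait/1 (src/ns_single_vbucket_mover.erl, line 80)
      in call from ns_single_vbucket_mover:mover_inner/6 (src/ns_single_vbucket_mover.erl, line 152)
      in call from ns_single_vbucket_mover:'-mover/6-fun-1-'/6 (src/ns_single_vbucket_mover.erl, line 52)
"""


def summarize_crash_reports(log_text):
    """Map each crashing pid to the frame that called spawn_and_wait/1."""
    waiting_in = {}
    for report in log_text.split("CRASH REPORT")[1:]:
        pid = PID.search(report)
        frames = FRAME.findall(report)
        if pid is None or len(frames) < 2:
            continue  # not a complete report; skip it
        func, _source, line = frames[1]  # frames[0] is spawn_and_wait/1 itself
        waiting_in[pid.group(1)] = f"{func}:{line}"
    return waiting_in


summary = summarize_crash_reports(SAMPLE)
for pid, frame in summary.items():
    print(pid, "->", frame)
```

Applied by hand to the five full reports in this ticket, the same rule gives three movers (`<0.8214.17>`, `<0.31699.16>`, `<0.9470.17>`) inside `wait_master_seqno_persisted_on_replicas/5` at line 459, one (`<0.9996.17>`) in `mover_inner/6` at line 152, and one (`<0.1931.17>`) called directly from `misc:try_with_maybe_ignorant_after/2`.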

      Retried the rebalance several times; it failed with the same errors every time.

      A fresh attempt on new EC2 instances rebalanced successfully, so this looks like an intermittent issue.

       

          People

            raghav.sk Raghav S K