Couchbase Server / MB-7115

Rebalance operation failed repeatedly while trying to rebalance in 5 nodes and rebalance out 3 nodes on a 5-node cluster, possibly because the cluster was "Unable to listen" to one of the nodes being rebalanced out.

    Details

      Description

      Scenario:

      • 10-node cluster with build 1942
      • Rebalance out 5 nodes (completed successfully)
      • Cluster is now 5 nodes
      • Add 5 nodes (with build 1944) and remove 3 nodes
      • Hit rebalance
      • Rebalance failed with reason:

      Rebalance exited with reason {badmatch,
          [{<0.26283.119>,
            {{badmatch,{error,emfile}},
             [{ns_replicas_builder_utils,kill_a_bunch_of_tap_names,3},
              {misc,try_with_maybe_ignorant_after,2},
              {gen_server,terminate,6},
              {proc_lib,init_p_do_apply,3}]}}]}
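      For reference, emfile is the POSIX "too many open files" error: the node's Erlang VM has hit its file descriptor limit, so any further socket or file open fails. A quick way to see the limit the VM is running with and how many ports (sockets, files) it currently holds, from the Erlang shell on the affected node (a minimal diagnostic sketch, not part of the product):

      %% max_fds is the descriptor budget the VM detected at startup;
      %% erlang:ports/0 lists every open port (sockets, files, pipes).
      MaxFds = proplists:get_value(max_fds, erlang:system_info(check_io)),
      io:format("open ports: ~p / max fds: ~p~n", [length(erlang:ports()), MaxFds]).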

      - Tried rebalancing again, but it failed repeatedly with:

      Rebalance exited with reason {{{badmatch,[{<18058.14511.0>,noproc}]},
          [{misc,sync_shutdown_many_i_am_trapping_exits,1},
           {misc,try_with_maybe_ignorant_after,2},
           {gen_server,terminate,6},
           {proc_lib,init_p_do_apply,3}]},
          {gen_server,call,
           [<0.11023.120>,
            {shutdown_replicator,'ns_1@ec2-54-251-5-97.ap-southeast-1.compute.amazonaws.com'},
            infinity]}}

      Will shortly upload logs from one of the nodes present in the cluster at the time of the rebalance failures.

      _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

      Noticed this on one of the nodes being rebalanced out:
      Unable to listen on 'ns_1@ec2-122-248-217-156.ap-southeast-1.compute.amazonaws.com'.

      So failed over the node and tried rebalancing again; rebalance still failed.

      So added that node back and did not involve that particular node in the rebalance operation; this time the rebalance succeeded.
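      For context, the "Unable to listen" alert is raised by a listen probe that opens a throwaway UDP socket on the node's address; once the node has run out of file descriptors, that open fails with emfile and the alert fires even though the address itself is fine (a log excerpt in a later comment shows exactly this: gen_udp:open failed due to emfile in menelaus_web_alerts_srv:can_listen). A minimal sketch of such a probe, assuming roughly this shape and not the actual menelaus_web_alerts_srv code:

      %% Hypothetical listen probe: try to open an ephemeral UDP socket on Addr.
      %% With the fd limit exhausted, gen_udp:open/2 returns {error,emfile} and
      %% the probe reports a failure that looks like an address problem but is not.
      can_listen(Addr) ->
          case gen_udp:open(0, [inet, {ip, Addr}]) of
              {ok, Socket} ->
                  gen_udp:close(Socket),
                  true;
              {error, Reason} ->   %% e.g. emfile when descriptors are exhausted
                  {false, Reason}
          end.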


        Activity

        abhinav Abhinav Dangeti added a comment - Grabbed diags: https://s3.amazonaws.com/bugdb/MB-7115/C_rebalance_fails_ns-diag-20121107040344.txt.zip
        Aliaksey Artamonau Aliaksey Artamonau added a comment -

        Diags from ec2-122-248-217-156.ap-southeast-1.compute.amazonaws.com are needed here.

        junyi Junyi Xie (Inactive) added a comment -

        Looks like a rebalance issue with no direct connection to XDCR.

        thuan Thuan Nguyen added a comment -

        Hit this bug again in build 2.0.0-1941. Bin created a 4-node Windows cluster with 5 million items, then set up XDCR to one Ubuntu 10.04 64-bit node (ec2-50-16-18-63.compute-1.amazonaws.com). At the destination, added another Ubuntu node to node ec2-50 and rebalanced. Rebalance failed with error:

        Rebalance exited with reason {mover_failed,{badmatch,{error,emfile}}}

        Looking at the diags, I see ebucketmigrator crashed:

        [rebalance:info,2012-11-07T22:36:53.434,ns_1@10.108.30.25:<0.11871.3>:janitor_agent:wait_index_updated:459]default: Doing wait_index_updated call for ns_1@10.46.215.52 (vbucket 677)
        [ns_server:info,2012-11-07T22:36:53.445,ns_1@10.108.30.25:<0.11709.3>:ns_replicas_builder_utils:kill_a_bunch_of_tap_names:59]Killed the following tap names on 'ns_1@10.108.30.25': [<<"replication_building_677_'ns_1@10.46.215.52'">>]
        [ns_server:debug,2012-11-07T22:36:53.458,ns_1@10.108.30.25:<0.11706.3>:ns_single_vbucket_mover:spawn_ebucketmigrator_mover:283]Spawned mover "default" 677 'ns_1@10.108.30.25' -> 'ns_1@10.46.215.52': <0.11874.3>
        [ns_server:info,2012-11-07T22:36:53.461,ns_1@10.108.30.25:<0.11874.3>:ebucketmigrator_srv:init:490]Setting {"10.46.215.52",11209} vbucket 677 to state replica
        [ns_server:error,2012-11-07T22:36:53.621,ns_1@10.108.30.25:<0.11876.3>:menelaus_web_alerts_srv:can_listen:349]gen_udp:open failed due to emfile
        [user:info,2012-11-07T22:36:53.621,ns_1@10.108.30.25:<0.11876.3>:menelaus_web_alerts_srv:global_alert:87]IP address seems to have changed. Unable to listen on 'ns_1@10.108.30.25'.
        [ns_server:info,2012-11-07T22:36:53.621,ns_1@10.108.30.25:ns_log<0.785.0>:ns_log:handle_cast:125]suppressing duplicate log menelaus_web_alerts_srv:1("IP address seems to have changed. Unable to listen on 'ns_1@10.108.30.25'.") because it's been seen 1 times in the past 9.049832 secs (last seen 9.049832 secs ago
        [ns_server:info,2012-11-07T22:36:53.638,ns_1@10.108.30.25:ns_port_memcached<0.864.0>:ns_port_server:log:171]memcached<0.864.0>: Wed Nov 7 22:36:53.446940 UTC 3: TAP (Producer) eq_tapq:replication_building_677_'ns_1@10.46.215.52' - disconnected, keep alive for 300 seconds
        memcached<0.864.0>: Wed Nov 7 22:36:53.454932 UTC 3: TAP (Producer) eq_tapq:replication_building_677_'ns_1@10.46.215.52' - Connection is closed by force.

        [ns_server:debug,2012-11-07T22:36:53.659,ns_1@10.108.30.25:<0.11874.3>:ebucketmigrator_srv:kill_tapname:966]killing tap named: rebalance_677
        [ns_server:info,2012-11-07T22:36:53.663,ns_1@10.108.30.25:<0.30736.0>:ns_orchestrator:handle_info:282]Skipping janitor in state rebalancing: {rebalancing_state,<0.31142.0>,
        {dict,2,16,16,8,80,48,
        {[],[],[],[],[],[],[],[],[],[],[],[], [],[],[],[]},
        {{[],[],
        [['ns_1@10.46.215.52'|0.322265625]],
        [],[],
        [['ns_1@10.108.30.25'|
        0.7740885416666666]],
        [],[],[],[],[],[],[],[],[],[]}}},
        ['ns_1@10.108.30.25',
        'ns_1@10.46.215.52'],
        [],[]}
        [error_logger:error,2012-11-07T22:36:53.663,ns_1@10.108.30.25:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
        =========================CRASH REPORT=========================
        crasher:
        initial call: ebucketmigrator_srv:init/1
        pid: <0.11874.3>
        registered_name: []
        exception error: no match of right hand side value {error,emfile}

        in function ebucketmigrator_srv:connect/4
        in call from ebucketmigrator_srv:init/1
        ancestors: [<0.11706.3>,<0.31209.0>,<0.31142.0>]
        messages: []
        links: [#Port<0.142811>,<0.11706.3>,#Port<0.142808>]
        dictionary: []
        trap_exit: false
        status: running
        heap_size: 987
        stack_size: 24
        reductions: 123486
        neighbours:

        [ns_server:info,2012-11-07T22:36:53.665,ns_1@10.108.30.25:<0.11709.3>:ns_replicas_builder_utils:kill_a_bunch_of_tap_names:59]Killed the following tap names on 'ns_1@10.108.30.25': []
        [rebalance:error,2012-11-07T22:36:53.666,ns_1@10.108.30.25:<0.31209.0>:ns_vbucket_mover:handle_info:252]<0.11706.3> exited with {mover_failed,{badmatch,{error,emfile}}}
        [error_logger:error,2012-11-07T22:36:53.666,ns_1@10.108.30.25:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
        =========================CRASH REPORT=========================
        crasher:
        initial call: ns_single_vbucket_mover:mover/6
        pid: <0.11706.3>
        registered_name: []
        exception exit: {mover_failed,{badmatch,{error,emfile}}}
        in function ns_single_vbucket_mover:wait_for_mover/5
        in call from ns_single_vbucket_mover:mover_inner/6
        in call from misc:try_with_maybe_ignorant_after/2
        in call from ns_single_vbucket_mover:mover/6
        ancestors: [<0.31209.0>,<0.31142.0>]
        messages: []
        links: [<0.31209.0>]
        dictionary: [{cleanup_list,[<0.11709.3>]}]
        trap_exit: true
        status: running
        heap_size: 75025
        stack_size: 24
        reductions: 13260
        neighbours:

        [ns_server:info,2012-11-07T22:36:53.684,ns_1@10.108.30.25:janitor_agent-default<0.1056.0>:janitor_agent:handle_info:676]Undoing temporary vbucket states caused by rebalance
        [ns_server:debug,2012-11-07T22:36:53.685,ns_1@10.108.30.25:<0.31219.0>:ns_pubsub:do_subscribe_link:132]Parent process of subscription {ns_node_disco_events,<0.31209.0>} exited with reason {mover_failed,{badmatch,{error,emfile}}}
        [user:info,2012-11-07T22:36:53.685,ns_1@10.108.30.25:<0.30736.0>:ns_orchestrator:handle_info:319]Rebalance exited with reason {mover_failed,{badmatch,{error,emfile}}}

        [ns_server:debug,2012-11-07T22:36:53.774,ns_1@10.108.30.25:capi_set_view_manager-default<0.1035.0>:capi_set_view_manager:handle_info:349]doing replicate_newnodes_docs
        [ns_server:info,2012-11-07T22:36:53.778,ns_1@10.108.30.25:<0.12564.3>:diag_handler:log_all_tap_and_checkpoint_stats:127]logging tap & checkpoint stats
        [ns_server:debug,2012-11-07T22:36:53.778,ns_1@10.108.30.25:ns_config_rep<0.797.0>:ns_config_rep:do_push_keys:317]Replicating some config keys ([counters,rebalance_status,rebalancer_pid]..)
        [ns_server:debug,2012-11-07T22:36:53.797,ns_1@10.108.30.25:capi_set_view_manager-default<0.1035.0>:capi_set_view_manager:handle_info:349]doing replicate_newnodes_docs
        [ns_server:debug,2012-11-07T22:36:53.798,ns_1@10.108.30.25:ns_config_log<0.779.0>:ns_config_log:log_common:111]config change:
        counters ->
        [{rebalance_fail,1},{rebalance_start,1}]
        [ns_server:debug,2012-11-07T22:36:53.804,ns_1@10.108.30.25:capi_set_view_manager-default<0.1035.0>:capi_set_view_manager:handle_info:349]doing replicate_newnodes_docs
        [ns_server:debug,2012-11-07T22:36:53.818,ns_1@10.108.30.25:capi_set_view_manager-default<0.1035.0>:capi_set_view_manager:handle_info:349]doing replicate_newnodes_docs
        [ns_server:debug,2012-11-07T22:36:53.818,ns_1@10.108.30.25:ns_config_log<0.779.0>:ns_config_log:log_common:111]config change:
        rebalancer_pid ->
        undefined
        [ns_server:debug,2012-11-07T22:36:53.831,ns_1@10.108.30.25:ns_config_log<0.779.0>:ns_config_log:log_common:111]config change:
        rebalance_status ->

        {none,<<"Rebalance failed. See logs for detailed reason. You can try rebalance again.">>}

        [ns_server:debug,2012-11-07T22:36:53.830,ns_1@10.108.30.25:capi_set_view_manager-default<0.1035.0>:capi_set_view_manager:handle_info:349]doing replicate_newnodes_docs
        [error_logger:error,2012-11-07T22:36:53.669,ns_1@10.108.30.25:error_logger<0.5.0>:ale_error_logger_handler:log_msg:76]** Generic server <0.31209.0> terminating
        ** Last message in was {'EXIT',<0.11706.3>,
                                {mover_failed,{badmatch,{error,emfile}}}}
        ** When Server state == {state,"default",<0.31219.0>,
                                 {dict,2,16,16,8,80,48,
                                  {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                                  {{[],[],
                                    [['ns_1@

        Link to collect info of all nodes of destination cluster https://s3.amazonaws.com/packages.couchbase/collect_info/orange/2_0_0/201211/2-ec2-1941reb-failed-no-match-right-hand-20121107.tgz
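        The crash report above points at a badmatch on {error,emfile} inside ebucketmigrator_srv:connect/4, which is consistent with the connect result being matched strictly against {ok, Sock}; roughly the following shape (an illustrative sketch under that assumption, not the actual source):

        %% If the VM has no descriptors left, gen_tcp:connect/3 returns
        %% {error,emfile}; the strict {ok,Sock} match then raises badmatch with
        %% exactly that value, killing the vbucket mover and failing the rebalance.
        connect(Host, Port) ->
            {ok, Sock} = gen_tcp:connect(Host, Port,
                                         [binary, {packet, 0}, {active, false}]),
            Sock.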

        steve Steve Yen added a comment -

        Reassigning since Tony attached some more info.

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        emfile in this case (at least in the collect infos from Tony) seems to be caused by CLOSE_WAIT sockets. My leading guess is that it's those hung is_missing_rev requests that we're seeing in MB-7129. Only in this environment it's not killing the node entirely but rather exhausting file descriptors.
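        One way to confirm this on the affected node is to count CLOSE_WAIT sockets and compare against the descriptor budget the VM sees (a rough diagnostic sketch, assuming a Linux host with netstat available; not part of the product):

        %% CLOSE_WAIT connections hold descriptors open; a count approaching
        %% max_fds matches the emfile failures seen during rebalance.
        CloseWait = list_to_integer(string:strip(os:cmd("netstat -tan | grep -c CLOSE_WAIT"), right, $\n)),
        MaxFds = proplists:get_value(max_fds, erlang:system_info(check_io)),
        io:format("CLOSE_WAIT sockets: ~p, max fds: ~p~n", [CloseWait, MaxFds]).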

        ketaki Ketaki Gangal added a comment -

        Re-opening this for repro/observation on the XDCR-fixed changes.

        abhinav Abhinav Dangeti added a comment -

        When this issue originally occurred, the clusters were in an XDCR setup, but replication had been deleted before starting the rebalance operation, and the failures occurred on the destination cluster.

        To investigate whether this issue happens with or without XDCR:

        Case 1: Tried reproducing this with 2 ongoing unidirectional replications, on the destination cluster (7 nodes): rebalanced in 3 nodes (1949) and rebalanced out 5 nodes (1944).
        The source (a 7-node cluster on 1944) had a fixed number of items on both buckets, with no front-end load.
        Rebalance on the destination cluster failed with the same reason, where the cluster was "unable to listen" to one of the nodes.

        Case 2: Deleted the replications on the source, cleaned the destination cluster, and loaded all nodes with (1949).
        Created a 5-node cluster (not part of any XDCR).
        Created 2 buckets and, with ongoing front-end load:

        • rebalanced out 3 nodes, rebalanced in 5 nodes :: rebalance operation completed successfully.
        • rebalanced out 5 nodes, rebalanced in 3 nodes :: rebalance operation completed successfully.
        steve Steve Yen added a comment -

        bug-scrub: 1949 didn't have the timeout fix? Please try again with 1950. thanks.

        steve Steve Yen added a comment -

        bug-scrub:

        Ketaki reports it works.


          People

          • Assignee: abhinav Abhinav Dangeti
          • Reporter: abhinav Abhinav Dangeti