Couchbase Server / MB-7115

Rebalance fails repeatedly when rebalancing in 5 nodes and rebalancing out 3 nodes on a 5-node cluster; possible cause: "Unable to listen" on one of the nodes being rebalanced out.

    Details

      Description

      Scenario:

      • 10-node cluster running build 1942
      • Rebalance out 5 nodes (completed successfully)
      • Cluster now has 5 nodes
      • Add 5 nodes (running build 1944) and remove 3 nodes.
      • Start the rebalance.
      • Rebalance failed with reason:

      Rebalance exited with reason {badmatch,
      [{<0.26283.119>,
      {{badmatch,{error,emfile}},
      [{ns_replicas_builder_utils,kill_a_bunch_of_tap_names,3},
      {misc,try_with_maybe_ignorant_after,2},
      {gen_server,terminate,6},
      {proc_lib,init_p_do_apply,3}]}}]}

      - Retried the rebalance; it failed repeatedly with:

      Rebalance exited with reason {{{badmatch,[{<18058.14511.0>,noproc}]},
      [{misc,sync_shutdown_many_i_am_trapping_exits,1},
      {misc,try_with_maybe_ignorant_after,2},
      {gen_server,terminate,6},
      {proc_lib,init_p_do_apply,3}]},
      {gen_server,call,
      [<0.11023.120>,
      {shutdown_replicator,
      'ns_1@ec2-54-251-5-97.ap-southeast-1.compute.amazonaws.com'},
      infinity]}}

      Will shortly upload logs from one of the nodes that was in the cluster at the time of the rebalance failures.

      _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

      Noticed this on one of the nodes being rebalanced out:
      Unable to listen on 'ns_1@ec2-122-248-217-156.ap-southeast-1.compute.amazonaws.com'.

      Failed over that node and retried the rebalance; the rebalance still failed.

      Then added that node back and excluded it from the rebalance operation; the rebalance succeeded.


        Activity

        abhinav Abhinav Dangeti created issue -
        abhinav Abhinav Dangeti made changes -
        Field Original Value New Value
        Description
        abhinav Abhinav Dangeti made changes -
        Summary Rebalance operation failed repetitively while trying to rebalance in 5 nodes and rebalance out 3 nodes on a 5 node cluster Rebalance operation failed repetitively while trying to rebalance in 5 nodes and rebalance out 3 nodes on a 5 node cluster, reason possibly because: "Unable to listen" to one of the nodes that was being rebalanced out.
        abhinav Abhinav Dangeti made changes -
        Description
        abhinav Abhinav Dangeti made changes -
        Assignee Abhinav Dangeti [ abhinav ]
        ketaki Ketaki Gangal made changes -
        Assignee Abhinav Dangeti [ abhinav ] Aleksey Kondratenko [ alkondratenko ]
        Priority Major [ 3 ] Critical [ 2 ]
        Component/s cross-datacenter-replication [ 10136 ]
        Component/s ns_server [ 10019 ]
        Aliaksey Artamonau Aliaksey Artamonau made changes -
        Assignee Aleksey Kondratenko [ alkondratenko ] Abhinav Dangeti [ abhinav ]
        junyi Junyi Xie (Inactive) made changes -
        Component/s cross-datacenter-replication [ 10136 ]
        steve Steve Yen made changes -
        Assignee Abhinav Dangeti [ abhinav ] Aleksey Kondratenko [ alkondratenko ]
        alkondratenko Aleksey Kondratenko (Inactive) made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Duplicate [ 3 ]
        ketaki Ketaki Gangal made changes -
        Resolution Duplicate [ 3 ]
        Status Resolved [ 5 ] Reopened [ 4 ]
        Assignee Aleksey Kondratenko [ alkondratenko ] Abhinav Dangeti [ abhinav ]
        steve Steve Yen made changes -
        Status Reopened [ 4 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        farshid Farshid Ghods (Inactive) made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            abhinav Abhinav Dangeti
            Reporter:
            abhinav Abhinav Dangeti
          • Votes:
            0
            Watchers:
            1


              Gerrit Reviews

              There are no open Gerrit changes