Couchbase Server / MB-46131

Force multiple failover dialog when multiple nodes are unresponsive and user attempts to failover one of them


Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: Morpheus
    • Affects Version/s: 7.0.0
    • Component/s: UI
    • Environment: Centos 7 64 bit; CB EE 7.0.0-5085
    • 1

    Description

      Summary:
      When multiple nodes are unresponsive, the preferred way to fail over is to fail all of them over at once (multi-node failover). The ask here is to force the multi-node failover dialog in the UI when the user attempts to fail over just one of the unresponsive nodes via the failover option next to that server.
      (Note that this is not a quorum failover.)
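
      For illustration, here is a minimal client-side sketch of the behaviour being asked for: when more than one node is unresponsive, collect all of them and submit a single hard-failover request instead of failing over one node at a time. It uses the public /pools/default/nodeStatuses and /controller/failOver endpoints; the address and credentials are placeholders, and the assumption that /controller/failOver accepts multiple otpNode parameters in one request (multi-node failover) is mine, not something stated in this ticket.

      import requests

      BASE = "http://172.23.105.215:8091"   # any reachable cluster node (address borrowed from this ticket)
      AUTH = ("Administrator", "password")  # placeholder credentials

      # Ask the cluster which nodes it currently considers unhealthy.
      statuses = requests.get(f"{BASE}/pools/default/nodeStatuses", auth=AUTH).json()
      unhealthy = [info["otpNode"] for info in statuses.values() if info["status"] != "healthy"]

      if len(unhealthy) > 1:
          # Fail over every unresponsive node in one request (multi-node hard failover),
          # which is what the proposed dialog would steer the user towards.
          resp = requests.post(f"{BASE}/controller/failOver",
                               data=[("otpNode", node) for node in unhealthy],
                               auth=AUTH)
          print(resp.status_code, resp.text)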

      Elaborating the current behaviour with an example:
      1. Create a 5-node cluster: .215, .217, .219, .237, .90
      2. Load travel-sample with 3 replicas (see the REST sketch after these steps)
      3. Stop the server on .217 and .219 to make these two nodes unresponsive.
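
      For completeness, the bucket setup in step 2 can be scripted against the REST API. This is only a sketch with placeholder credentials; it assumes the standard /sampleBuckets/install and bucket-edit endpoints, and raising travel-sample to 3 replicas only takes effect after a subsequent rebalance.

      import requests

      BASE = "http://172.23.105.215:8091"   # the first node of the example cluster
      AUTH = ("Administrator", "password")  # placeholder credentials

      # Install the travel-sample sample bucket ...
      requests.post(f"{BASE}/sampleBuckets/install", json=["travel-sample"], auth=AUTH)

      # ... then raise its replica count to 3 (the extra replicas materialise after a rebalance).
      requests.post(f"{BASE}/pools/default/buckets/travel-sample",
                    data={"replicaNumber": 3}, auth=AUTH)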
      Here is what currently happens when the user attempts to fail them over individually, one by one (instead of failing over both together):

      On the UI:
      The UI did not return any response; it appeared to keep processing the failover indefinitely.

      REST API:
      Returns an "unexpected server error" response.
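
      The individual failover attempt can also be issued directly against the REST API; a hedged sketch of roughly what was tried here (the node address is taken from the logs below, the credentials are placeholders):

      import requests

      BASE = "http://172.23.105.215:8091"   # the orchestrator node from the logs below
      AUTH = ("Administrator", "password")  # placeholder credentials

      # Attempt a hard failover of only one of the two unresponsive nodes.
      resp = requests.post(f"{BASE}/controller/failOver",
                           data={"otpNode": "ns_1@172.23.105.219"},
                           auth=AUTH)
      # Per the report above, this comes back as an unexpected server error
      # instead of completing or being rejected with a clear message.
      print(resp.status_code, resp.text)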

      In ns_server.error.log:

      [ns_server:error,2021-05-05T00:26:40.393-07:00,ns_1@172.23.105.215:<0.16783.3>:ns_doctor:wait_statuses_loop:251]Couldn't get statuses for ['ns_1@172.23.105.219']
      [ns_server:error,2021-05-05T00:26:40.393-07:00,ns_1@172.23.105.215:<0.16155.3>:menelaus_util:reply_server_error:206]Server error during processing: ["web request failed",
                                       {path,"/pools/default"},
                                       {method,'POST'},
                                       {type,error},
                                       {what,
                                        {badmatch,
                                         {error,{timeout,['ns_1@172.23.105.219']}}}},
                                       {trace,
                                        [{menelaus_web_pools,
                                          do_validate_memory_quota,4,
                                          [{file,"src/menelaus_web_pools.erl"},
                                           {line,407}]},
                                         {lists,foldl,3,
                                          [{file,"lists.erl"},{line,1263}]},
                                         {validator,handle,4,
                                          [{file,"src/validator.erl"},{line,79}]},
                                         {menelaus_web_pools,
                                          do_handle_pool_settings_post_loop,2,
                                          [{file,"src/menelaus_web_pools.erl"},
                                           {line,451}]},
                                         {request_throttler,do_request,3,
                                          [{file,"src/request_throttler.erl"},
                                           {line,58}]},
                                         {menelaus_util,handle_request,2,
                                          [{file,"src/menelaus_util.erl"},
                                           {line,217}]},
                                         {mochiweb_http,headers,6,
                                          [{file,
                                            "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/mochiweb/mochiweb_http.erl"},
                                           {line,150}]},
                                         {proc_lib,init_p_do_apply,3,
                                          [{file,"proc_lib.erl"},{line,249}]}]}]
      [ns_server:error,2021-05-05T00:27:52.231-07:00,ns_1@172.23.105.215:<0.21777.3>:rebalance:progress:147]Couldn't reach ns_rebalance_observer
      [ns_server:error,2021-05-05T00:28:02.609-07:00,ns_1@172.23.105.215:<0.21641.3>:ns_rebalance_observer:generic_get_call:108]Unexpected exception {exit,
                               {noproc,
                                   {gen_server,call,
                                       [{via,leader_registry,ns_rebalance_observer},
                                        get_aggregated_progress,10000]}}}
      [ns_server:error,2021-05-05T00:28:02.609-07:00,ns_1@172.23.105.215:<0.21641.3>:rebalance:progress:147]Couldn't reach ns_rebalance_observer
      [ns_server:error,2021-05-05T00:28:13.282-07:00,ns_1@172.23.105.215:<0.28029.3>:ns_rebalance_observer:generic_get_call:108]Unexpected exception {exit,
                               {noproc,
                                   {gen_server,call,
                                       [{via,leader_registry,ns_rebalance_observer},
                                        get_aggregated_progress,10000]}}}

      (Note that failing them over one by one may still work if the bucket didn't have 3 replicas, I think.)
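
      That note can be checked up front by comparing the bucket's configured replica count with the number of unhealthy nodes; a minimal sketch with placeholder credentials (the endpoint and field names are the public ones, the comparison itself is just an illustration of the note above):

      import requests

      BASE = "http://172.23.105.215:8091"
      AUTH = ("Administrator", "password")  # placeholder credentials

      bucket = requests.get(f"{BASE}/pools/default/buckets/travel-sample", auth=AUTH).json()
      statuses = requests.get(f"{BASE}/pools/default/nodeStatuses", auth=AUTH).json()

      replicas = bucket["replicaNumber"]
      down = sum(1 for info in statuses.values() if info["status"] != "healthy")
      print(f"configured replicas: {replicas}, unresponsive nodes: {down}")
      # Per the note above, with fewer replicas the one-by-one failover may still work;
      # with 3 replicas and two nodes down it runs into the behaviour described in this ticket.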


          People

            Assignee: Dave Finlay (dfinlay)
            Reporter: Sumedh Basarkod (sumedh.basarkod) (Inactive)
            Votes: 0
            Watchers: 6
