Couchbase Server / MB-46131

Force multiple failover dialog when multiple nodes are unresponsive and user attempts to failover one of them


Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: Morpheus
    • Affects Version/s: 7.0.0
    • Component/s: UI
    • Environment: Centos 7 64 bit; CB EE 7.0.0-5085
    • 1

    Description

      Summary:
      When multiple nodes are unresponsive, the preferred way to fail over is to fail all of them over at once (multi-node failover). The ask here is to force the multi-node failover dialog in the UI when the user attempts to fail over just one of the unresponsive nodes via the failover option next to that server.
      (Note that this is not a quorum failover.)
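
      For illustration, here is a minimal client-side sketch of the behaviour being asked for: when more than one node is unresponsive, collect all of them and submit a single hard-failover request instead of failing over one node at a time. It uses the public /pools/default/nodeStatuses and /controller/failOver endpoints; the address and credentials are placeholders, and the assumption that /controller/failOver accepts multiple otpNode parameters in one request (multi-node failover) is mine, not something stated in this ticket.

      import requests

      BASE = "http://172.23.105.215:8091"   # any reachable cluster node (address borrowed from this ticket)
      AUTH = ("Administrator", "password")  # placeholder credentials

      # Ask the cluster which nodes it currently considers unhealthy.
      statuses = requests.get(f"{BASE}/pools/default/nodeStatuses", auth=AUTH).json()
      unhealthy = [info["otpNode"] for info in statuses.values() if info["status"] != "healthy"]

      if len(unhealthy) > 1:
          # Fail over every unresponsive node in one request (multi-node hard failover),
          # which is what the proposed dialog would steer the user towards.
          resp = requests.post(f"{BASE}/controller/failOver",
                               data=[("otpNode", node) for node in unhealthy],
                               auth=AUTH)
          print(resp.status_code, resp.text)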

      Elaborating the current behaviour with an example:
      1. Create a 5-node cluster: .215, .217, .219, .237, .90
      2. Load travel-sample with 3 replicas (see the REST sketch after these steps)
      3. Stop the server on .217 and .219 to make these two nodes unresponsive.
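
      For completeness, the bucket setup in step 2 can be scripted against the REST API. This is only a sketch with placeholder credentials; it assumes the standard /sampleBuckets/install and bucket-edit endpoints, and raising travel-sample to 3 replicas only takes effect after a subsequent rebalance.

      import requests

      BASE = "http://172.23.105.215:8091"   # the first node of the example cluster
      AUTH = ("Administrator", "password")  # placeholder credentials

      # Install the travel-sample sample bucket ...
      requests.post(f"{BASE}/sampleBuckets/install", json=["travel-sample"], auth=AUTH)

      # ... then raise its replica count to 3 (the extra replicas materialise after a rebalance).
      requests.post(f"{BASE}/pools/default/buckets/travel-sample",
                    data={"replicaNumber": 3}, auth=AUTH)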
      Here is what currently happens when the user attempts to fail them over individually, one by one (instead of failing over both together):

      On the UI:
      The UI did not return any response; it appeared to keep processing the failover indefinitely.

      REST API:
      Returns an "unexpected server error" response.
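
      The individual failover attempt can also be issued directly against the REST API; a hedged sketch of roughly what was tried here (the node address is taken from the logs below, the credentials are placeholders):

      import requests

      BASE = "http://172.23.105.215:8091"   # the orchestrator node from the logs below
      AUTH = ("Administrator", "password")  # placeholder credentials

      # Attempt a hard failover of only one of the two unresponsive nodes.
      resp = requests.post(f"{BASE}/controller/failOver",
                           data={"otpNode": "ns_1@172.23.105.219"},
                           auth=AUTH)
      # Per the report above, this comes back as an unexpected server error
      # instead of completing or being rejected with a clear message.
      print(resp.status_code, resp.text)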

      In ns_server.error.log:

      [ns_server:error,2021-05-05T00:26:40.393-07:00,ns_1@172.23.105.215:<0.16783.3>:ns_doctor:wait_statuses_loop:251]Couldn't get statuses for ['ns_1@172.23.105.219']
      [ns_server:error,2021-05-05T00:26:40.393-07:00,ns_1@172.23.105.215:<0.16155.3>:menelaus_util:reply_server_error:206]Server error during processing: ["web request failed",
                                       {path,"/pools/default"},
                                       {method,'POST'},
                                       {type,error},
                                       {what,
                                        {badmatch,
                                         {error,{timeout,['ns_1@172.23.105.219']}}}},
                                       {trace,
                                        [{menelaus_web_pools,
                                          do_validate_memory_quota,4,
                                          [{file,"src/menelaus_web_pools.erl"},
                                           {line,407}]},
                                         {lists,foldl,3,
                                          [{file,"lists.erl"},{line,1263}]},
                                         {validator,handle,4,
                                          [{file,"src/validator.erl"},{line,79}]},
                                         {menelaus_web_pools,
                                          do_handle_pool_settings_post_loop,2,
                                          [{file,"src/menelaus_web_pools.erl"},
                                           {line,451}]},
                                         {request_throttler,do_request,3,
                                          [{file,"src/request_throttler.erl"},
                                           {line,58}]},
                                         {menelaus_util,handle_request,2,
                                          [{file,"src/menelaus_util.erl"},
                                           {line,217}]},
                                         {mochiweb_http,headers,6,
                                          [{file,
                                            "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/mochiweb/mochiweb_http.erl"},
                                           {line,150}]},
                                         {proc_lib,init_p_do_apply,3,
                                          [{file,"proc_lib.erl"},{line,249}]}]}]
      [ns_server:error,2021-05-05T00:27:52.231-07:00,ns_1@172.23.105.215:<0.21777.3>:rebalance:progress:147]Couldn't reach ns_rebalance_observer
      [ns_server:error,2021-05-05T00:28:02.609-07:00,ns_1@172.23.105.215:<0.21641.3>:ns_rebalance_observer:generic_get_call:108]Unexpected exception {exit,
                               {noproc,
                                   {gen_server,call,
                                       [{via,leader_registry,ns_rebalance_observer},
                                        get_aggregated_progress,10000]}}}
      [ns_server:error,2021-05-05T00:28:02.609-07:00,ns_1@172.23.105.215:<0.21641.3>:rebalance:progress:147]Couldn't reach ns_rebalance_observer
      [ns_server:error,2021-05-05T00:28:13.282-07:00,ns_1@172.23.105.215:<0.28029.3>:ns_rebalance_observer:generic_get_call:108]Unexpected exception {exit,
                               {noproc,
                                   {gen_server,call,
                                       [{via,leader_registry,ns_rebalance_observer},
                                        get_aggregated_progress,10000]}}}

      (Note that failing them over one by one may still work if the bucket didn't have 3 replicas, I think.)
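
      That note can be checked up front by comparing the bucket's configured replica count with the number of unhealthy nodes; a minimal sketch with placeholder credentials (the endpoint and field names are the public ones, the comparison itself is just an illustration of the note above):

      import requests

      BASE = "http://172.23.105.215:8091"
      AUTH = ("Administrator", "password")  # placeholder credentials

      bucket = requests.get(f"{BASE}/pools/default/buckets/travel-sample", auth=AUTH).json()
      statuses = requests.get(f"{BASE}/pools/default/nodeStatuses", auth=AUTH).json()

      replicas = bucket["replicaNumber"]
      down = sum(1 for info in statuses.values() if info["status"] != "healthy")
      print(f"configured replicas: {replicas}, unresponsive nodes: {down}")
      # Per the note above, with fewer replicas the one-by-one failover may still work;
      # with 3 replicas and two nodes down it runs into the behaviour described in this ticket.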


          People

            Assignee: Dave Finlay (dfinlay)
            Reporter: Sumedh Basarkod (sumedh.basarkod) (Inactive)
            Votes: 0
            Watchers: 6
