Couchbase Server: MB-61244

[CBBS] Backup service requires unnecessary rebalance

Details

    • Bug
    • Resolution: Unresolved
    • Major
    • 7.6.2
    • 7.6.0, 7.2.0
    • tools
    • Untriaged
    • 0
    • No
    • Tools 2024-Q1

    Description

      TLDR: the backup service doesn't appear to keep track of the nodes in the cluster that provide the backup service, so after a cluster reboot a rebalance is required.

      When a cluster is rebooted, the backup service on each node returns just that node in "nodes" and reports "isBalanced" as true. Because the set of "nodes" doesn't match the nodes ns_server has configured for the backup service, a rebalance is required. To reproduce:

      • cluster_run -n 2 --dont-rename
      • cluster_connect -n 2 -s 1024 -I 512 -M plasma -T n0:kv+index+n1ql+fts+eventing+cbas+backup,n1:kv+index+n1ql+fts+eventing+cbas+backup
      • Log into UI and see that rebalance completes
      • Ctrl-C in the window where cluster_run was run
      • cluster_run -n 2 --dont-rename

      At this point the /pools/default endpoint returns that "eventing" and "backup" require a rebalance.

        "balanced": false,
        "servicesNeedRebalance": [
          {
            "code": "service_not_balanced",
            "description": "Service needs rebalance.",
            "services": [
              "eventing",
              "backup"
            ]
          }
        ]
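      For illustration, here's a minimal Python sketch (a hypothetical helper, not Couchbase code) that extracts the services flagged in a /pools/default response shaped like the excerpt above:

```python
import json

def services_needing_rebalance(pools_default: dict) -> list[str]:
    """Collect service names from every servicesNeedRebalance entry."""
    services = []
    for entry in pools_default.get("servicesNeedRebalance", []):
        services.extend(entry.get("services", []))
    return services

# Sample shaped like the /pools/default excerpt above.
sample = json.loads("""
{
  "balanced": false,
  "servicesNeedRebalance": [
    {
      "code": "service_not_balanced",
      "description": "Service needs rebalance.",
      "services": ["eventing", "backup"]
    }
  ]
}
""")

print(services_needing_rebalance(sample))  # ['eventing', 'backup']
```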

      The reason ns_server believes "backup" needs a rebalance is that the GetTopology response from each of the two nodes includes just that node and indicates isBalanced is true. Here are the entries for the two nodes on my run (note that each node only knows about itself):

      [json_rpc:debug,2024-03-20T14:46:03.158-07:00,n_0@127.0.0.1:json_rpc_connection-backup-service_api<0.1252.0>:json_rpc_connection:handle_info:107]got response: [{<<"id">>,2},
                     {<<"result">>,
                      {[{<<"rev">>,<<"AAAAAAAAAAI=">>},
                        {<<"nodes">>,[<<"1080b788c0e8115ce25ff93ed60cd4f1">>]},
                        {<<"isBalanced">>,true}]}},
                     {<<"error">>,null}]
      

      and from the other node:

      [json_rpc:debug,2024-03-20T14:46:03.164-07:00,n_1@127.0.0.1:json_rpc_connection-backup-service_api<0.1349.0>:json_rpc_connection:handle_info:107]got response: [{<<"id">>,2},
                     {<<"result">>,
                      {[{<<"rev">>,<<"AAAAAAAAAAI=">>},
                        {<<"nodes">>,[<<"16745cea9a733708f49fa44e1def4528">>]},
                        {<<"isBalanced">>,true}]}},
                     {<<"error">>,null}]
      

      As an example of what would be expected, this is the response after doing a rebalance from the UI (both node UUIDs are now listed):

      [json_rpc:debug,2024-03-20T14:48:36.190-07:00,n_0@127.0.0.1:json_rpc_connection-backup-service_api<0.1252.0>:json_rpc_connection:handle_info:107]got response: [{<<"id">>,31},
                     {<<"result">>,
                      {[{<<"rev">>,<<"AAAAAAAAAAc=">>},
                        {<<"nodes">>,
                         [<<"1080b788c0e8115ce25ff93ed60cd4f1">>,
                          <<"16745cea9a733708f49fa44e1def4528">>]},
                        {<<"isBalanced">>,true}]}},
                     {<<"error">>,null}]
      

      So it appears the backup service doesn't persist which nodes in the cluster provide the backup service across a restart.
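      The behaviour above can be modelled with a short Python sketch (a simplified model of the check, not the actual ns_server Erlang code): a service is flagged for rebalance when any node's GetTopology result omits a configured node or reports isBalanced false. The node UUIDs are the ones from the logs above.

```python
def needs_rebalance(configured_nodes: set[str],
                    topologies: dict[str, dict]) -> bool:
    """Simplified model of ns_server's per-service balance check.

    configured_nodes: node UUIDs ns_server has assigned to the service.
    topologies: per-node GetTopology results, keyed by the reporting node.
    """
    for topo in topologies.values():
        if not topo["isBalanced"]:
            return True
        # Each node must know about every configured service node.
        if set(topo["nodes"]) != configured_nodes:
            return True
    return False

n0 = "1080b788c0e8115ce25ff93ed60cd4f1"
n1 = "16745cea9a733708f49fa44e1def4528"
configured = {n0, n1}

# After the reboot: each node reports only itself -> rebalance required.
after_reboot = {
    n0: {"nodes": [n0], "isBalanced": True},
    n1: {"nodes": [n1], "isBalanced": True},
}
print(needs_rebalance(configured, after_reboot))  # True

# After a UI rebalance: both nodes report both UUIDs -> balanced.
after_rebalance = {
    n0: {"nodes": [n0, n1], "isBalanced": True},
    n1: {"nodes": [n0, n1], "isBalanced": True},
}
print(needs_rebalance(configured, after_rebalance))  # False
```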

          People

            gilad.kalchheim Gilad Kalchheim
            steve.watanabe Steve Watanabe
            Votes: 0
            Watchers: 2


              Gerrit Reviews

                There are 2 open Gerrit changes
