Details
- Bug
- Resolution: Unresolved
- Major
- 7.6.0, 7.2.0
- Untriaged
- 0
- No
- Tools 2024-Q1
Description
TL;DR: The backup service doesn't appear to keep track of the nodes in the cluster that provide the backup service. As a result, a rebalance is required after a cluster reboot.
When a cluster is rebooted, the backup service on each node returns just itself in "nodes", with "isBalanced" set to true. Because the number of "nodes" doesn't match what ns_server has the service configured for, ns_server requires a rebalance. To reproduce:
- cluster_run -n 2 --dont-rename
- cluster_connect -n 2 -s 1024 -I 512 -M plasma -T n0:kv+index+n1ql+fts+eventing+cbas+backup,n1:kv+index+n1ql+fts+eventing+cbas+backup
- Log in to the UI and confirm that the rebalance completes
- Press Ctrl+C in the window where cluster_run is running
- cluster_run -n 2 --dont-rename
At this point, the /pools/default endpoint reports that "eventing" and "backup" require a rebalance:
"balanced": false,
"servicesNeedRebalance": [
  {
    "code": "service_not_balanced",
    "description": "Service needs rebalance.",
    "services": [
      "eventing",
      "backup"
    ]
  }
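A quick way to check for this condition in a script is to pull the flagged service names out of the /pools/default response. The following is a minimal sketch, assuming only the fields quoted above; the helper name services_needing_rebalance and the SAMPLE payload are illustrative, and a real response contains many more keys.

```python
import json


def services_needing_rebalance(pools_default: dict) -> list:
    """Return the service names flagged as needing a rebalance, in first-seen order."""
    services = []
    for entry in pools_default.get("servicesNeedRebalance", []):
        for name in entry.get("services", []):
            if name not in services:
                services.append(name)
    return services


# Sample payload trimmed to the fields quoted in this ticket (hypothetical otherwise).
SAMPLE = json.loads("""
{
  "balanced": false,
  "servicesNeedRebalance": [
    {
      "code": "service_not_balanced",
      "description": "Service needs rebalance.",
      "services": ["eventing", "backup"]
    }
  ]
}
""")

if __name__ == "__main__":
    print(services_needing_rebalance(SAMPLE))  # prints ['eventing', 'backup']
```

A response without a "servicesNeedRebalance" key yields an empty list, so the same helper can be used to confirm that the cluster is back to a balanced state after the rebalance.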
ns_server believes "backup" needs a rebalance because the GetTopology response from each of the two nodes includes just that node and indicates that isBalanced is true. Here are the entries for the two nodes on my run (note that each node only knows about itself):
[json_rpc:debug,2024-03-20T14:46:03.158-07:00,n_0@127.0.0.1:json_rpc_connection-backup-service_api<0.1252.0>:json_rpc_connection:handle_info:107]got response: [{<<"id">>,2},
    {<<"result">>,
     {[{<<"rev">>,<<"AAAAAAAAAAI=">>},
       {<<"nodes">>,[<<"1080b788c0e8115ce25ff93ed60cd4f1">>]},
       {<<"isBalanced">>,true}]}},
    {<<"error">>,null}]
And for the other node:
[json_rpc:debug,2024-03-20T14:46:03.164-07:00,n_1@127.0.0.1:json_rpc_connection-backup-service_api<0.1349.0>:json_rpc_connection:handle_info:107]got response: [{<<"id">>,2},
    {<<"result">>,
     {[{<<"rev">>,<<"AAAAAAAAAAI=">>},
       {<<"nodes">>,[<<"16745cea9a733708f49fa44e1def4528">>]},
       {<<"isBalanced">>,true}]}},
    {<<"error">>,null}]
For comparison, this is the expected response, captured after doing a rebalance from the UI:
[json_rpc:debug,2024-03-20T14:48:36.190-07:00,n_0@127.0.0.1:json_rpc_connection-backup-service_api<0.1252.0>:json_rpc_connection:handle_info:107]got response: [{<<"id">>,31},
    {<<"result">>,
     {[{<<"rev">>,<<"AAAAAAAAAAc=">>},
       {<<"nodes">>,
        [<<"1080b788c0e8115ce25ff93ed60cd4f1">>,
         <<"16745cea9a733708f49fa44e1def4528">>]},
       {<<"isBalanced">>,true}]}},
    {<<"error">>,null}]
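The mismatch that ns_server reacts to can be sketched in a few lines of Python. This only illustrates the reasoning described in this ticket and is not ns_server's actual implementation: the function name needs_rebalance and the set comparison are assumptions (the ticket only says that the number of "nodes" doesn't match), and the node IDs are the ones from the log entries above.

```python
def needs_rebalance(configured_nodes, topology_nodes, is_balanced):
    """Return True when a GetTopology-style response disagrees with the configured nodes.

    A rebalance is considered necessary if the service itself reports that it
    is not balanced, or if the nodes it lists differ from the nodes the
    cluster manager has configured for the service (assumed comparison).
    """
    return (not is_balanced) or set(topology_nodes) != set(configured_nodes)


# Nodes configured to run the backup service in the two-node cluster_run setup.
CONFIGURED = [
    "1080b788c0e8115ce25ff93ed60cd4f1",
    "16745cea9a733708f49fa44e1def4528",
]

# Responses as logged: after the restart each node lists only itself.
AFTER_RESTART_N0 = {"nodes": ["1080b788c0e8115ce25ff93ed60cd4f1"], "isBalanced": True}
AFTER_RESTART_N1 = {"nodes": ["16745cea9a733708f49fa44e1def4528"], "isBalanced": True}

# Response logged after the rebalance from the UI: both nodes are listed.
AFTER_REBALANCE = {"nodes": list(CONFIGURED), "isBalanced": True}

if __name__ == "__main__":
    for label, resp in [
        ("n_0 after restart", AFTER_RESTART_N0),
        ("n_1 after restart", AFTER_RESTART_N1),
        ("n_0 after rebalance", AFTER_REBALANCE),
    ]:
        flagged = needs_rebalance(CONFIGURED, resp["nodes"], resp["isBalanced"])
        print(f"{label}: needs rebalance = {flagged}")
```

Under these assumptions, both post-restart responses are flagged even though each reports "isBalanced" as true, which matches the behavior seen at /pools/default.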
So it appears the backup service doesn't keep track of which nodes in the cluster provide the backup service.
Gerrit Reviews
For Gerrit Dashboard: MB-61244
# | Subject | Branch | Project | Status | CR | V
---|---|---|---|---|---|---
207532,1 | MB-61244 Remove sleeps from leader test | trinity | cbbs | NEW | +2 | +1
207533,1 | MB-61244 On startup read topology to give ns_server | trinity | cbbs | NEW | +2 | +1