Details
Bug
Resolution: Fixed
Major
Cheshire-Cat
7.0.0-4342
Untriaged
1
Yes
Description:
Observed a rebalance failure in a backup restore test involving the fts service. The ns_server logs report 'Worker terminated abnormally'.
Steps to reproduce:
(I'm not 100% sure of the steps here, as this test runs after many other tests in the same suite.)
There are two nodes involved in this particular test:
(I'm guessing these are 172.23.123.117 and 172.23.123.109 based on the logs).
From the test code I can gather that:
1. Node 2 is added to Node 1 specifying kv and fts as the services that should run on Node 2.
2. A rebalance happens.
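For anyone trying to reproduce this outside the test framework, the two steps above can be sketched as the request bodies for the cluster REST endpoints `/controller/addNode` and `/controller/rebalance`. This is only a sketch that builds the requests without sending them. The credentials are placeholders, and the assignment of 172.23.123.117 as Node 1 and 172.23.123.109 as Node 2 is an assumption based on the guess above, not something the test code confirms.

```python
from urllib.parse import urlencode

def add_node_request(new_node_host, services, admin_user, admin_password):
    """Build (path, form body) for adding a node with the given services."""
    body = urlencode({
        "hostname": new_node_host,
        "user": admin_user,
        "password": admin_password,
        "services": ",".join(services),  # e.g. "kv,fts"
    })
    return "/controller/addNode", body

def rebalance_request(known_otp_nodes):
    """Build (path, form body) for a rebalance with no ejected nodes."""
    body = urlencode({
        "knownNodes": ",".join(known_otp_nodes),
        "ejectedNodes": "",
    })
    return "/controller/rebalance", body

# Assumed node roles and placeholder credentials (not taken from the test).
step1 = add_node_request("172.23.123.109", ["kv", "fts"], "Administrator", "password")
step2 = rebalance_request(["ns_1@172.23.123.117", "ns_1@172.23.123.109"])
print(step1)
print(step2)
```

Both requests would be POSTed to Node 1 with admin credentials. The failure reported here occurs during the second request.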
What happens:
The rebalance fails with the following message present in the logs:
(From 172.23.123.117)
cbcollect_info_ns_1@172.23.123.117_20210202-115415/ns_server.error.log
[ns_server:error,2021-02-02T03:47:33.824-08:00,ns_1@172.23.123.117:service_rebalancer-fts<0.31138.8>:service_rebalancer:run_rebalance_worker:125]Worker terminated abnormally: {'EXIT',<0.31152.8>,
    {rebalance_failed,
        {service_error,
            <<"planner: indexDefs.ImplVersion: > version: 5.5.0">>}}}
[user:error,2021-02-02T03:47:33.834-08:00,ns_1@172.23.123.117:<0.22240.8>:ns_orchestrator:log_rebalance_completion:1402]Rebalance exited with reason {service_rebalance_failed,fts,
    {worker_died,
        {'EXIT',<0.31152.8>,
            {rebalance_failed,
                {service_error,
                    <<"planner: indexDefs.ImplVersion: > version: 5.5.0">>}}}}}.
Rebalance Operation Id = 39e9f04255cf50ee97eed090d380830c
[ns_server:error,2021-02-02T03:47:43.921-08:00,ns_1@172.23.123.117:service_rebalancer-fts<0.31827.8>:service_rebalancer:run_rebalance_worker:125]Worker terminated abnormally: {'EXIT',<0.31841.8>,
    {rebalance_failed,
        {service_error,
            <<"planner: indexDefs.ImplVersion: > version: 5.5.0">>}}}
[user:error,2021-02-02T03:47:43.922-08:00,ns_1@172.23.123.117:<0.22240.8>:ns_orchestrator:log_rebalance_completion:1402]Rebalance exited with reason {service_rebalance_failed,fts,
    {worker_died,
        {'EXIT',<0.31841.8>,
            {rebalance_failed,
                {service_error,
                    <<"planner: indexDefs.ImplVersion: > version: 5.5.0">>}}}}}.
Rebalance Operation Id = 821db8b83b00decae950c50a8a5b5933
(From 172.23.123.109)
cbcollect_info_ns_1@172.23.123.109_20210202-115832/ns_server.error.log
[ns_server:error,2021-02-02T03:47:33.826-08:00,ns_1@172.23.123.109:service_agent-fts<0.27302.7>:service_agent:handle_info:287]Rebalancer <27513.31138.8> died unexpectedly: {worker_died,
    {'EXIT',<27513.31152.8>,
        {rebalance_failed,
            {service_error,
                <<"planner: indexDefs.ImplVersion: > version: 5.5.0">>}}}}
[ns_server:error,2021-02-02T03:47:43.922-08:00,ns_1@172.23.123.109:service_agent-fts<0.27302.7>:service_agent:handle_info:287]Rebalancer <27513.31827.8> died unexpectedly: {worker_died,
    {'EXIT',<27513.31841.8>,
        {rebalance_failed,
            {service_error,
                <<"planner: indexDefs.ImplVersion: > version: 5.5.0">>}}}}
What I expected to happen:
I expected the rebalance to succeed.
The logs:
The rebalance errors were only present in these two sets of logs:
172.23.123.109-20210202-0358-diag.zip
172.23.123.117-20210202-0354-diag.zip
I've also uploaded the logs for the other nodes, which I presume were not involved in the test because they contain no 'rebalance failed' error messages.
The following might also be of interest, as these are the first two nodes specified in the ini file:
172.23.123.105-20210202-0347-diag.zip
172.23.123.116-20210202-0351-diag.zip
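To check which collected logs contain the failure, the contents of each ns_server.error.log can be scanned for the `service_error` payload. The snippet below is a minimal sketch of that check. The function name `service_errors` is made up for illustration, and the sample text is copied from the 172.23.123.117 log excerpt in this report.

```python
import re

# Captures the binary payload inside {service_error, <<"...">>}. The tuple tag
# and the binary may be separated by whitespace, line breaks, or stray "|"
# characters left over from copy-paste.
SERVICE_ERROR_RE = re.compile(r'\{service_error,[\s|]*<<"(.*?)">>', re.DOTALL)

def service_errors(log_text):
    """Return the distinct service_error messages in log_text, in first-seen order."""
    seen = []
    for message in SERVICE_ERROR_RE.findall(log_text):
        if message not in seen:
            seen.append(message)
    return seen

sample = """[ns_server:error,2021-02-02T03:47:33.824-08:00,ns_1@172.23.123.117:service_rebalancer-fts<0.31138.8>:service_rebalancer:run_rebalance_worker:125]Worker terminated abnormally: {'EXIT',<0.31152.8>,
{rebalance_failed,
{service_error,
<<"planner: indexDefs.ImplVersion: > version: 5.5.0">>}}}"""

print(service_errors(sample))
```

An empty list for a given log would be consistent with that node not having taken part in the failed rebalance.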