Details
Bug
Resolution: Fixed
Critical
7.6.0
Enterprise Edition 7.6.0 build 1878
Untriaged
Linux x86_64
0
Unknown
Analytics Sprint 32
Description
Script to repro
./sequoia -client 172.23.110.181:2375 -provider file:debian_pine.yml -test tests/integration/7.6/test_7.6.yml -scope tests/integration/7.6/scope_7.6_magma.yml -scale 2 -repeat 0 -log_level 0 -version 7.6.0-1878 -skip_setup=false -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=1209600 -show_topology=true
Saw multiple rebalance failures like the ones below.
172.23.121.87 9:41:59 PM 2 Dec, 2023
Analytics Service unable to successfully rebalance bfb4defe8d86335dcce63c184d5d5a8f due to 'java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [5b6cd7a602641448fb013ef6ee363711], state: ACTIVE)'; see analytics_info.log for details

172.23.96.203 9:41:59 PM 2 Dec, 2023
Rebalance exited with reason {service_rebalance_failed,cbas,
    {worker_died,
        {'EXIT',<0.19887.1165>,
            {task_failed,rebalance,
                {service_error,
                    <<"Rebalance bfb4defe8d86335dcce63c184d5d5a8f failed: timed out waiting for all nodes to join & cluster active (missing nodes: [172.23.104.227:8091 (5b6cd7a602641448fb013ef6ee363711)], state: ACTIVE)">>}}}}}.
Rebalance Operation Id = a65aec09c53ec5a3841925d472342496

172.23.121.87 10:02:21 PM 2 Dec, 2023
Analytics Service unable to successfully rebalance affefd78be32cab3bc3b22dfa9f09cdc due to 'java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [5b6cd7a602641448fb013ef6ee363711], state: ACTIVE)'; see analytics_info.log for details

172.23.96.203 10:02:22 PM 2 Dec, 2023
Rebalance exited with reason {service_rebalance_failed,cbas,
    {worker_died,
        {'EXIT',<0.29729.1172>,
            {task_failed,rebalance,
                {service_error,
                    <<"Rebalance affefd78be32cab3bc3b22dfa9f09cdc failed: timed out waiting for all nodes to join & cluster active (missing nodes: [172.23.104.227:8091 (5b6cd7a602641448fb013ef6ee363711)], state: ACTIVE)">>}}}}}.
Rebalance Operation Id = 9f8a6651a09b277dbc351039ef7035f6

MB-59824 and MB-59802 look similar, but they were fixed in 7.6.0-1857 and we are hitting this in 7.6.0-1878. cbcollect_info is attached.
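
For triage, it may help to confirm that every failure names the same missing node. The following is a minimal sketch, not part of the original report: the regular expression is inferred only from the two service_error messages quoted above, and the helper names (parse_failure, missing_node_counts) are hypothetical. Other builds may word the message differently.

```python
import re
from collections import Counter

# Assumption: message shape taken from the two failures quoted in this ticket.
FAILURE_RE = re.compile(
    r"Rebalance (?P<rebalance_id>[0-9a-f]{32}) failed: .*?"
    r"missing nodes: \[(?P<nodes>[^\]]*)\], state: (?P<state>\w+)"
)
# Node UUIDs appear as 32 lowercase hex characters in the quoted messages.
NODE_ID_RE = re.compile(r"\b[0-9a-f]{32}\b")


def parse_failure(line):
    """Return rebalance id, missing node ids, and state, or None if no match."""
    match = FAILURE_RE.search(line)
    if match is None:
        return None
    return {
        "rebalance_id": match.group("rebalance_id"),
        "missing_nodes": NODE_ID_RE.findall(match.group("nodes")),
        "state": match.group("state"),
    }


def missing_node_counts(lines):
    """Count how often each node id is reported missing across all lines."""
    counts = Counter()
    for line in lines:
        parsed = parse_failure(line)
        if parsed is not None:
            counts.update(parsed["missing_nodes"])
    return counts


# The two service_error messages copied from the failures above.
SAMPLE_LINES = [
    '<<"Rebalance bfb4defe8d86335dcce63c184d5d5a8f failed: timed out waiting '
    "for all nodes to join & cluster active (missing nodes: "
    "[172.23.104.227:8091 (5b6cd7a602641448fb013ef6ee363711)], "
    'state: ACTIVE)">>',
    '<<"Rebalance affefd78be32cab3bc3b22dfa9f09cdc failed: timed out waiting '
    "for all nodes to join & cluster active (missing nodes: "
    "[172.23.104.227:8091 (5b6cd7a602641448fb013ef6ee363711)], "
    'state: ACTIVE)">>',
]

if __name__ == "__main__":
    for node_id, count in missing_node_counts(SAMPLE_LINES).items():
        print(node_id, count)
```

Run against the two messages above, both failures resolve to the same node, 5b6cd7a602641448fb013ef6ee363711 (172.23.104.227:8091).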