Details
Bug
Resolution: Fixed
Critical
7.0.0
Untriaged
1
No
CX Sprint 262
Description
Encountered during system tests in MB-48345. Node 172.23.104.21 halted due to a timeout waiting for ingestion to be suspended.
2021-09-06 01:09:08,827 - systestmon - INFO - --+--+--+--+-- Parsing logs for analytics component looking for fata --+--+--+--+--
2021-09-06 01:09:09,482 - systestmon - WARNING - *** 2 occurences of fata keyword found on 172.23.104.21 ***
2021-09-06 01:09:09,483 - systestmon - DEBUG - 2021-09-06T00:52:09.922-07:00 FATA CBAS.util.ExitUtil [Executor-75:ClusterController : WaitForCompletionForJobId: JID:0.1844] JVM halting with status 22 (halting thread Thread[Executor-75:ClusterController : WaitForCompletionForJobId: JID:0.1844,10,main], interrupted false)
2021-09-06 01:09:09,483 - systestmon - DEBUG - 2021-09-06T00:52:10.276-07:00 FATA CBAS.util.ExitUtil [pool-2-thread-1] Thread dump at halt:
2021-09-06 01:09:12,518 - systestmon - WARNING - There have been more occurences of keyword fata in the logs since the last iteration. Hence performing a cbcollect.
The issue can happen when a node in the cluster fails while we are trying to suspend ingestion for a rebalance or a DDL. The ingestion job fails, but we keep waiting for the ingestion state to become "suspended". Because of the node failure, the state is actually "temporary failed", so the wait never completes; it eventually times out and the node halts.
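To illustrate the failure mode, here is a minimal sketch of a wait loop that treats the failed state as terminal instead of waiting only for "suspended". It is not the actual Analytics code or the fix that was applied; `IngestionState`, `IngestionFailedError`, `wait_for_suspension`, and the `get_state` callback are all hypothetical names chosen for this example.

```python
import time
from enum import Enum


class IngestionState(Enum):
    # Hypothetical states; only the two quoted in the description are from the ticket.
    RUNNING = "running"
    SUSPENDED = "suspended"
    TEMPORARILY_FAILED = "temporary failed"


class IngestionFailedError(Exception):
    """Raised when ingestion fails while we are waiting for it to suspend."""


def wait_for_suspension(get_state, timeout_s=5.0, poll_s=0.01):
    """Poll get_state() until ingestion is suspended.

    Unlike a loop that only checks for SUSPENDED, this one also stops as soon
    as the state becomes TEMPORARILY_FAILED, so a node failure is reported
    right away instead of surfacing later as a timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state is IngestionState.SUSPENDED:
            return state
        if state is IngestionState.TEMPORARILY_FAILED:
            raise IngestionFailedError("ingestion failed while suspending")
        if time.monotonic() >= deadline:
            raise TimeoutError("timed out waiting for ingestion to be suspended")
        time.sleep(poll_s)
```

With a loop like this, the caller can handle the failed state explicitly (for example, by treating ingestion as already stopped) rather than halting on a timeout.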