Details
-
Bug
-
Resolution: Duplicate
-
Critical
-
7.2.1
-
7.2.1-5819 on GCP
-
Untriaged
-
0
-
Unknown
Description
A 5 node cluster with the following config -
3 KV + 2 GSI/Query ( n2-standard-8 200 GB disk + n2-standard-8 450 GB disk)
During the system test run it was seen that the index/query nodes keep getting auto failed over and added back over and over. This is the message seen in diag.log
Node ('ns_1@svc-qi-node-005.r45djlf-eb-k-m7.sandbox.nonprod-project-avengers.com') was automatically failed over. Reason: The index service took too long to respond to the health check |
The auto-failover interval is set to 10 seconds for cloud instances, but I'm not really sure why the nodes are auto-failed over so often ( 64 instances of auto-failover message seen during the system test run of about 20 hours so far).
cbcollect ->
https://cb-engineering.s3.amazonaws.com/SysTest_11Jul_Slow_Indexing/collectinfo-2023-07-12T040332-ns_1%40svc-d-node-001.r45djlf-eb-k-m7.sandbox.nonprod-project-avengers.com.zip |
https://cb-engineering.s3.amazonaws.com/SysTest_11Jul_Slow_Indexing/collectinfo-2023-07-12T040332-ns_1%40svc-d-node-002.r45djlf-eb-k-m7.sandbox.nonprod-project-avengers.com.zip |
https://cb-engineering.s3.amazonaws.com/SysTest_11Jul_Slow_Indexing/collectinfo-2023-07-12T040332-ns_1%40svc-d-node-003.r45djlf-eb-k-m7.sandbox.nonprod-project-avengers.com.zip |
https://cb-engineering.s3.amazonaws.com/SysTest_11Jul_Slow_Indexing/collectinfo-2023-07-12T040332-ns_1%40svc-qi-node-004.r45djlf-eb-k-m7.sandbox.nonprod-project-avengers.com.zip |
https://cb-engineering.s3.amazonaws.com/SysTest_11Jul_Slow_Indexing/collectinfo-2023-07-12T040332-ns_1%40svc-qi-node-005.r45djlf-eb-k-m7.sandbox.nonprod-project-avengers.com.zip |
Not really sure if this is in anyway related to a different issue that was seen during the same run.