Details
Bug
Resolution: Fixed
Major
7.2.0
Enterprise Edition 7.2.0 build 5232
Untriaged
0
Unknown
Description
Build: 7.2.0-5232
Test: -test tests/2i/neo/test_neo_idx_clusterops_recovery.yml -scope tests/2i/neo/scope_neo_plasma_idx_dgm.yml
Scale: 3
An auto-failover of a node appears to have been attempted (the trigger is unclear), but it did not go through because the safety check failed.
The problematic node appears to be 172.23.97.109:
/opt/couchbase/var/lib/couchbase/logs/info.log.1.gz:[user:info,2023-03-07T08:25:00.948-08:00,ns_1@172.23.96.198:<0.27073.0>:auto_failover:log_unsafe_node:670]Could not automatically fail over node ('ns_1@172.23.97.109') due to operation being unsafe for service index. Safety check failed.
/opt/couchbase/var/lib/couchbase/logs/info.log.1.gz:[user:info,2023-03-07T08:25:08.964-08:00,ns_1@172.23.96.198:<0.27073.0>:auto_failover:log_unsafe_node:670]Could not automatically fail over node ('ns_1@172.23.97.109') due to operation being unsafe for service index. Safety check failed.
/opt/couchbase/var/lib/couchbase/logs/info.log.1.gz:[ns_server:info,2023-03-07T08:25:08.965-08:00,ns_1@172.23.96.198:ns_log<0.25245.0>:ns_log:is_duplicate_log:156]suppressing duplicate log auto_failover:0([<<"Could not automatically fail over node ('ns_1@172.23.97.109') due to operation being unsafe for service index. Safety check failed.">>]) because it's been seen 1 times in the past 8.016095 secs (last seen 8.016095 secs ago
/opt/couchbase/var/lib/couchbase/logs/info.log.1.gz:[user:info,2023-03-07T08:25:15.977-08:00,ns_1@172.23.96.198:<0.27073.0>:auto_failover:log_unsafe_node:670]Could not automatically fail over node ('ns_1@172.23.97.109') due to operation being unsafe for service index. Safety check failed.
/opt/couchbase/var/lib/couchbase/logs/info.log.1.gz:[ns_server:info,2023-03-07T08:25:15.977-08:00,ns_1@172.23.96.198:ns_log<0.25245.0>:ns_log:is_duplicate_log:156]suppressing duplicate log auto_failover:0([<<"Could not automatically fail over node ('ns_1@172.23.97.109') due to operation being unsafe for service index. Safety check failed.">>]) because it's been seen 2 times in the past 15.028912 secs (last seen 7.012817 secs ago
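To pull the blocked auto-failover attempts out of a larger info.log, something like the sketch below may help. It is not part of ns_server or any Couchbase tool; the regular expression and the function name `unsafe_failover_attempts` are assumptions based only on the message layout visible in the excerpt above.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the excerpt: [user:info,<timestamp>,<reporting node>:<pid>:
# auto_failover:log_unsafe_node:<line>]Could not automatically fail over node ('<node>') ...
UNSAFE_RE = re.compile(
    r"\[user:info,(?P<ts>[^,]+),(?P<reporter>[^:]+):<[^>]+>:"
    r"auto_failover:log_unsafe_node:\d+\]"
    r"Could not automatically fail over node \('(?P<node>[^']+)'\) "
    r"due to operation being unsafe for service (?P<service>\w+)\."
)


def unsafe_failover_attempts(lines):
    """Return (timestamp, node, service) for each blocked auto-failover message.

    The "suppressing duplicate log" lines are skipped because they are logged
    under ns_server:info rather than user:info.
    """
    attempts = []
    for line in lines:
        match = UNSAFE_RE.search(line)
        if match:
            ts = datetime.fromisoformat(match["ts"])
            attempts.append((ts, match["node"], match["service"]))
    return attempts


# The three user:info lines quoted in this ticket.
LINES = [
    "/opt/couchbase/var/lib/couchbase/logs/info.log.1.gz:[user:info,2023-03-07T08:25:00.948-08:00,ns_1@172.23.96.198:<0.27073.0>:auto_failover:log_unsafe_node:670]Could not automatically fail over node ('ns_1@172.23.97.109') due to operation being unsafe for service index. Safety check failed. |",
    "/opt/couchbase/var/lib/couchbase/logs/info.log.1.gz:[user:info,2023-03-07T08:25:08.964-08:00,ns_1@172.23.96.198:<0.27073.0>:auto_failover:log_unsafe_node:670]Could not automatically fail over node ('ns_1@172.23.97.109') due to operation being unsafe for service index. Safety check failed. |",
    "/opt/couchbase/var/lib/couchbase/logs/info.log.1.gz:[user:info,2023-03-07T08:25:15.977-08:00,ns_1@172.23.96.198:<0.27073.0>:auto_failover:log_unsafe_node:670]Could not automatically fail over node ('ns_1@172.23.97.109') due to operation being unsafe for service index. Safety check failed. |",
]

attempts = unsafe_failover_attempts(LINES)
# Seconds between consecutive blocked attempts.
gaps = [
    round((later[0] - earlier[0]).total_seconds(), 3)
    for earlier, later in zip(attempts, attempts[1:])
]
for ts, node, service in attempts:
    print(ts.isoformat(), node, service)
print("gaps (s):", gaps)
```

The gaps computed this way can be compared against the "last seen ... secs ago" values in the duplicate-suppression lines.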
The info.log on 172.23.97.109 shows these errors:
[ns_server:error,2023-03-07T08:24:46.533-08:00,ns_1@172.23.97.109:service_agent-index<0.30705.77>:service_agent:terminate:259]Terminating abnormally
[ns_server:error,2023-03-07T08:24:53.409-08:00,ns_1@172.23.97.109:service_status_keeper_worker<0.13783.0>:rest_utils:get_json:62]Request to (indexer) getIndexStatus with headers [{"If-None-Match",
"61fe7b1db8796333"}] failed: {error,
timeout}
[ns_server:error,2023-03-07T08:24:53.410-08:00,ns_1@172.23.97.109:service_status_keeper-index<0.13786.0>:service_status_keeper:handle_cast:103]Service service_index returned incorrect status
[ns_server:error,2023-03-07T08:25:08.413-08:00,ns_1@172.23.97.109:service_status_keeper_worker<0.13783.0>:rest_utils:get_json:62]Request to (indexer) getIndexStatus with headers [{"If-None-Match",
"61fe7b1db8796333"}] failed: {error,
timeout}
[ns_server:error,2023-03-07T08:25:08.414-08:00,ns_1@172.23.97.109:service_status_keeper-index<0.13786.0>:service_status_keeper:handle_cast:103]Service service_index returned incorrect status
[user:info,2023-03-07T08:25:21.336-08:00,ns_1@172.23.97.109:<0.5258.78>:menelaus_web_alerts_srv:global_alert:178]Warning: approaching low index resident percentage. Indexer RAM percentage on node "172.23.97.109" is 7%, which is under the threshold of 10%.
[ns_server:info,2023-03-07T08:25:23.328-08:00,ns_1@172.23.97.109:ns_config_rep<0.13579.0>:ns_config_rep:pull_one_node:421]Pulling config from: 'ns_1@172.23.97.66'
[ns_server:error,2023-03-07T08:25:23.417-08:00,ns_1@172.23.97.109:service_status_keeper_worker<0.13783.0>:rest_utils:get_json:62]Request to (indexer) getIndexStatus with headers [{"If-None-Match",
"61fe7b1db8796333"}] failed: {error,
timeout}
[ns_server:error,2023-03-07T08:25:23.418-08:00,ns_1@172.23.97.109:service_status_keeper-index<0.13786.0>:service_status_keeper:handle_cast:103]Service service_index returned incorrect status
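The getIndexStatus timeouts span several physical lines each, so a line-by-line grep misses the `timeout}` part. A minimal sketch for collecting them from raw log text follows; the pattern and the helper name `index_status_timeouts` are assumptions drawn from the excerpt above, and the non-greedy match is only safe when every getIndexStatus failure in the text is a timeout, as it is here.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the excerpt. DOTALL lets the match run across
# the line breaks inside one log entry; [\s|]* tolerates stray "|" residue.
TIMEOUT_RE = re.compile(
    r"\[ns_server:error,(?P<ts>[^,]+),(?P<node>[^:]+):[^\]]*rest_utils:get_json:\d+\]"
    r"Request to \(indexer\) getIndexStatus.*?failed:\s*\{error,[\s|]*timeout\}",
    re.DOTALL,
)


def index_status_timeouts(text):
    """Return (timestamp, node) for each getIndexStatus request that timed out."""
    return [
        (datetime.fromisoformat(m["ts"]), m["node"])
        for m in TIMEOUT_RE.finditer(text)
    ]


# The three timeout entries quoted in this ticket.
SAMPLE = '''[ns_server:error,2023-03-07T08:24:53.409-08:00,ns_1@172.23.97.109:service_status_keeper_worker<0.13783.0>:rest_utils:get_json:62]Request to (indexer) getIndexStatus with headers [{"If-None-Match",
"61fe7b1db8796333"}] failed: {error,
timeout}
[ns_server:error,2023-03-07T08:25:08.413-08:00,ns_1@172.23.97.109:service_status_keeper_worker<0.13783.0>:rest_utils:get_json:62]Request to (indexer) getIndexStatus with headers [{"If-None-Match",
"61fe7b1db8796333"}] failed: {error,
timeout}
[ns_server:error,2023-03-07T08:25:23.417-08:00,ns_1@172.23.97.109:service_status_keeper_worker<0.13783.0>:rest_utils:get_json:62]Request to (indexer) getIndexStatus with headers [{"If-None-Match",
"61fe7b1db8796333"}] failed: {error,
timeout}
'''

timeouts = index_status_timeouts(SAMPLE)
# Seconds between consecutive timeouts.
timeout_gaps = [
    round((later[0] - earlier[0]).total_seconds(), 3)
    for earlier, later in zip(timeouts, timeouts[1:])
]
print(len(timeouts), "timeouts; gaps (s):", timeout_gaps)
```

In the excerpt the timeouts are evenly spaced, and the first one precedes the first blocked auto-failover message on 172.23.96.198 by a few seconds.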
cbcollect logs:
url: https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1678209145/collectinfo-2023-03-07T171226-ns_1%40172.23.105.122.zip
url: https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1678209145/collectinfo-2023-03-07T171226-ns_1%40172.23.106.171.zip
url: https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1678209145/collectinfo-2023-03-07T171226-ns_1%40172.23.106.176.zip
url: https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1678209145/collectinfo-2023-03-07T171226-ns_1%40172.23.106.30.zip
url: https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1678209145/collectinfo-2023-03-07T171226-ns_1%40172.23.96.198.zip
url: https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1678209145/collectinfo-2023-03-07T171226-ns_1%40172.23.96.230.zip
url: https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1678209145/collectinfo-2023-03-07T171226-ns_1%40172.23.96.245.zip
url: https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1678209145/collectinfo-2023-03-07T171226-ns_1%40172.23.97.100.zip
url: https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1678209145/collectinfo-2023-03-07T171226-ns_1%40172.23.97.108.zip
url: https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1678209145/collectinfo-2023-03-07T171226-ns_1%40172.23.97.109.zip
url: https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1678209145/collectinfo-2023-03-07T171226-ns_1%40172.23.97.66.zip
url: https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1678209145/collectinfo-2023-03-07T171226-ns_1%40172.23.97.67.zip
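Each archive name carries the node it was collected from, with the `@` percent-encoded as `%40`. A small sketch for mapping a URL back to its node name, assuming the `collectinfo-<timestamp>-<node>.zip` naming seen in the list above (the helper name `node_from_collectinfo_url` is hypothetical):

```python
import posixpath
from urllib.parse import unquote, urlparse


def node_from_collectinfo_url(url):
    """Extract the node name (e.g. ns_1@<ip>) from a collectinfo archive URL."""
    # Decode %40 back to "@", then keep only the file name.
    name = posixpath.basename(unquote(urlparse(url).path))
    stem = name[: -len(".zip")] if name.endswith(".zip") else name
    # Assumption: the node name is everything after the last "-".
    return stem.rsplit("-", 1)[-1]


url = (
    "https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1678209145/"
    "collectinfo-2023-03-07T171226-ns_1%40172.23.97.109.zip"
)
node = node_from_collectinfo_url(url)
print(node)
```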
For Gerrit Dashboard: MB-55879

# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
189417,1 | MB-55879 Change default config for minVbQueueLength | neo | indexing | MERGED | +2 | +1 |