Details
Bug
Resolution: Unresolved
Critical
7.6.0
7.6.0-2153
Untriaged
0
Unknown
Description
Rebalance failed with the following error:
[ns_server:error,2024-02-20T09:20:38.736-08:00,ns_1@172.23.97.67:service_manager-index<0.28419.201>:service_manager:run_op_worker:219]Worker terminated abnormally: {'EXIT',<0.28815.201>,
    {task_failed,rebalance,
        {service_error,
            <<"shard copy aborted: shard metadata mismatch with snapshot metadata, shardId :15790437419045023451">>}}}
[ns_server:info,2024-02-20T09:20:38.738-08:00,ns_1@172.23.97.67:rebalance_agent<0.21767.0>:rebalance_agent:handle_down:290]Rebalancer process <0.27312.201> died (reason {service_rebalance_failed,
    index,
    {worker_died,
        {'EXIT',<0.28815.201>,
            {task_failed,rebalance,
                {service_error,
                    <<"shard copy aborted: shard metadata mismatch with snapshot metadata, shardId :15790437419045023451">>}}}}}).
Seen during iteration 12 on day 1. There was a test-induced indexer kill, but it was issued at 09:20:55, about 16 seconds after the rebalance failure was logged at 09:20:38:
[pull] vijayviji/sshpass
[2024-02-20T09:20:55-08:00, vijayviji/sshpass:793fa7] sshpass -p couchbase ssh -o StrictHostKeyChecking=no root@172.23.96.245 kill -SIGKILL $(pgrep memcached)
[pull] sequoiatools/cmd
So I don't think the indexer kill caused this specific rebalance failure. Let me know if you think otherwise.
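The ordering argument can be checked directly from the two timestamps quoted in this ticket. The following is a minimal sketch, assuming the ns_server node and the test driver have reasonably synchronized clocks (not verified here); the variable names are illustrative only.

```python
from datetime import datetime

# Timestamp of the "Worker terminated abnormally" ns_server log line.
rebalance_failed_at = datetime.fromisoformat("2024-02-20T09:20:38.736-08:00")

# Timestamp of the test framework's kill command.
kill_issued_at = datetime.fromisoformat("2024-02-20T09:20:55-08:00")

# A positive gap means the kill was issued after the failure was logged.
gap_seconds = (kill_issued_at - rebalance_failed_at).total_seconds()

if gap_seconds > 0:
    print(f"kill issued {gap_seconds:.3f}s after the rebalance failure")
else:
    print(f"kill issued {-gap_seconds:.3f}s before the rebalance failure")
```

If the clocks are skewed by more than the gap, this comparison alone is not conclusive and the indexer logs on the affected node would need to be checked.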
Cbcollect logs:
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708450049/collectinfo-2024-02-20T173605-ns_1%40172.23.106.176.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708450049/collectinfo-2024-02-20T173605-ns_1%40172.23.106.30.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708450049/collectinfo-2024-02-20T173605-ns_1%40172.23.96.198.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708450049/collectinfo-2024-02-20T173605-ns_1%40172.23.96.230.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708450049/collectinfo-2024-02-20T173605-ns_1%40172.23.96.245.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708450049/collectinfo-2024-02-20T173605-ns_1%40172.23.97.100.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708450049/collectinfo-2024-02-20T173605-ns_1%40172.23.97.66.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708450049/collectinfo-2024-02-20T173605-ns_1%40172.23.97.67.zip
Older logs (n-1):
Cbcollect logs:
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708445922/collectinfo-2024-02-20T162729-ns_1%40172.23.105.122.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708445922/collectinfo-2024-02-20T162729-ns_1%40172.23.106.171.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708445922/collectinfo-2024-02-20T162729-ns_1%40172.23.106.176.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708445922/collectinfo-2024-02-20T162729-ns_1%40172.23.106.30.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708445922/collectinfo-2024-02-20T162729-ns_1%40172.23.96.198.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708445922/collectinfo-2024-02-20T162729-ns_1%40172.23.96.230.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708445922/collectinfo-2024-02-20T162729-ns_1%40172.23.96.245.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708445922/collectinfo-2024-02-20T162729-ns_1%40172.23.97.100.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708445922/collectinfo-2024-02-20T162729-ns_1%40172.23.97.108.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708445922/collectinfo-2024-02-20T162729-ns_1%40172.23.97.66.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708445922/collectinfo-2024-02-20T162729-ns_1%40172.23.97.67.zip
Older logs (n-2):
Cbcollect logs:
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708441548/collectinfo-2024-02-20T151843-ns_1%40172.23.105.122.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708441548/collectinfo-2024-02-20T151843-ns_1%40172.23.106.171.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708441548/collectinfo-2024-02-20T151843-ns_1%40172.23.106.176.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708441548/collectinfo-2024-02-20T151843-ns_1%40172.23.106.30.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708441548/collectinfo-2024-02-20T151843-ns_1%40172.23.96.198.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708441548/collectinfo-2024-02-20T151843-ns_1%40172.23.96.230.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708441548/collectinfo-2024-02-20T151843-ns_1%40172.23.96.245.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708441548/collectinfo-2024-02-20T151843-ns_1%40172.23.97.100.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708441548/collectinfo-2024-02-20T151843-ns_1%40172.23.97.109.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708441548/collectinfo-2024-02-20T151843-ns_1%40172.23.97.66.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708441548/collectinfo-2024-02-20T151843-ns_1%40172.23.97.67.zip
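To check whether the same error signature also appears in the older cbcollect archives listed above, log text could be scanned for the service error message. This is a hedged sketch: the function name is hypothetical, it works on log lines that have already been extracted from an archive, and it assumes the message wording matches the excerpt in this ticket.

```python
import re

# Matches the service error text quoted in the description and captures the shard ID.
SHARD_MISMATCH = re.compile(
    r"shard copy aborted: shard metadata mismatch with snapshot metadata, "
    r"shardId\s*:\s*(\d+)"
)

def find_mismatched_shards(lines):
    """Return the distinct shard IDs named in shard-copy-aborted errors."""
    shard_ids = set()
    for line in lines:
        for match in SHARD_MISMATCH.finditer(line):
            shard_ids.add(match.group(1))
    return shard_ids

# Example using the error line from this ticket.
sample = [
    '<<"shard copy aborted: shard metadata mismatch with snapshot metadata, '
    'shardId :15790437419045023451">>}}}',
    "unrelated log line",
]
shards = find_mismatched_shards(sample)
print(sorted(shards))
```

Shard IDs are kept as strings because they are identifiers, not quantities to compute with.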
P.S. A similar rebalance failure was seen before (https://issues.couchbase.com/browse/MB-59461 and https://issues.couchbase.com/browse/MB-59945). I'm not sure whether something has caused a regression, so initial triaging would help. cc Ritam Sharma