Details
- Bug
- Resolution: Fixed
- Critical
- 7.6.0
- 7.6.0-2167
- Untriaged
- 0
- Yes
Description
There have been two rebalance failures with a similar cause:
Failure 1 -
[user:error,2024-02-27T10:19:34.598-08:00,ns_1@172.23.97.67:<0.22535.331>:ns_orchestrator:log_rebalance_completion:1661]Rebalance exited with reason {service_rebalance_failed,index,
    {worker_died,
        {'EXIT',<0.11571.755>,
            {task_failed,rebalance,
                {service_error,
                    <<"RestoreShard error :shard already exists :/data/@2i/shards/shard14695280024876267862">>}}}}}.
Failure 2 -
[user:error,2024-02-27T11:10:45.371-08:00,ns_1@172.23.97.67:<0.22535.331>:ns_orchestrator:log_rebalance_completion:1661]Rebalance exited with reason {service_rebalance_failed,index,
    {worker_died,
        {'EXIT',<0.32326.774>,
            {task_failed,rebalance,
                {service_error,
                    <<"RestoreShard error :shard already exists :/data/@2i/shards/
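When scanning a collected log for further occurrences, it helps to pull the conflicting shard path out of each error. The following is a minimal sketch, assuming the message layout seen in the excerpts above; the pattern and the helper name `find_existing_shard_errors` are illustrative and not part of ns_server or the indexer.

```python
import re

# Layout assumed from the log excerpts in this ticket, not from a documented format.
SHARD_ERROR = re.compile(
    r"RestoreShard error :shard already exists :"
    r"(?P<path>\S*?/shards/shard(?P<shard_id>\d+))"
)


def find_existing_shard_errors(log_text):
    """Return (path, shard_id) for every 'shard already exists' error found."""
    return [
        (m.group("path"), m.group("shard_id"))
        for m in SHARD_ERROR.finditer(log_text)
    ]


sample = (
    '<<"RestoreShard error :shard already exists :'
    '/data/@2i/shards/shard14695280024876267862">>}}}}}.'
)
for path, shard_id in find_existing_shard_errors(sample):
    print(shard_id, path)
```

A message whose path is cut off before the shard ID, as in Failure 2, does not match and is skipped rather than guessed at.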
Panics were observed on nodes 108 and 176 (172.23.97.108 and 172.23.106.176).
cbcollect ->
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.106.176.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.106.30.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.96.198.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.96.230.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.96.245.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.97.100.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.97.108.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.97.109.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.97.66.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.97.67.zip
cbcollect n-1 ->
Cbcollect logs:
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709057345/collectinfo-2024-02-27T181502-ns_1%40172.23.106.176.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709057345/collectinfo-2024-02-27T181502-ns_1%40172.23.106.30.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709057345/collectinfo-2024-02-27T181502-ns_1%40172.23.96.198.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709057345/collectinfo-2024-02-27T181502-ns_1%40172.23.96.230.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709057345/collectinfo-2024-02-27T181502-ns_1%40172.23.96.245.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709057345/collectinfo-2024-02-27T181502-ns_1%40172.23.97.100.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709057345/collectinfo-2024-02-27T181502-ns_1%40172.23.97.66.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709057345/collectinfo-2024-02-27T181502-ns_1%40172.23.97.67.zip
cbcollect n-2 ->
Cbcollect logs:
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.106.171.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.106.176.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.106.30.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.96.198.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.96.230.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.96.245.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.97.100.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.97.108.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.97.66.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.97.67.zip
cbcollect n-3 ->
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.105.122.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.106.171.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.106.176.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.106.30.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.96.198.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.96.230.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.96.245.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.97.100.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.97.109.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.97.66.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.97.67.zip
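The set of nodes differs between collections, so it can be useful to see which nodes appear in one cbcollect run but not another. A minimal sketch, assuming the `collectinfo-<timestamp>-ns_1%40<ip>.zip` naming used in the URLs above; the helper names `node_from_url` and `nodes_in` are hypothetical, and only a subset of the URLs is shown for brevity.

```python
import re
from urllib.parse import unquote

# File-name layout assumed from the cbcollect URLs listed in this ticket.
NODE_RE = re.compile(r"collectinfo-[^/]*-ns_1@(\d+\.\d+\.\d+\.\d+)\.zip$")


def node_from_url(url):
    """Return the node IP encoded in a cbcollect URL, or None if absent."""
    match = NODE_RE.search(unquote(url.strip()))
    return match.group(1) if match else None


def nodes_in(urls):
    """Return the set of node IPs covered by a list of cbcollect URLs."""
    return {ip for ip in map(node_from_url, urls) if ip is not None}


BASE = "https://cb-jira.s3.us-east-2.amazonaws.com/logs"
latest = [
    f"{BASE}/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.97.108.zip",
    f"{BASE}/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.97.66.zip",
    f"{BASE}/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.97.67.zip",
]
n_minus_1 = [
    f"{BASE}/systestmon-1709057345/collectinfo-2024-02-27T181502-ns_1%40172.23.97.66.zip",
    f"{BASE}/systestmon-1709057345/collectinfo-2024-02-27T181502-ns_1%40172.23.97.67.zip",
]

# Nodes present in the latest subset but not in the n-1 subset.
print(sorted(nodes_in(latest) - nodes_in(n_minus_1)))
```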
After a discussion with Varun Velamuri, this does not look like https://issues.couchbase.com/browse/MB-60917.
We did not see this failure in RC1 through RC6; it appears for the first time in build 7.6.0-2167.
I'll let Varun comment on whether it is a regression once the RCA is done, but since we have not seen this issue before, QE has marked it as a regression.
cc Ritam Sharma
Attachments
For Gerrit Dashboard: MB-60962

# | Subject | Branch | Project | Status | CR | V
---|---|---|---|---|---|---
206465,2 | Repro for MB-60962 | unstable | indexing | ABANDONED | 0 | 0
206466,2 | MB-60962 Use initialNode from solution in filterSolution | unstable | indexing | MERGED | +2 | +1
206476,1 | Merging fixes for MB-60962 | master | indexing | MERGED | +2 | +1
206512,1 | Repro for RestoreAndUnlockShards issue in MB-60962 | unstable | indexing | ABANDONED | 0 | 0
206528,3 | Repro for panics in MB-60962 | unstable | indexing | ABANDONED | 0 | 0
206553,1 | MB-60962 GroupIndexes before filterSolution() if new alternate shardIds are required | unstable | indexing | MERGED | +2 | +1
206593,1 | Merging another fix for MB-60962 | master | indexing | MERGED | +2 | +1