  1. Couchbase Server
  2. MB-60962

[System Test] Rebalance failure - RestoreShard error :shard already exists followed by indexer panic


Details

    • Untriaged
    • 0
    • Yes

    Description

      Two rebalance failures occurred with a similar cause:

      Failure 1 -

      [user:error,2024-02-27T10:19:34.598-08:00,ns_1@172.23.97.67:<0.22535.331>:ns_orchestrator:log_rebalance_completion:1661]Rebalance exited with reason {service_rebalance_failed,index,
                                    {worker_died,
                                     {'EXIT',<0.11571.755>,
                                      {task_failed,rebalance,
                                       {service_error,
                                        <<"RestoreShard error :shard already exists :/data/@2i/shards/shard14695280024876267862">>}}}}}.
      

      Failure 2 -

      [user:error,2024-02-27T11:10:45.371-08:00,ns_1@172.23.97.67:<0.22535.331>:ns_orchestrator:log_rebalance_completion:1661]Rebalance exited with reason {service_rebalance_failed,index,
                                    {worker_died,
                                     {'EXIT',<0.32326.774>,
                                      {task_failed,rebalance,
                                       {service_error,
                                        <<"RestoreShard error :shard already exists :/data/@2i/shards/
      

      Indexer panic observed on nodes 172.23.97.108 and 172.23.106.176.
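
      To compare the shard named in each failure, the path can be pulled out of the service_error text. The following is only a minimal sketch, assuming the message layout seen in the two failures above; the helper names extract_shard_path and shard_id are hypothetical and are not part of any Couchbase tool. A truncated message such as the one in Failure 2 yields a path without a shard ID.

```python
import re

# Assumed layout, taken from the failures quoted above:
# "RestoreShard error :shard already exists :<path>"
SHARD_ERR = re.compile(
    r'RestoreShard error :shard already exists :(?P<path>/[^\s"<>]*)'
)


def extract_shard_path(log_text):
    """Return the shard path named in a RestoreShard error, or None."""
    match = SHARD_ERR.search(log_text)
    return match.group("path") if match else None


def shard_id(path):
    """Return the digits at the end of a .../shard<ID> path, or None."""
    match = re.search(r"/shard(\d+)$", path)
    return match.group(1) if match else None


# Message copied from Failure 1 in this ticket.
failure_1 = (
    '<<"RestoreShard error :shard already exists '
    ':/data/@2i/shards/shard14695280024876267862">>'
)

path = extract_shard_path(failure_1)
print(path, shard_id(path))
```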

      cbcollect ->

      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.106.176.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.106.30.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.96.198.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.96.230.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.96.245.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.97.100.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.97.108.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.97.109.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.97.66.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.97.67.zip

      cbcollect n-1 ->

      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709057345/collectinfo-2024-02-27T181502-ns_1%40172.23.106.176.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709057345/collectinfo-2024-02-27T181502-ns_1%40172.23.106.30.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709057345/collectinfo-2024-02-27T181502-ns_1%40172.23.96.198.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709057345/collectinfo-2024-02-27T181502-ns_1%40172.23.96.230.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709057345/collectinfo-2024-02-27T181502-ns_1%40172.23.96.245.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709057345/collectinfo-2024-02-27T181502-ns_1%40172.23.97.100.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709057345/collectinfo-2024-02-27T181502-ns_1%40172.23.97.66.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709057345/collectinfo-2024-02-27T181502-ns_1%40172.23.97.67.zip

      cbcollect n-2 ->

      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.106.171.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.106.176.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.106.30.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.96.198.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.96.230.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.96.245.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.97.100.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.97.108.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.97.66.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.97.67.zip

      cbcollect n-3 ->

      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.105.122.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.106.171.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.106.176.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.106.30.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.96.198.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.96.230.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.96.245.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.97.100.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.97.109.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.97.66.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.97.67.zip

      After discussing this with Varun Velamuri, it does not appear to be the same issue as https://issues.couchbase.com/browse/MB-60917.
      We did not see this failure in RC1 through RC6; it appears for the first time in build 2167.
      I'll let Varun comment on whether it is a regression once the RCA is done, but since we have not seen this issue earlier, QE has marked it as a regression.

      cc Ritam Sharma

          People

            pavan.pb Pavan PB
            pavan.pb Pavan PB