  1. Couchbase Server
  2. MB-60962

[System Test] Rebalance failure - RestoreShard error :shard already exists followed by indexer panic


Details

    • Untriaged
    • 0
    • Yes

    Description

      Two rebalance failures occurred, both with a similar reason:

      Failure 1 -

      [user:error,2024-02-27T10:19:34.598-08:00,ns_1@172.23.97.67:<0.22535.331>:ns_orchestrator:log_rebalance_completion:1661]Rebalance exited with reason {service_rebalance_failed,index,
                                    {worker_died,
                                     {'EXIT',<0.11571.755>,
                                      {task_failed,rebalance,
                                       {service_error,
                                        <<"RestoreShard error :shard already exists :/data/@2i/shards/shard14695280024876267862">>}}}}}.
      

      Failure 2 -

      [user:error,2024-02-27T11:10:45.371-08:00,ns_1@172.23.97.67:<0.22535.331>:ns_orchestrator:log_rebalance_completion:1661]Rebalance exited with reason {service_rebalance_failed,index,
                                    {worker_died,
                                     {'EXIT',<0.32326.774>,
                                      {task_failed,rebalance,
                                       {service_error,
                                        <<"RestoreShard error :shard already exists :/data/@2i/shards/
      

      Indexer panic observed on nodes 172.23.97.108 and 172.23.106.176.
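
      When scanning collected logs for further occurrences, the shard path named in each "shard already exists" error can be pulled out with a small script. The following is only a triage sketch: the function name `find_restore_shard_paths` and the regular expression are assumptions made for illustration and are not part of any Couchbase tooling. The pattern is modelled solely on the two log entries quoted above.

```python
import re

# Hypothetical triage helper; the pattern is inferred from the two
# ns_orchestrator entries quoted in this ticket, not from Couchbase source.
RESTORE_SHARD_RE = re.compile(
    r'RestoreShard error\s*:\s*shard already exists\s*:\s*(?P<path>/[^\s"<>]*)'
)


def find_restore_shard_paths(log_text):
    """Return the shard paths named in 'shard already exists' errors, in order."""
    return [m.group("path") for m in RESTORE_SHARD_RE.finditer(log_text)]


if __name__ == "__main__":
    sample = (
        '<<"RestoreShard error :shard already exists '
        ':/data/@2i/shards/shard14695280024876267862">>'
    )
    for path in find_restore_shard_paths(sample):
        print(path)
```

      If an entry is cut off after the shards directory, as in Failure 2, the helper returns only the part of the path that is present in the text.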

      cbcollect ->

      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.106.176.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.106.30.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.96.198.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.96.230.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.96.245.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.97.100.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.97.108.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.97.109.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.97.66.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709061302/collectinfo-2024-02-27T192429-ns_1%40172.23.97.67.zip

      cbcollect n-1 ->

      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709057345/collectinfo-2024-02-27T181502-ns_1%40172.23.106.176.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709057345/collectinfo-2024-02-27T181502-ns_1%40172.23.106.30.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709057345/collectinfo-2024-02-27T181502-ns_1%40172.23.96.198.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709057345/collectinfo-2024-02-27T181502-ns_1%40172.23.96.230.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709057345/collectinfo-2024-02-27T181502-ns_1%40172.23.96.245.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709057345/collectinfo-2024-02-27T181502-ns_1%40172.23.97.100.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709057345/collectinfo-2024-02-27T181502-ns_1%40172.23.97.66.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709057345/collectinfo-2024-02-27T181502-ns_1%40172.23.97.67.zip

      cbcollect n-2 ->

      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.106.171.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.106.176.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.106.30.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.96.198.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.96.230.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.96.245.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.97.100.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.97.108.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.97.66.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709052981/collectinfo-2024-02-27T170905-ns_1%40172.23.97.67.zip

      cbcollect n-3 ->

      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.105.122.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.106.171.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.106.176.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.106.30.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.96.198.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.96.230.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.96.245.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.97.100.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.97.109.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.97.66.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1709048549/collectinfo-2024-02-27T155622-ns_1%40172.23.97.67.zip

      After a discussion with Varun Velamuri, this does not look like https://issues.couchbase.com/browse/MB-60917.
      We did not see this failure in RC1 through RC6; we are seeing it for the first time in 2167.
      I'll let Varun comment on whether it is a regression after RCA, but since we have not seen this issue earlier, QE has marked it as a regression.

      cc Ritam Sharma

          People

            pavan.pb Pavan PB