Couchbase Server / MB-60874

[System Test] Rebalance failure - shard copy aborted: shard metadata mismatch with snapshot metadata

Details

    • Bug
    • Resolution: Unresolved
    • Critical
    • Morpheus
    • 7.6.0
    • secondary-index
    • 7.6.0-2153
    • Untriaged
    • 0
    • Unknown

    Description

      A rebalance failed with the following error:

      [ns_server:error,2024-02-20T09:20:38.736-08:00,ns_1@172.23.97.67:service_manager-index<0.28419.201>:service_manager:run_op_worker:219]Worker terminated abnormally: {'EXIT',<0.28815.201>,
                                     {task_failed,rebalance,
                                      {service_error,
                                       <<"shard copy aborted: shard metadata mismatch with snapshot metadata, shardId :15790437419045023451">>}}}
      [ns_server:info,2024-02-20T09:20:38.738-08:00,ns_1@172.23.97.67:rebalance_agent<0.21767.0>:rebalance_agent:handle_down:290]Rebalancer process <0.27312.201> died (reason {service_rebalance_failed,
                                                     index,
                                                     {worker_died,
                                                      {'EXIT',<0.28815.201>,
                                                       {task_failed,rebalance,
                                                        {service_error,
                                                         <<"shard copy aborted: shard metadata mismatch with snapshot metadata, shardId :15790437419045023451">>}}}}}).
      

      Seen during iteration 12 on day 1. The test did kill a process (the command below sends SIGKILL to memcached on 172.23.96.245), but it ran at 09:20:55, about 16 seconds after the rebalance failure logged at 09:20:38:

      [pull] vijayviji/sshpass
      [2024-02-20T09:20:55-08:00, vijayviji/sshpass:793fa7] sshpass -p couchbase ssh -o StrictHostKeyChecking=no root@172.23.96.245 kill -SIGKILL $(pgrep memcached)
      [pull] sequoiatools/cmd
      

      So I don't think the kill caused this rebalance failure. Let me know if you think otherwise.
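      The ordering argument rests on comparing two timestamps that share the same UTC offset (-08:00). The minimal Python sketch below checks it, assuming Python 3.7+ for `datetime.fromisoformat`. The timestamp strings are copied from the two log excerpts above; `seconds_between` is a hypothetical helper written for this illustration, not part of any Couchbase tooling.

```python
from datetime import datetime

# Copied from the ns_server error line (millisecond precision, UTC-08:00).
REBALANCE_FAILED = "2024-02-20T09:20:38.736-08:00"
# Copied from the test-harness line for the SIGKILL command (second precision, UTC-08:00).
PROCESS_KILLED = "2024-02-20T09:20:55-08:00"


def seconds_between(earlier: str, later: str) -> float:
    """Hypothetical helper: seconds by which `later` follows `earlier` (negative if it precedes it)."""
    return (datetime.fromisoformat(later) - datetime.fromisoformat(earlier)).total_seconds()


delta = seconds_between(REBALANCE_FAILED, PROCESS_KILLED)
print(f"kill happened {delta:.3f} s after the rebalance failure")  # prints 16.264
```

      A positive delta means the kill came after the failure, so it cannot be the trigger for this particular abort.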

      Cbcollect logs:

      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708450049/collectinfo-2024-02-20T173605-ns_1%40172.23.106.176.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708450049/collectinfo-2024-02-20T173605-ns_1%40172.23.106.30.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708450049/collectinfo-2024-02-20T173605-ns_1%40172.23.96.198.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708450049/collectinfo-2024-02-20T173605-ns_1%40172.23.96.230.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708450049/collectinfo-2024-02-20T173605-ns_1%40172.23.96.245.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708450049/collectinfo-2024-02-20T173605-ns_1%40172.23.97.100.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708450049/collectinfo-2024-02-20T173605-ns_1%40172.23.97.66.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708450049/collectinfo-2024-02-20T173605-ns_1%40172.23.97.67.zip

      Older logs (n-1):

      Cbcollect logs:

      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708445922/collectinfo-2024-02-20T162729-ns_1%40172.23.105.122.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708445922/collectinfo-2024-02-20T162729-ns_1%40172.23.106.171.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708445922/collectinfo-2024-02-20T162729-ns_1%40172.23.106.176.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708445922/collectinfo-2024-02-20T162729-ns_1%40172.23.106.30.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708445922/collectinfo-2024-02-20T162729-ns_1%40172.23.96.198.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708445922/collectinfo-2024-02-20T162729-ns_1%40172.23.96.230.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708445922/collectinfo-2024-02-20T162729-ns_1%40172.23.96.245.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708445922/collectinfo-2024-02-20T162729-ns_1%40172.23.97.100.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708445922/collectinfo-2024-02-20T162729-ns_1%40172.23.97.108.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708445922/collectinfo-2024-02-20T162729-ns_1%40172.23.97.66.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708445922/collectinfo-2024-02-20T162729-ns_1%40172.23.97.67.zip

      Older logs (n-2):

      Cbcollect logs:

      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708441548/collectinfo-2024-02-20T151843-ns_1%40172.23.105.122.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708441548/collectinfo-2024-02-20T151843-ns_1%40172.23.106.171.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708441548/collectinfo-2024-02-20T151843-ns_1%40172.23.106.176.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708441548/collectinfo-2024-02-20T151843-ns_1%40172.23.106.30.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708441548/collectinfo-2024-02-20T151843-ns_1%40172.23.96.198.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708441548/collectinfo-2024-02-20T151843-ns_1%40172.23.96.230.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708441548/collectinfo-2024-02-20T151843-ns_1%40172.23.96.245.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708441548/collectinfo-2024-02-20T151843-ns_1%40172.23.97.100.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708441548/collectinfo-2024-02-20T151843-ns_1%40172.23.97.109.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708441548/collectinfo-2024-02-20T151843-ns_1%40172.23.97.66.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1708441548/collectinfo-2024-02-20T151843-ns_1%40172.23.97.67.zip

      cc Varun Velamuri

      P.S. Similar rebalance failures were seen before (https://issues.couchbase.com/browse/MB-59461 and https://issues.couchbase.com/browse/MB-59945). I'm not sure whether something has caused this to regress, so initial triage would help. cc Ritam Sharma

    People

    • Shivansh Rustagi (shivansh.rustagi)
    • Pavan PB (pavan.pb)