Couchbase Server / MB-60282

[System Test] Rebalance failures - inactivity_timeout / Collection does not exist / RestoreShard error

Details

    • Bug
    • Resolution: Unresolved
    • Critical
    • Morpheus
    • 7.6.0
    • secondary-index
    • 7.6.0-1970
    • Untriaged
    • 0
    • Unknown

    Description

      On day 3 of the run, there have been a few rebalance failures. The ones of interest are as follows:

      Rebalance 1

      [user:error,2024-01-04T05:12:06.433-08:00,ns_1@172.23.97.67:<0.11390.441>:ns_orchestrator:log_rebalance_completion:1661]Rebalance exited with reason {service_rebalance_failed,index,
                                    {worker_died,
                                     {'EXIT',<0.11005.1009>,
                                      {task_failed,rebalance,
                                       {service_error,
                                        <<"RestoreShard error :alternateId(13867861711156681046-3-1) already exists (shardId6469598071031833410)">>}}}}}.
      

      Rebalance 2

      [user:error,2024-01-04T19:47:03.584-08:00,ns_1@172.23.97.67:<0.3749.1088>:ns_orchestrator:log_rebalance_completion:1661]Rebalance exited with reason {service_rebalance_failed,index,
                                    {{badmatch,
                                      {error,
                                       {bad_nodes,index,get_agent,
                                        [{'ns_1@172.23.97.108',
                                          {exit,
                                           {{timeout,
                                             {gen_server,call,
                                              [<34830.23512.1801>,
                                               {call,"ServiceAPI.CancelTask",
                                                #Fun<json_rpc_connection.0.36915653>,
                                                #{timeout => 60000}},
                                               60000]}},
                                            {gen_server,call,
                                             [{'service_agent-index',
                                               'ns_1@172.23.97.108'},
                                              get_agent,infinity]}}}},
                                         {'ns_1@172.23.97.109',
                                          {exit,
                                           {{timeout,
                                             {gen_server,call,
                                              [<34831.29071.1566>,
                                               {call,"ServiceAPI.CancelTask",
                                                #Fun<json_rpc_connection.0.36915653>,
                                                #{timeout => 60000}},
                                               60000]}},
                                            {gen_server,call,
                                             [{'service_agent-index',
                                               'ns_1@172.23.97.109'},
                                              get_agent,infinity]}}}}]}}},
                                     [{service_manager,wait_for_agents,1,
                                       [{file,"src/service_manager.erl"},
                                        {line,165}]},
                                      {service_manager,run_op,1,
                                       [{file,"src/service_manager.erl"},
                                        {line,140}]},
                                      {proc_lib,init_p,3,
                                       [{file,"proc_lib.erl"},{line,225}]}]}}.
      Rebalance Operation Id = ed0472d71a0728851d079456da8ec8dd
      

      Rebalance 3

      [user:error,2024-01-04T20:22:48.243-08:00,ns_1@172.23.97.67:<0.3749.1088>:ns_orchestrator:log_rebalance_completion:1661]Rebalance exited with reason {service_rebalance_failed,index,
                                    {worker_died,
                                     {'EXIT',<0.11459.1536>,
                                      {task_failed,rebalance,
                                       {service_error,
                                        <<"Collection does not exist or temporarily unavailable for creating new index.Bucket = bucket5 Scope = _default Collection = GNhuSkW2Zi. Please retry the operation at a later time.">>}}}}}.
      Rebalance Operation Id = ba77566605cdbf2830151efca3113705
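
For triage across the full cbcollect bundles, the three failure signatures above can be picked out of ns_server log text mechanically. The following is only a minimal sketch: the function name `classify_rebalance_failures`, the label names, and the choice of signature substrings are assumptions made for illustration and are not part of any Couchbase tooling. It assumes each entry starts with a `[user:error,<timestamp>,<node>:` header as in the excerpts above.

```python
import re

# Signature substrings copied from the three failures quoted in this ticket.
# The label names on the left are hypothetical and used only for grouping.
SIGNATURES = {
    "restore_shard": "RestoreShard error",
    "cancel_task_timeout": "ServiceAPI.CancelTask",
    "collection_missing": "Collection does not exist",
}

# Matches the header of a user:error entry, e.g.
# [user:error,2024-01-04T05:12:06.433-08:00,ns_1@172.23.97.67:
HEADER_RE = re.compile(r"\[user:error,(?P<ts>[^,\]]+),(?P<node>ns_1@[\d.]+):")


def classify_rebalance_failures(log_text):
    """Return a list of (timestamp, node, label) for each rebalance failure entry."""
    results = []
    headers = list(HEADER_RE.finditer(log_text))
    for i, match in enumerate(headers):
        # An entry's body runs until the next header (or the end of the text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(log_text)
        body = log_text[match.end():end]
        if "Rebalance exited with reason" not in body:
            continue
        label = next(
            (name for name, needle in SIGNATURES.items() if needle in body),
            "unknown",
        )
        results.append((match.group("ts"), match.group("node"), label))
    return results
```

Passing the text of a log file to `classify_rebalance_failures` gives one tuple per failed rebalance, which makes it easier to see whether the same signature recurs across the run.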
      

      cc Varun Velamuri

      Latest logs:

      Cbcollect logs:

      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1704438199/collectinfo-2024-01-05T072606-ns_1%40172.23.105.122.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1704438199/collectinfo-2024-01-05T072606-ns_1%40172.23.106.171.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1704438199/collectinfo-2024-01-05T072606-ns_1%40172.23.106.176.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1704438199/collectinfo-2024-01-05T072606-ns_1%40172.23.106.30.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1704438199/collectinfo-2024-01-05T072606-ns_1%40172.23.96.198.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1704438199/collectinfo-2024-01-05T072606-ns_1%40172.23.96.230.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1704438199/collectinfo-2024-01-05T072606-ns_1%40172.23.96.245.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1704438199/collectinfo-2024-01-05T072606-ns_1%40172.23.97.100.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1704438199/collectinfo-2024-01-05T072606-ns_1%40172.23.97.109.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1704438199/collectinfo-2024-01-05T072606-ns_1%40172.23.97.66.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1704438199/collectinfo-2024-01-05T072606-ns_1%40172.23.97.67.zip

            People

              varun.velamuri Varun Velamuri
              pavan.pb Pavan PB