Couchbase Server / MB-57947

Search Rebalance exited - 503 Service Unavailable


Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Critical
    • Fix Version/s: 7.6.0, 7.2.4
    • Affects Version/s: 7.2.1
    • Component/s: fts

    Description

      Version: couchbase-cloud-server-7.2.1-5819-v1.0.20

      Test Scenario: Test code


      Steps involved in the test:

      1. Create 15 scopes and 60 collections across 3 buckets
        • Each bucket holds 10M documents; 1 index on each collection -> 60 indexes
      2. Sleep for 15 mins
      3. Run FTS queries and FTS Flex queries randomly for 2 hrs
      4. Kill CBFT
      5. Sleep for 15 mins
      6. Scale out to 5 nodes
      7. Create 10 more FTS indexes with the following configuration:
        • 1 index | 1 replica | 5 partitions
        • 3 indexes | 2 replicas | 6 partitions
        • 6 indexes | 0 replicas | 4 partitions
      8. Sleep for 15 mins
      9. Run FTS queries and FTS Flex queries randomly for 30 mins again
      10. Kill CBFT
      11. Sleep for 15 mins
      12. Rebalance (scale in) to 4 nodes
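The index mix created in step 7 can be sized with simple arithmetic: every partition exists once as an active copy plus once per replica. The sketch below is a hypothetical helper (`IndexGroup`, `total_copies` and `fits_on` are illustrative names, not Couchbase APIs). It assumes that copies of one partition are never placed on the same node, and it covers only the 10 indexes from step 7, because the partition counts of the 60 indexes from step 1 are not stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexGroup:
    """Hypothetical description of a group of identically shaped FTS indexes."""
    count: int       # number of indexes with this shape
    replicas: int    # replicas per partition
    partitions: int  # index partitions per index

    def copies(self) -> int:
        # Each partition exists once as active plus once per replica.
        return self.count * self.partitions * (self.replicas + 1)


# The step-7 configuration from the test scenario.
STEP7 = [
    IndexGroup(count=1, replicas=1, partitions=5),
    IndexGroup(count=3, replicas=2, partitions=6),
    IndexGroup(count=6, replicas=0, partitions=4),
]


def total_copies(groups):
    """Total number of partition copies (active + replica) across all groups."""
    return sum(g.copies() for g in groups)


def fits_on(groups, nodes):
    # Assumption: the active copy and every replica of a partition must sit on
    # distinct nodes, so each group needs at least replicas + 1 nodes.
    return all(g.replicas + 1 <= nodes for g in groups)


if __name__ == "__main__":
    print(total_copies(STEP7))  # 88
    print(fits_on(STEP7, 4))    # True
```

Under these assumptions the step-7 indexes add 88 partition copies (10 + 54 + 24), and the 2-replica indexes need only 3 distinct nodes, so the 4-node topology targeted in step 12 does not look infeasible on placement grounds alone.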

      During this step, the rebalance exited for the FTS service with the following error:

      Rebalance exited with reason {service_rebalance_failed,fts,
      {agent_died,<33557.25272.84>,
      {linked_process_died,<33557.11877.86>,
      {'ns_1@svc-dqs-node-002.wlvswbdogdobi7s6.nonprod-project-avengers.com',
      {{badmatch,
      {false,
      {topology,[],
      [<<"7f65b5ad837d8fe4092435313490197b">>,
      <<"8a57b22dcc686d4dcfc14195b7860a00">>,
      <<"925ab73628a041b289606258b7e757c3">>,
      <<"a9eb96044f9eb78083e22a226e9ca2c9">>],
      false,
      [<<"error: nodes: sample res.StatusCode not 200, res: &http.Response{Status:\"503 Service Unavailable\", StatusCode:503, Proto:\"HTTP/1.1\", ProtoMajor:1, ProtoMinor:1, Header:http.Header{\"Content-Length\":[]string{\"50\"}, \"Content-Type\":[]string{\"text/plain; charset=utf-8\"}, \"Date\":[]string{\"Wed, 19 Jul 2023 15:54:37 GMT\"}}, Body:(*http.bodyEOFSignal)(0xc1881c8780), ContentLength:50, TransferEncoding:[]string(nil), Close:false, Uncompressed:false, Trailer:http.Header(nil), Request:(*http.Request)(0xc201f54600), TLS:(*tls.ConnectionState)(0xc1ceed7600)}, urlUUID: monitor.UrlUUID{Url:\"https://svc-dqs-node-003.wlvswbdogdobi7s6.nonprod-project-avengers.com:18094\", UUID:\"925ab73628a041b289606258b7e757c3\"}, kind: /api/stats?partitions=true, err: <nil>">>]},
      {topology,[],
      [<<"7f65b5ad837d8fe4092435313490197b">>,
      <<"8a57b22dcc686d4dcfc14195b7860a00">>,
      <<"925ab73628a041b289606258b7e757c3">>,
      <<"a9eb96044f9eb78083e22a226e9ca2c9">>],
      true,[]}}},
      [{service_agent,long_poll_worker_loop,5,
      [{file,"src/service_agent.erl"},
      {line,605}]},
      {proc_lib,init_p,3,
      [{file,"proc_lib.erl"},{line,211}]}]}}}}}.
      Rebalance Operation Id = e221201a218da0abe3d32a31ba87fa31 
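One reading of the log is that, during the FTS rebalance, the node stats endpoint `/api/stats?partitions=true` (here on HTTPS port 18094 of node 003) was sampled and the node answered `503 Service Unavailable` instead of `200`, which appears to have left the reported topology unbalanced and aborted the rebalance. The Python sketch below illustrates that kind of check. `sample_stats` and the `Unavailable` handler are hypothetical and are not cbft's or ns_server's actual code; the sketch uses plain HTTP on localhost and only the standard library so it can run anywhere.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

STATS_PATH = "/api/stats?partitions=true"


def sample_stats(base_url, timeout=5.0):
    """Hypothetical sampler: fetch partition stats, fail on any non-200 reply."""
    url = base_url.rstrip("/") + STATS_PATH
    # Ignore proxy environment variables; node-to-node calls go direct.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as res:
            return json.loads(res.read())
    except urllib.error.HTTPError as e:
        # urllib raises HTTPError for non-2xx codes such as 503.
        raise RuntimeError(
            f"nodes: sample res.StatusCode not 200, status: {e.code} {e.reason}, url: {url}"
        ) from e


class Unavailable(BaseHTTPRequestHandler):
    """Test double that always answers 503, like node 003 in the log."""

    def do_GET(self):
        body = b"service unavailable\n"
        self.send_response(503)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep demo output quiet


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), Unavailable)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        sample_stats(f"http://127.0.0.1:{server.server_port}")
    except RuntimeError as err:
        print(err)
    finally:
        server.shutdown()
        server.server_close()
```

Connection failures (`URLError`) are deliberately left to propagate, so the sketch distinguishes a node that is reachable but refusing service from one that cannot be reached at all.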


      FYI: Disk and CPU utilisation on all nodes appeared to be within normal limits and healthy.


            People

              likith.b Likith B
              sarthak.dua Sarthak Dua