Couchbase Server / MB-60749

FTS rebalance failed during a rebalance out operation

Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • 7.6.0
    • 7.6.0
    • fts
    • debian
    • Untriaged
    • 0
    • Unknown

    Description

      Issue observed on 7.6.0-2090

      Steps to repro

      1. Have an 8-node cluster with services laid out as kv:n1ql-kv-kv-index:n1ql-index:n1ql-fts-fts-fts
      2. Set the FTS and Index service RAM quota to 10k MB
      3. Create 4 buckets and load 1k to 2k into each of them
      4. Create 50 GSI indexes in each bucket with defer_build=true and num_replica=1
      5. Build the above GSI indexes
      6. Create 10 FTS indexes in each bucket with num_replica=1, num_partitions=1, index_type=scorch, text_analyzer=keyword
      7. Create 100 GSI indexes in each bucket with defer_build=true and num_replica=1, then drop them
      8. Create 12 FTS indexes in each bucket with num_replica=1, num_partitions=1, index_type=scorch, text_analyzer=keyword, then drop them
      9. Rebalance out the first FTS node in the cluster
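
The node layout in step 1 and the index parameters in steps 4 and 6 can be spelled out as below. This is a minimal sketch, not the test harness's code. The helper names, the index names and the indexed field `name` are hypothetical, since the report does not give them. The FTS payload assumes the usual index definition fields (`planParams.numReplicas`, `planParams.indexPartitions`, `params.store.indexType`, `params.mapping.default_analyzer`) and a `gocbcore` source type; check these against the FTS REST documentation before relying on them.

```python
import json

# Service layout copied from step 1; nodes are separated by "-", services by ":".
TOPOLOGY = "kv:n1ql-kv-kv-index:n1ql-index:n1ql-fts-fts-fts"


def parse_topology(spec):
    """Split the spec into one list of services per node."""
    return [node.split(":") for node in spec.split("-")]


def first_node_with(nodes, service):
    """Return the 0-based position of the first node running `service`."""
    return next(i for i, services in enumerate(nodes) if service in services)


def gsi_create_statement(bucket, index_name, field="name"):
    """Deferred GSI index with one replica (steps 4 and 7).

    `index_name` and `field` are hypothetical placeholders.
    """
    with_clause = json.dumps({"defer_build": True, "num_replica": 1})
    return f"CREATE INDEX `{index_name}` ON `{bucket}`(`{field}`) WITH {with_clause}"


def fts_index_definition(bucket, index_name):
    """FTS index definition matching steps 6 and 8 (field names are assumed)."""
    return {
        "type": "fulltext-index",
        "name": index_name,
        "sourceType": "gocbcore",  # assumption: source type used by 7.x clusters
        "sourceName": bucket,
        "planParams": {"numReplicas": 1, "indexPartitions": 1},
        "params": {
            "store": {"indexType": "scorch"},
            "mapping": {"default_analyzer": "keyword"},
        },
    }


nodes = parse_topology(TOPOLOGY)
print(len(nodes), "nodes; first fts node at position", first_node_with(nodes, "fts"))
print(gsi_create_statement("standard_bucket0", "gsi_idx_0"))
print(json.dumps(fts_index_definition("standard_bucket0", "fts_idx_0"), indent=2))
```

The bucket name `standard_bucket0` is taken from the rebalance log in this report; everything else passed to the helpers is illustrative.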

      Rebalance failure -

      2024-02-09 04:43:24 | INFO | MainProcess | Cluster_Thread | [on_prem_rest_client.print_UI_logs] Latest logs from UI on 172.23.123.131:
      2024-02-09 04:43:24 | ERROR | MainProcess | Cluster_Thread | [on_prem_rest_client.print_UI_logs] {'node': 'ns_1@172.23.123.157', 'type': 'critical', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1707482596357, 'shortText': 'message', 'text': 'Rebalance exited with reason {service_rebalance_failed,fts,\n                              {agent_died,<27302.12262.0>,\n                               {linked_process_died,<27302.12332.0>,\n                                {\'ns_1@172.23.122.132\',\n                                 {{badmatch,\n                                   {false,\n                                    {topology,[],\n                                     [<<"4bf653ebc93af32ca7592c04ffa86d4d">>,\n                                      <<"4e730b3e98d67da6187b4ee8af1229d8">>,\n                                      <<"6520c388f6e44a26a8d59502fec7b3b3">>],\n                                     true,[]},\n                                    {topology,[],\n                                     [<<"4bf653ebc93af32ca7592c04ffa86d4d">>,\n                                      <<"4e730b3e98d67da6187b4ee8af1229d8">>,\n                                      <<"6520c388f6e44a26a8d59502fec7b3b3">>],\n                                     false,[]}}},\n                                  [{service_agent,long_poll_worker_loop,5,\n                                    [{file,"src/service_agent.erl"},\n                                     {line,750}]},\n                                   {proc_lib,init_p,3,\n                                    [{file,"proc_lib.erl"},{line,225}]}]}}}}}.\nRebalance Operation Id = ddc3b2a03ae139e1923f4bbe8a23648e', 'serverTime': '2024-02-09T04:43:16.357Z'}
      2024-02-09 04:43:24 | ERROR | MainProcess | Cluster_Thread | [on_prem_rest_client.print_UI_logs] {'node': 'ns_1@172.23.123.157', 'type': 'info', 'code': 0, 'module': 'ns_vbucket_mover', 'tstamp': 1707482561108, 'shortText': 'message', 'text': 'Bucket "standard_bucket3" rebalance does not seem to be swap rebalance', 'serverTime': '2024-02-09T04:42:41.108Z'}
      2024-02-09 04:43:24 | ERROR | MainProcess | Cluster_Thread | [on_prem_rest_client.print_UI_logs] {'node': 'ns_1@172.23.123.157', 'type': 'info', 'code': 0, 'module': 'ns_rebalancer', 'tstamp': 1707482560563, 'shortText': 'message', 'text': 'Started rebalancing bucket standard_bucket3', 'serverTime': '2024-02-09T04:42:40.563Z'}
      2024-02-09 04:43:24 | ERROR | MainProcess | Cluster_Thread | [on_prem_rest_client.print_UI_logs] {'node': 'ns_1@172.23.123.157', 'type': 'info', 'code': 0, 'module': 'ns_vbucket_mover', 'tstamp': 1707482551586, 'shortText': 'message', 'text': 'Bucket "standard_bucket2" rebalance does not seem to be swap rebalance', 'serverTime': '2024-02-09T04:42:31.586Z'}
      2024-02-09 04:43:24 | ERROR | MainProcess | Cluster_Thread | [on_prem_rest_client.print_UI_logs] {'node': 'ns_1@172.23.123.157', 'type': 'info', 'code': 0, 'module': 'ns_rebalancer', 'tstamp': 1707482551058, 'shortText': 'message', 'text': 'Started rebalancing bucket standard_bucket2', 'serverTime': '2024-02-09T04:42:31.058Z'}
      2024-02-09 04:43:24 | ERROR | MainProcess | Cluster_Thread | [on_prem_rest_client.print_UI_logs] {'node': 'ns_1@172.23.123.157', 'type': 'info', 'code': 0, 'module': 'ns_vbucket_mover', 'tstamp': 1707482543019, 'shortText': 'message', 'text': 'Bucket "standard_bucket1" rebalance does not seem to be swap rebalance', 'serverTime': '2024-02-09T04:42:23.019Z'}
      2024-02-09 04:43:24 | ERROR | MainProcess | Cluster_Thread | [on_prem_rest_client.print_UI_logs] {'node': 'ns_1@172.23.123.157', 'type': 'info', 'code': 0, 'module': 'ns_rebalancer', 'tstamp': 1707482542496, 'shortText': 'message', 'text': 'Started rebalancing bucket standard_bucket1', 'serverTime': '2024-02-09T04:42:22.496Z'}
      2024-02-09 04:43:24 | ERROR | MainProcess | Cluster_Thread | [on_prem_rest_client.print_UI_logs] {'node': 'ns_1@172.23.123.157', 'type': 'info', 'code': 0, 'module': 'ns_vbucket_mover', 'tstamp': 1707482534662, 'shortText': 'message', 'text': 'Bucket "standard_bucket0" rebalance does not seem to be swap rebalance', 'serverTime': '2024-02-09T04:42:14.662Z'}
      2024-02-09 04:43:24 | ERROR | MainProcess | Cluster_Thread | [on_prem_rest_client.print_UI_logs] {'node': 'ns_1@172.23.123.157', 'type': 'info', 'code': 0, 'module': 'ns_rebalancer', 'tstamp': 1707482534116, 'shortText': 'message', 'text': 'Started rebalancing bucket standard_bucket0', 'serverTime': '2024-02-09T04:42:14.116Z'}
      2024-02-09 04:43:24 | ERROR | MainProcess | Cluster_Thread | [on_prem_rest_client.print_UI_logs] {'node': 'ns_1@172.23.123.157', 'type': 'info', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1707482533785, 'shortText': 'message', 'text': "Starting rebalance, KeepNodes = ['ns_1@172.23.123.129','ns_1@172.23.123.131',\n                                 'ns_1@172.23.123.157','ns_1@172.23.123.160',\n                                 'ns_1@172.23.123.206','ns_1@172.23.123.207',\n                                 'ns_1@172.23.123.209'], EjectNodes = ['ns_1@172.23.122.132'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = ddc3b2a03ae139e1923f4bbe8a23648e", 'serverTime': '2024-02-09T04:42:13.785Z'} 

      Node being rebalanced out - 172.23.122.132
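
For reference, ejecting this node corresponds to a rebalance request that lists it among both the known and the ejected nodes. The sketch below only builds the form body and sends nothing to a cluster. It assumes the standard `POST /controller/rebalance` REST endpoint with `knownNodes` and `ejectedNodes` parameters; the node names are taken from the "Starting rebalance" log entry, and `rebalance_form` is a hypothetical helper.

```python
from urllib.parse import urlencode

# KeepNodes and EjectNodes as reported by ns_orchestrator in the log above.
KEEP_NODES = [
    "ns_1@172.23.123.129",
    "ns_1@172.23.123.131",
    "ns_1@172.23.123.157",
    "ns_1@172.23.123.160",
    "ns_1@172.23.123.206",
    "ns_1@172.23.123.207",
    "ns_1@172.23.123.209",
]
EJECT_NODES = ["ns_1@172.23.122.132"]


def rebalance_form(keep, eject):
    """Build the form fields: every node is 'known', ejected nodes are also listed separately."""
    known = keep + eject
    return {"knownNodes": ",".join(known), "ejectedNodes": ",".join(eject)}


form = rebalance_form(KEEP_NODES, EJECT_NODES)
body = urlencode(form)  # application/x-www-form-urlencoded payload
print(body)
```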
      Logs - test_1 (9).zip
      Test logs - test_log.log

      Attachments

        1. test_1 (9).zip
          76.10 MB
        2. test_log.log
          1.05 MB

            People

              yash.dodderi Yash Dodderi
              yash.dodderi Yash Dodderi