Couchbase Server / MB-55006

[FTS] Rebalance button remains active [sometimes] even after supposedly balancing the cluster

Details

    • Untriaged
    • 0
    • Unknown

    Description

      • Take a cluster with 3 search nodes.
      • Create a search index with 1 partition and 2 replicas.
      • Remove one node from the cluster (via rebalance out).
      • Upon rebalance completion, the rebalance button remains active. This is expected, because the search index's constraints cannot be met (1 replica is missing).
      • Add the node back into the cluster and rebalance.
      • At this point we expect the rebalance button to NOT be active anymore, because the search index's constraints are met. However, the logs show the search service still reporting the cluster as unbalanced.
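
The expectation in the steps above can be sketched as simple arithmetic. This is only an illustration, assuming that every copy of a partition (1 active plus its replicas) must be placed on a distinct search node; the function name is hypothetical and is not part of the search service.

```python
def replication_constraints_met(num_nodes: int, num_replicas: int) -> bool:
    """Return True if every copy of a partition can get its own node.

    Assumption: each partition needs 1 active copy plus num_replicas
    replica copies, and no two copies may share a node.
    """
    copies_per_partition = 1 + num_replicas
    return num_nodes >= copies_per_partition


# Node counts from the reproduction steps: start with 3 nodes,
# rebalance one out (2 nodes), then add it back (3 nodes).
for nodes in (3, 2, 3):
    status = "met" if replication_constraints_met(nodes, num_replicas=2) else "NOT met"
    print(f"{nodes} search nodes, 2 replicas: constraints {status}")
```

Under this assumption the constraints are met again once the third node is back, which is why the rebalance button is expected to become inactive after the final rebalance.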

      P.S. I haven't been able to reproduce this on cluster_run, but Jon Strabala has reproduced it on AWS instances. The logs from the 3 search nodes are attached.


      Rebalance starts:

      2023-01-05T20:18:00.136Z, ns_orchestrator:0:info:message(ns_1@10.0.0.23) - Starting rebalance, KeepNodes = ['ns_1@10.0.0.114','ns_1@10.0.0.144',
                                       'ns_1@10.0.0.145','ns_1@10.0.0.163',
                                       'ns_1@10.0.0.23','ns_1@10.0.0.251',
                                       'ns_1@10.0.0.30','ns_1@10.0.0.54',
                                       'ns_1@10.0.0.95'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 8278626a19cceac5319915bbc5fc62ed 

      After the rebalance, ns_server determines that there are 3 search nodes in the cluster:

          {{service_map,fts},
           {['ns_1@10.0.0.163','ns_1@10.0.0.30','ns_1@10.0.0.54'],
            {<<"44b39b4e34f1baea1a88bc58b3a40000">>,2329}}} 

      ns_server then queries the search service for the system status. Although search detects that there are 3 nodes in the cluster, it reports "could not meet replication constraints", which appears to be the problem.

      [json_rpc:debug,2023-01-05T20:18:07.042Z,ns_1@10.0.0.163:json_rpc_connection-fts-service_api<0.23444.8>:json_rpc_connection:handle_call:156]sending jsonrpc call:{[{jsonrpc,<<"2.0">>},
                             {id,107},
                             {method,<<"ServiceAPI.GetCurrentTopology">>},
                             {params,[{[{rev,<<"MjE=">>},{timeout,30000}]}]}]}
      [json_rpc:debug,2023-01-05T20:18:07.056Z,ns_1@10.0.0.163:json_rpc_connection-fts-service_api<0.23444.8>:json_rpc_connection:handle_info:89]got response: [{<<"id">>,107},
                     {<<"result">>,
                      {[{<<"rev">>,<<"MjI=">>},
                        {<<"nodes">>,
                         [<<"03544ee7c1e491540ba818097f01fca9">>,
                          <<"e636f4410dab1791c07ecce7aa6bb5a7">>,
                          <<"ea6ef01e396c22f9c796bca19d77dc1e">>]},
                        {<<"isBalanced">>,false},
                        {<<"messages">>,
                         [<<"warning: resource: \"ts02_fts_01\" -- could not meet replication constraints">>]}]}},
                     {<<"error">>,null}] 
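
The `rev` values in the exchange above are base64 strings. As a reading aid, the sketch below decodes them and shows how the `isBalanced` flag could drive the rebalance button. The helper names are hypothetical, and the rule that the button stays enabled whenever `isBalanced` is false is an assumption taken from the behaviour described in this report, not from ns_server code.

```python
import base64


def decode_rev(rev_b64: str) -> str:
    """Decode the base64 'rev' field of a GetCurrentTopology call or response."""
    return base64.b64decode(rev_b64).decode("ascii")


def rebalance_button_active(result: dict) -> bool:
    # Assumption for illustration: the button stays enabled whenever the
    # service reports isBalanced = false.
    return not result["isBalanced"]


# Values copied from the json_rpc log above (node UUIDs omitted for brevity).
request_rev = "MjE="
result = {
    "rev": "MjI=",
    "isBalanced": False,
    "messages": [
        'warning: resource: "ts02_fts_01" -- could not meet replication constraints'
    ],
}

print(decode_rev(request_rev), "->", decode_rev(result["rev"]))
print("button active:", rebalance_button_active(result))
```

Decoded, the call carries topology revision 21 and the response returns revision 22, still with `isBalanced` set to false.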

       

      And it seems that ONLY one of the 3 nodes (10.0.0.163) is complaining:

      10.0.0.30

      2023-01-05T20:18:01.186+00:00 [INFO] ctl/manager: GetCurrentTopology, haveTopologyRev: 13, changed, rv: &{Rev:[49 52] Nodes:[03544ee7c1e491540ba818097f01fca9 e636f4410dab1791c07ecce7aa6bb5a7 ea6ef01e396c22f9c796bca19d77dc1e] IsBalanced:true Messages:[]}
      2023-01-05T20:18:07.042+00:00 [INFO] ctl/manager: GetCurrentTopology, haveTopologyRev: 14, changed, rv: &{Rev:[49 53] Nodes:[e636f4410dab1791c07ecce7aa6bb5a7 ea6ef01e396c22f9c796bca19d77dc1e 03544ee7c1e491540ba818097f01fca9] IsBalanced:true Messages:[]}
      2023-01-05T20:18:07.056+00:00 [INFO] ctl/manager: GetCurrentTopology, haveTopologyRev: 15, changed, rv: &{Rev:[49 54] Nodes:[03544ee7c1e491540ba818097f01fca9 e636f4410dab1791c07ecce7aa6bb5a7 ea6ef01e396c22f9c796bca19d77dc1e] IsBalanced:true Messages:[]} 

      /mnt/datadisk/index/@fts:
      total 4
      1385168992 drwxrwx--- 4 couchbase couchbase 95 Jan  5 20:23 .
      1383071840 drwxrwx--- 4 couchbase couchbase 33 Jan  5 19:01 ..
      1385168993 -rw------- 1 couchbase couchbase 32 Jan  5 20:11 cbft.uuid
      1388314720 drwx------ 2 couchbase couchbase 73 Jan  5 20:18 planPIndexes
      1545601120 drwx------ 3 couchbase couchbase 86 Jan  5 20:15 ts02_fts_01_5e10ee0c69fc1b62_4c1c5584.pindex
      

       

      10.0.0.54

      2023-01-05T20:18:01.209+00:00 [INFO] ctl/manager: GetCurrentTopology, haveTopologyRev: , changed, rv: &{Rev:[52] Nodes:[03544ee7c1e491540ba818097f01fca9 e636f4410dab1791c07ecce7aa6bb5a7 ea6ef01e396c22f9c796bca19d77dc1e] IsBalanced:true Messages:[]}  

      /mnt/datadisk/index/@fts:
      total 4
      1460666464 drwxrwx--- 4 couchbase couchbase 95 Jan  5 20:23 .
      1383071840 drwxrwx--- 4 couchbase couchbase 33 Jan  5 19:27 ..
      1460666465 -rw------- 1 couchbase couchbase 32 Jan  5 20:18 cbft.uuid
      1461715040 drwx------ 2 couchbase couchbase 73 Jan  5 20:18 planPIndexes
      1654653024 drwx------ 3 couchbase couchbase 86 Jan  5 20:18 ts02_fts_01_5e10ee0c69fc1b62_4c1c5584.pindex 

       

      10.0.0.163

      2023-01-05T20:18:01.181+00:00 [INFO] ctl/manager: GetCurrentTopology, haveTopologyRev: 19, changed, rv: &{Rev:[50 48] Nodes:[03544ee7c1e491540ba818097f01fca9 e636f4410dab1791c07ecce7aa6bb5a7 ea6ef01e396c22f9c796bca19d77dc1e] IsBalanced:false Messages:[warning: resource: "ts02_fts_01" -- could not meet replication constraints]}  

      /mnt/datadisk/index/@fts:
      total 4
      1385168992 drwxrwx--- 4 couchbase couchbase 95 Jan  5 20:23 .
      1383071840 drwxrwx--- 4 couchbase couchbase 33 Jan  5 19:01 ..
      1385168993 -rw------- 1 couchbase couchbase 32 Jan  5 20:10 cbft.uuid
      1483735136 drwx------ 2 couchbase couchbase 73 Jan  5 20:18 planPIndexes
      1557135456 drwx------ 3 couchbase couchbase 86 Jan  5 20:15 ts02_fts_01_5e10ee0c69fc1b62_4c1c5584.pindex
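
To compare the three nodes more easily, the `GetCurrentTopology` log lines can be parsed mechanically. The sketch below is only a helper for reading these logs: the regular expression is written against the line shape shown above, the function names are hypothetical, and the node UUID lists are elided with `...` for brevity. The `Rev` field is printed as a Go byte slice, so `[50 48]` is the ASCII string "20".

```python
import re

# Matches the "rv: &{Rev:[...] Nodes:[...] IsBalanced:... Messages:[...]}" part.
TOPOLOGY_RE = re.compile(
    r"Rev:\[(?P<rev>[\d ]*)\].*?"
    r"IsBalanced:(?P<balanced>true|false) "
    r"Messages:\[(?P<messages>.*)\]\}"
)


def parse_topology(line: str):
    """Extract rev, balance flag and messages from one cbft log line."""
    match = TOPOLOGY_RE.search(line)
    if match is None:
        return None
    # Rev is a list of decimal byte values that spell an ASCII number.
    rev = bytes(int(b) for b in match.group("rev").split()).decode("ascii")
    return {
        "rev": rev,
        "is_balanced": match.group("balanced") == "true",
        "messages": match.group("messages"),
    }


def unbalanced_nodes(lines_by_node: dict) -> list:
    """Return the nodes whose log line reports IsBalanced:false."""
    return [
        node
        for node, line in lines_by_node.items()
        if (parsed := parse_topology(line)) and not parsed["is_balanced"]
    ]


# Last line quoted per node in this report; node UUIDs elided.
LATEST = {
    "10.0.0.30": "rv: &{Rev:[49 54] Nodes:[...] IsBalanced:true Messages:[]}",
    "10.0.0.54": "rv: &{Rev:[52] Nodes:[...] IsBalanced:true Messages:[]}",
    "10.0.0.163": 'rv: &{Rev:[50 48] Nodes:[...] IsBalanced:false Messages:'
    '[warning: resource: "ts02_fts_01" -- could not meet replication constraints]}',
}

for node, line in LATEST.items():
    print(node, parse_topology(line))
print("unbalanced:", unbalanced_nodes(LATEST))
```

Run against the quoted lines, only 10.0.0.163 is reported as unbalanced, which matches the observation that a single node is complaining.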
       

       

       

            People

              sarthak.dua Sarthak Dua
              abhinav Abhi Dangeti