Couchbase Server
MB-60338

Toggling the shard affinity flag and triggering rebalance seems to have created one extra partition


    Description

      The test does the following -

      1. Create a 6-node cluster with 1 KV + 5 GSI/Query nodes.
      2. Create buckets/scopes/collections.
      3. Disable the shard affinity flag (indexer.settings.enable_shard_affinity is set to False; see the REST sketch after this list).
      4. Create indexes on default and non-default collections.
      5. Rebalance out 2 of the GSI/Query nodes.
      6. Enable the shard affinity flag.
      7. Add the 2 removed nodes back and trigger another rebalance.
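
      For reference, a minimal sketch of how this toggle can be issued over REST, assuming the indexer admin settings endpoint on port 9102 (the setting key is taken from the logs below; the helper name and node address are illustrative, not the test framework's own client):

      import requests

      # Hypothetical helper mirroring on_prem_rest_client.set_index_settings; the
      # indexer admin port (9102) and its /settings endpoint are assumptions here.
      def set_shard_affinity(index_node, enabled, auth=("Administrator", "password")):
          resp = requests.post(
              f"http://{index_node}:9102/settings",
              json={"indexer.settings.enable_shard_affinity": enabled},
              auth=auth,
              timeout=30,
          )
          resp.raise_for_status()

      # Step 3: disable the flag before the indexes are created.
      set_shard_affinity("10.113.223.104", False)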

      After the rebalance, the partitioned indexes all appear to have ended up with 9 partitions (they were created with the default 8 partitions).
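
      For context, a partitioned index created with PARTITION BY HASH and no explicit num_partition gets the default of 8 partitions. A minimal sketch of an equivalent definition issued through the query service (keyspace, field names, and node address are illustrative, not the test's exact DDL):

      import requests

      # Illustrative DDL only. With no "num_partition" in a WITH clause, the
      # indexer default of 8 applies, which is what the test expects to still
      # hold after the rebalances.
      statement = (
          "CREATE INDEX partitioned_index "
          "ON `travel`.`inventory`.`hotel`(price, country) "
          "PARTITION BY HASH(META().id)"
      )

      resp = requests.post(
          "http://10.113.223.105:8093/query/service",  # any query node
          data={"statement": statement},
          auth=("Administrator", "password"),
          timeout=120,
      )
      print(resp.json()["status"])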

      From the logs, some important timestamps -

      The shard affinity flag was disabled (set to False) at -

      2024-01-11 14:12:34 | INFO | MainProcess | test_thread | [on_prem_rest_client.set_index_settings] {'indexer.settings.enable_shard_affinity': False} set
      

      Data load happened around this time

      2024-01-11 14:13:39
      

      Indexes were created around -

      2024-01-11 14:13:57

      The first rebalance (rebalancing out 2 nodes) was triggered around this time -

      2024-01-11 14:19:01 | INFO | MainProcess | test_thread | [gsi_file_based_rebalance.rebalance_and_validate] Rebalance task triggered. Wait in loop until the rebalance starts
      2024-01-11 14:19:01 | INFO | MainProcess | Cluster_Thread | [on_prem_rest_client.rebalance] rebalance params : {'knownNodes': 'ns_1@10.113.223.101,ns_1@10.113.223.102,ns_1@10.113.223.103,ns_1@10.113.223.104,ns_1@10.113.223.105,ns_1@10.113.223.106', 'ejectedNodes': 'ns_1@10.113.223.102,ns_1@10.113.223.103', 'user': 'Administrator', 'password': 'password'}
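
      For reference, the same rebalance can be driven directly against ns_server's /controller/rebalance endpoint with the parameters logged above; a minimal sketch (node addresses taken from the log):

      import requests

      # otpNode names taken from the rebalance params logged above.
      known_nodes = ",".join(f"ns_1@10.113.223.{i}" for i in range(101, 107))
      ejected_nodes = "ns_1@10.113.223.102,ns_1@10.113.223.103"

      resp = requests.post(
          "http://10.113.223.101:8091/controller/rebalance",
          data={"knownNodes": known_nodes, "ejectedNodes": ejected_nodes},
          auth=("Administrator", "password"),
          timeout=60,
      )
      resp.raise_for_status()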
      

      This was successfully completed around -

      2024-01-11 14:21:37 | INFO | MainProcess | test_thread | [on_prem_rest_client.rebalance_reached] rebalance reached >100% in 143.30752682685852 seconds 
      

      The shard affinity flag was enabled at this time -

      2024-01-11 14:24:33 | INFO | MainProcess | test_thread | [on_prem_rest_client.set_index_settings] {'indexer.settings.enable_shard_affinity': True} set
      

      The removed nodes were added back and the second rebalance was triggered around this time -

      2024-01-11 14:26:19 | INFO | MainProcess | test_thread | [gsi_file_based_rebalance.rebalance_and_validate] Rebalance task triggered. Wait in loop until the rebalance starts
      2024-01-11 14:26:19 | INFO | MainProcess | Cluster_Thread | [task.add_nodes] adding node 10.113.223.102:8091 to cluster
      2024-01-11 14:26:19 | INFO | MainProcess | Cluster_Thread | [on_prem_rest_client.add_node] adding remote node @10.113.223.102:18091 to this cluster @10.113.223.101:8091
      2024-01-11 14:26:29 | INFO | MainProcess | Cluster_Thread | [on_prem_rest_client.monitorRebalance] rebalance progress took 10.05 seconds 
      2024-01-11 14:26:29 | INFO | MainProcess | Cluster_Thread | [on_prem_rest_client.monitorRebalance] sleep for 10 seconds after rebalance...
      2024-01-11 14:26:49 | INFO | MainProcess | Cluster_Thread | [task.add_nodes] adding node 10.113.223.103:8091 to cluster
      2024-01-11 14:26:49 | INFO | MainProcess | Cluster_Thread | [on_prem_rest_client.add_node] adding remote node @10.113.223.103:18091 to this cluster @10.113.223.101:8091
      2024-01-11 14:26:59 | INFO | MainProcess | Cluster_Thread | [on_prem_rest_client.monitorRebalance] rebalance progress took 10.07 seconds 
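
      A sketch of the equivalent add-node call against the orchestrator's /controller/addNode endpoint; the services value is an assumption based on the removed nodes being GSI/Query nodes:

      import requests

      # Add an ejected node back before the second rebalance.
      resp = requests.post(
          "http://10.113.223.101:8091/controller/addNode",
          data={
              "hostname": "10.113.223.102",
              "user": "Administrator",
              "password": "password",
              "services": "index,n1ql",  # assumed; matches the node's original role
          },
          auth=("Administrator", "password"),
          timeout=60,
      )
      resp.raise_for_status()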
      

      Rebalance was completed at -

      2024-01-11 14:28:12 | INFO | MainProcess | test_thread | [on_prem_rest_client.rebalance_reached] rebalance reached >100% in 45.82607698440552 seconds 
      2024-01-11 14:28:26 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: none, progress: 100.00%
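
      The completion check corresponds to polling the cluster task list; a minimal sketch, assuming the standard /pools/default/tasks endpoint:

      import time

      import requests

      def wait_for_rebalance(orchestrator="10.113.223.101",
                             auth=("Administrator", "password")):
          # Poll the task list until no rebalance task reports status "running".
          while True:
              tasks = requests.get(
                  f"http://{orchestrator}:8091/pools/default/tasks",
                  auth=auth, timeout=30,
              ).json()
              rebalance = [t for t in tasks if t.get("type") == "rebalance"]
              if not any(t.get("status") == "running" for t in rebalance):
                  return
              print("rebalance progress:", rebalance[0].get("progress"))
              time.sleep(10)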
      

      The item count validation failed at this time -

      2024-01-11 14:24:25 | INFO | MainProcess | test_thread | [tuq_helper._find_differences] Diffs {'values_changed': {"root['hotelfb83682d0fcc454b87a964c6a73f845cpartitioned_index']": {'new_value': 2643212, 'old_value': 2640947}, "root['hotelfb83682d0fcc454b87a964c6a73f845cpartitioned_index (replica 1)']": {'new_value': 2637412, 'old_value': 2626385}, "root['hotel937eafe8103f4438a73a1531d2084e06partitioned_index']": {'new_value': 2670010, 'old_value': 2678227}, "root['hotel5239c4ba9eef43f4acd94a30b1889be8partitioned_index']": {'new_value': 2590289, 'old_value': 2583192}, "root['hotel5239c4ba9eef43f4acd94a30b1889be8partitioned_index (replica 2)']": {'new_value': 2607539, 'old_value': 2582524}, "root['hotel012d83107e3a476aa5c43456831dfafdpartitioned_index']": {'new_value': 2625341, 'old_value': 2616588}, "root['hotel937eafe8103f4438a73a1531d2084e06partitioned_index (replica 1)']": {'new_value': 2704019, 'old_value': 2707582}, "root['hotel5239c4ba9eef43f4acd94a30b1889be8partitioned_index (replica 1)']": {'new_value': 2614534, 'old_value': 2582989}, "root['hotel012d83107e3a476aa5c43456831dfafdpartitioned_index (replica 1)']": {'new_value': 2627764, 'old_value': 2619037}}}
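
      The diff format above ('values_changed' with 'new_value'/'old_value') matches what deepdiff's DeepDiff produces when comparing per-index item-count snapshots taken before and after the rebalance; a minimal sketch of that comparison (assuming the counts are collected into plain dicts keyed by index name, as tuq_helper appears to do):

      from deepdiff import DeepDiff

      # Per-index item counts captured before and after the rebalance. One real
      # entry from the failure above is shown; the full dicts cover every index.
      counts_before = {"hotelfb83682d0fcc454b87a964c6a73f845cpartitioned_index": 2640947}
      counts_after = {"hotelfb83682d0fcc454b87a964c6a73f845cpartitioned_index": 2643212}

      diffs = DeepDiff(counts_before, counts_after)
      if diffs:
          # Any 'values_changed' entry means an index's item count drifted across the rebalance.
          print("Diffs", diffs)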
      

      The partitioned indexes all seem to have 9 partitions, which could be the reason for the item count mismatch (a verification sketch follows the list). The partitioned indexes in question -

      hotel937eafe8103f4438a73a1531d2084e06partitioned_index
      hotelfb83682d0fcc454b87a964c6a73f845cpartitioned_index
      hotelfb83682d0fcc454b87a964c6a73f845cpartitioned_index (replica 1)
      hotel937eafe8103f4438a73a1531d2084e06partitioned_index (replica 1)
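
      A sketch of how the partition count of these indexes can be confirmed, assuming the indexer's /getIndexStatus endpoint on port 9102 and a per-index numPartition field (field names may differ between versions):

      import requests

      suspect = {
          "hotel937eafe8103f4438a73a1531d2084e06partitioned_index",
          "hotelfb83682d0fcc454b87a964c6a73f845cpartitioned_index",
      }

      status = requests.get(
          "http://10.113.223.104:9102/getIndexStatus",  # any index node
          auth=("Administrator", "password"),
          timeout=30,
      ).json()

      for idx in status.get("status", []):
          if idx.get("name") in suspect:
              # Expect 8 here; after the second rebalance these report 9.
              print(idx.get("name"), idx.get("numPartition"))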
      

      cbcollect ->

      s3://cb-customers-secure/extrapartition/2024-01-11/archive.zip

      Attachments

        1. Archive.zip (42.21 MB)
        2. ToggleFlag.zip (55.50 MB)
