Details
- Type: Bug
- Resolution: Fixed
- Priority: Critical
- Affects Version: 7.6.0
- Fix Version: 7.6.0-1980
- Triage: Untriaged
- Story Points: 0
- Is this a Regression?: Unknown
Description
The test does the following:
1. Create a 6-node cluster with 1 KV node and 5 GSI/Query nodes.
2. Create buckets/scopes/collections.
3. Disable the shard affinity flag (`indexer.settings.enable_shard_affinity` set to False).
4. Create indexes on default and non-default collections.
5. Trigger a rebalance (remove 2 nodes and rebalance them out).
6. Enable the shard affinity flag.
7. Add back the 2 nodes that were removed and trigger another rebalance.
After the second rebalance, it looks like the partitioned indexes have all ended up with 9 partitions, even though they were created with the default of 8 partitions.
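A quick way to confirm the extra partitions is to total up each index's partition map. This is a hedged sketch: the payload shape (`status` / `name` / `partitionMap` keys) is an assumption modelled loosely on the indexer's index-status output, not a confirmed API contract, and the sample data below is fabricated to mimic the failure.

```python
# Sketch: count partitions per index from an index-status-style payload and
# flag any index whose total differs from the expected partition count.
# Field names ("status", "name", "partitionMap") are assumptions.

def find_partition_mismatches(status_payload, expected=8):
    """Return {index_name: actual_partition_count} for mismatched indexes."""
    mismatches = {}
    for idx in status_payload.get("status", []):
        # partitionMap: {node -> [partition ids hosted on that node]}
        total = sum(len(parts) for parts in idx.get("partitionMap", {}).values())
        if total != expected:
            mismatches[idx["name"]] = total
    return mismatches

# Fabricated payload mimicking the bug: 9 partitions instead of 8.
sample = {
    "status": [
        {"name": "hotel_partitioned_index",
         "partitionMap": {"10.113.223.104:8091": [1, 2, 3, 4, 5],
                          "10.113.223.105:8091": [6, 7, 8, 9]}},
        {"name": "healthy_index",
         "partitionMap": {"10.113.223.104:8091": [1, 2, 3, 4],
                          "10.113.223.105:8091": [5, 6, 7, 8]}},
    ]
}
print(find_partition_mismatches(sample))  # {'hotel_partitioned_index': 9}
```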
From the logs, some important timestamps -
Shard affinity flag was reset at
2024-01-11 14:12:34 | INFO | MainProcess | test_thread | [on_prem_rest_client.set_index_settings] {'indexer.settings.enable_shard_affinity': False} set |
Data load happened around this time
2024-01-11 14:13:39 |
Indexes were created around -
2024-01-11 14:13:57
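For context, partitioned indexes like these are created with N1QL's `PARTITION BY HASH(...)` clause, and when no explicit `num_partition` is given the indexer default of 8 applies. A minimal sketch of the kind of statement the test presumably issues (the keyspace and field names here are hypothetical; the `PARTITION BY HASH` and `num_partition` syntax is standard N1QL):

```python
# Sketch: build a N1QL CREATE INDEX statement for a partitioned index.
# Keyspace/field names are hypothetical examples.

def build_partitioned_index_stmt(name, keyspace, field, num_partition=8):
    return (
        f"CREATE INDEX `{name}` ON {keyspace}({field}) "
        f"PARTITION BY HASH(META().id) "
        f'WITH {{"num_partition": {num_partition}}}'
    )

stmt = build_partitioned_index_stmt(
    "hotel_partitioned_index", "`travel-sample`.inventory.hotel", "price")
print(stmt)
```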
First rebalance (rebalancing out 2 nodes) was triggered around this time -
2024-01-11 14:19:01 | INFO | MainProcess | test_thread | [gsi_file_based_rebalance.rebalance_and_validate] Rebalance task triggered. Wait in loop until the rebalance starts |
2024-01-11 14:19:01 | INFO | MainProcess | Cluster_Thread | [on_prem_rest_client.rebalance] rebalance params : {'knownNodes': 'ns_1@10.113.223.101,ns_1@10.113.223.102,ns_1@10.113.223.103,ns_1@10.113.223.104,ns_1@10.113.223.105,ns_1@10.113.223.106', 'ejectedNodes': 'ns_1@10.113.223.102,ns_1@10.113.223.103', 'user': 'Administrator', 'password': 'password'} |
This was successfully completed around -
2024-01-11 14:21:37 | INFO | MainProcess | test_thread | [on_prem_rest_client.rebalance_reached] rebalance reached >100% in 143.30752682685852 seconds |
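The rebalance call above takes comma-separated otpNode lists, as visible in the logged params. A small helper reconstructing those form fields from plain IPs (mirroring the `ns_1@` prefix seen in the log line; this is an illustration, not the test's actual helper):

```python
def build_rebalance_params(known_ips, ejected_ips, prefix="ns_1@"):
    """Build knownNodes/ejectedNodes form fields for POST /controller/rebalance."""
    return {
        "knownNodes": ",".join(prefix + ip for ip in known_ips),
        "ejectedNodes": ",".join(prefix + ip for ip in ejected_ips),
    }

params = build_rebalance_params(
    ["10.113.223.%d" % n for n in range(101, 107)],
    ["10.113.223.102", "10.113.223.103"])
print(params["ejectedNodes"])  # ns_1@10.113.223.102,ns_1@10.113.223.103
```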
Shard affinity flag was enabled at this time -
2024-01-11 14:24:33 | INFO | MainProcess | test_thread | [on_prem_rest_client.set_index_settings] {'indexer.settings.enable_shard_affinity': True} set |
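The flag flip above is a JSON settings update; a sketch of how the payload the test's `set_index_settings` helper sends might be built (the exact endpoint and transport are assumptions, so only the payload construction is shown here):

```python
import json

def build_shard_affinity_payload(enabled):
    """JSON body toggling the indexer's shard affinity setting."""
    return json.dumps({"indexer.settings.enable_shard_affinity": enabled})

payload = build_shard_affinity_payload(True)
print(payload)
# The test's REST client would POST a body like this to the index
# settings endpoint to apply it cluster-wide.
```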
New nodes were added and rebalance was triggered around this time -
2024-01-11 14:26:19 | INFO | MainProcess | test_thread | [gsi_file_based_rebalance.rebalance_and_validate] Rebalance task triggered. Wait in loop until the rebalance starts |
2024-01-11 14:26:19 | INFO | MainProcess | Cluster_Thread | [task.add_nodes] adding node 10.113.223.102:8091 to cluster |
2024-01-11 14:26:19 | INFO | MainProcess | Cluster_Thread | [on_prem_rest_client.add_node] adding remote node @10.113.223.102:18091 to this cluster @10.113.223.101:8091 |
2024-01-11 14:26:29 | INFO | MainProcess | Cluster_Thread | [on_prem_rest_client.monitorRebalance] rebalance progress took 10.05 seconds |
2024-01-11 14:26:29 | INFO | MainProcess | Cluster_Thread | [on_prem_rest_client.monitorRebalance] sleep for 10 seconds after rebalance... |
2024-01-11 14:26:49 | INFO | MainProcess | Cluster_Thread | [task.add_nodes] adding node 10.113.223.103:8091 to cluster |
2024-01-11 14:26:49 | INFO | MainProcess | Cluster_Thread | [on_prem_rest_client.add_node] adding remote node @10.113.223.103:18091 to this cluster @10.113.223.101:8091 |
2024-01-11 14:26:59 | INFO | MainProcess | Cluster_Thread | [on_prem_rest_client.monitorRebalance] rebalance progress took 10.07 seconds |
Rebalance was completed at -
2024-01-11 14:28:12 | INFO | MainProcess | test_thread | [on_prem_rest_client.rebalance_reached] rebalance reached >100% in 45.82607698440552 seconds |
2024-01-11 14:28:26 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: none, progress: 100.00% |
The item count validation failed at this time -
2024-01-11 14:24:25 | INFO | MainProcess | test_thread | [tuq_helper._find_differences] Diffs {'values_changed': {"root['hotelfb83682d0fcc454b87a964c6a73f845cpartitioned_index']": {'new_value': 2643212, 'old_value': 2640947}, "root['hotelfb83682d0fcc454b87a964c6a73f845cpartitioned_index (replica 1)']": {'new_value': 2637412, 'old_value': 2626385}, "root['hotel937eafe8103f4438a73a1531d2084e06partitioned_index']": {'new_value': 2670010, 'old_value': 2678227}, "root['hotel5239c4ba9eef43f4acd94a30b1889be8partitioned_index']": {'new_value': 2590289, 'old_value': 2583192}, "root['hotel5239c4ba9eef43f4acd94a30b1889be8partitioned_index (replica 2)']": {'new_value': 2607539, 'old_value': 2582524}, "root['hotel012d83107e3a476aa5c43456831dfafdpartitioned_index']": {'new_value': 2625341, 'old_value': 2616588}, "root['hotel937eafe8103f4438a73a1531d2084e06partitioned_index (replica 1)']": {'new_value': 2704019, 'old_value': 2707582}, "root['hotel5239c4ba9eef43f4acd94a30b1889be8partitioned_index (replica 1)']": {'new_value': 2614534, 'old_value': 2582989}, "root['hotel012d83107e3a476aa5c43456831dfafdpartitioned_index (replica 1)']": {'new_value': 2627764, 'old_value': 2619037}}} |
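The diff above is deepdiff-style `values_changed` output. A small helper summarising the per-index item-count drift from such a diff (the `root['<index name>']` key format is assumed from the log excerpt; the sample entry below reuses one index's numbers from it):

```python
import re

def summarize_count_diffs(diff):
    """Map index name -> (new - old) item count delta from a deepdiff-style diff."""
    deltas = {}
    for key, change in diff.get("values_changed", {}).items():
        m = re.match(r"root\['(.+)'\]", key)
        name = m.group(1) if m else key
        deltas[name] = change["new_value"] - change["old_value"]
    return deltas

diff = {"values_changed": {
    "root['hotelfb83682d0fcc454b87a964c6a73f845cpartitioned_index']":
        {"new_value": 2643212, "old_value": 2640947}}}
print(summarize_count_diffs(diff))
# {'hotelfb83682d0fcc454b87a964c6a73f845cpartitioned_index': 2265}
```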
The partitioned indexes all seem to have 9 partitions, and this could be the reason for the item count mismatch. The partitioned indexes in question:
- hotel937eafe8103f4438a73a1531d2084e06partitioned_index
- hotelfb83682d0fcc454b87a964c6a73f845cpartitioned_index
- hotelfb83682d0fcc454b87a964c6a73f845cpartitioned_index (replica 1)
- hotel937eafe8103f4438a73a1531d2084e06partitioned_index (replica 1)
cbcollect ->
s3://cb-customers-secure/extrapartition/2024-01-11/archive.zip
Issue Links
- is duplicated by: MB-60504 [System Test] Rebalance failed while upgrading indexer node (Closed)