Details
-
Bug
-
Resolution: Fixed
-
Critical
-
7.6.0
-
7.6.0-2119
-
Untriaged
-
0
-
Unknown
Description
The test does the following:
Create a 6-node cluster (1 KV + 5 GSI/Query).
Create a bunch of indexes of all types on both default and non-default collections.
Back up the cluster using cbbackupmgr.
Drop all the existing indexes and then rebalance out one node (in this test, 172.23.108.100, at around 2024-02-12 21:44:19,253).
Now create a few indexes and make sure the indexes reside on all the index nodes in the cluster (this is done by setting the 'indexer.planner.honourNodesInDefn' flag to True).
Restore using cbbackupmgr and then build the indexes.
After the restore, for one of the indexes (as seen from the metadata), there's a mismatch between the instance's replicaId and the replica ID embedded in its alternate shard IDs.
replica ID 0 |
{'defnId': 828005147595780687, 'instId': 3833718305992959152, 'name': 'hotel74d0378c601442bf87c9635516952245partitioned_index', 'bucket': 'test_bucket', 'scope': 'test_scope_1', 'collection': 'test_collection_1', 'secExprs': ['`name`'], 'indexType': 'plasma', 'status': 'Ready', 'definition': 'CREATE INDEX `hotel74d0378c601442bf87c9635516952245partitioned_index` ON `test_bucket`.`test_scope_1`.`test_collection_1`(`name`) PARTITION BY hash(`name`) WITH { "defer_build":true, "nodes":[ "172.23.108.102:8091","172.23.108.105:8091","172.23.108.117:8091","172.23.108.31:8091" ], "num_replica":2, "num_partition":8 }', 'hosts': ['172.23.108.117:8091', '172.23.108.31:8091'], 'completion': 100, 'progress': 100, 'scheduled': False, 'partitioned': True, 'numPartition': 8, 'partitionMap': {'172.23.108.117:8091': [6, 1, 7, 5, 3], '172.23.108.31:8091': [4, 2, 8]}, 'numReplica': 2, 'indexName': 'hotel74d0378c601442bf87c9635516952245partitioned_index', 'replicaId': 0, 'stale': False, 'lastScanTime': 'NA', 'alternateShardIds': {'172.23.108.117:8091': {'1': ['6842439059863937162-0-0', '6842439059863937162-0-1'], '3': ['15101210939926496508-0-0', '15101210939926496508-0-1'], '5': ['9639163204147157397-0-0', '9639163204147157397-0-1'], '6': ['15101210939926496508-0-0', '15101210939926496508-0-1'], '7': ['7586457320402690146-0-0', '7586457320402690146-0-1']}, '172.23.108.31:8091': {'2': ['8956814024077876499-0-0', '8956814024077876499-0-1'], '4': ['8956814024077876499-0-0', '8956814024077876499-0-1'], '8': ['17730230858741553132-0-0', '17730230858741553132-0-1']}}}, |
replica ID 1 |
{'defnId': 828005147595780687, 'instId': 15996086527567212228, 'name': 'hotel74d0378c601442bf87c9635516952245partitioned_index (replica 1)', 'bucket': 'test_bucket', 'scope': 'test_scope_1', 'collection': 'test_collection_1', 'secExprs': ['`name`'], 'indexType': 'plasma', 'status': 'Ready', 'definition': 'CREATE INDEX `hotel74d0378c601442bf87c9635516952245partitioned_index` ON `test_bucket`.`test_scope_1`.`test_collection_1`(`name`) PARTITION BY hash(`name`) WITH { "defer_build":true, "nodes":[ "172.23.108.102:8091","172.23.108.105:8091","172.23.108.117:8091","172.23.108.31:8091" ], "num_replica":2, "num_partition":8 }', 'hosts': ['172.23.108.102:8091', '172.23.108.105:8091', '172.23.108.117:8091', '172.23.108.31:8091'], 'completion': 100, 'progress': 100, 'scheduled': False, 'partitioned': True, 'numPartition': 8, 'partitionMap': {'172.23.108.102:8091': [4, 2, 7], '172.23.108.105:8091': [3, 6], '172.23.108.117:8091': [8], '172.23.108.31:8091': [1, 5]}, 'numReplica': 2, 'indexName': 'hotel74d0378c601442bf87c9635516952245partitioned_index', 'replicaId': 1, 'stale': False, 'lastScanTime': 'NA', 'alternateShardIds': {'172.23.108.102:8091': {'2': ['8956814024077876499-1-0', '8956814024077876499-1-1'], '4': ['8956814024077876499-1-0', '8956814024077876499-1-1'], '7': ['7586457320402690146-1-0', '7586457320402690146-1-1']}, '172.23.108.105:8091': {'3': ['15101210939926496508-1-0', '15101210939926496508-1-1'], '6': ['15101210939926496508-1-0', '15101210939926496508-1-1']}, '172.23.108.117:8091': {'8': ['17730230858741553132-1-0', '17730230858741553132-1-1']}, '172.23.108.31:8091': {'1': ['6842439059863937162-0-0', '6842439059863937162-0-1'], '5': ['9639163204147157397-0-0', '9639163204147157397-0-1']}}}, |
replica ID 2 |
{'defnId': 828005147595780687, 'instId': 6777168063398122838, 'name': 'hotel74d0378c601442bf87c9635516952245partitioned_index (replica 2)', 'bucket': 'test_bucket', 'scope': 'test_scope_1', 'collection': 'test_collection_1', 'secExprs': ['`name`'], 'indexType': 'plasma', 'status': 'Ready', 'definition': 'CREATE INDEX `hotel74d0378c601442bf87c9635516952245partitioned_index` ON `test_bucket`.`test_scope_1`.`test_collection_1`(`name`) PARTITION BY hash(`name`) WITH { "defer_build":true, "nodes":[ "172.23.108.102:8091","172.23.108.105:8091","172.23.108.117:8091","172.23.108.31:8091" ], "num_replica":2, "num_partition":8 }', 'hosts': ['172.23.108.102:8091', '172.23.108.105:8091', '172.23.108.31:8091'], 'completion': 100, 'progress': 100, 'scheduled': False, 'partitioned': True, 'numPartition': 8, 'partitionMap': {'172.23.108.102:8091': [6, 1, 3, 8], '172.23.108.105:8091': [2, 5, 4], '172.23.108.31:8091': [7]}, 'numReplica': 2, 'indexName': 'hotel74d0378c601442bf87c9635516952245partitioned_index', 'replicaId': 2, 'stale': False, 'lastScanTime': 'NA', 'alternateShardIds': {'172.23.108.102:8091': {'1': ['6842439059863937162-2-0', '6842439059863937162-2-1'], '3': ['15101210939926496508-2-0', '15101210939926496508-2-1'], '6': ['15101210939926496508-2-0', '15101210939926496508-2-1'], '8': ['17730230858741553132-2-0', '17730230858741553132-2-1']}, '172.23.108.105:8091': {'2': ['8956814024077876499-2-0', '8956814024077876499-2-1'], '4': ['8956814024077876499-2-0', '8956814024077876499-2-1'], '5': ['9639163204147157397-2-0', '9639163204147157397-2-1']}, '172.23.108.31:8091': {'7': ['7586457320402690146-2-0', '7586457320402690146-2-1']}}}, |
For replica 1, the alternate shard IDs for slots 6842439059863937162 and 9639163204147157397 (partitions 1 and 5 on 172.23.108.31:8091) are as follows:
{'1': ['6842439059863937162-0-0', '6842439059863937162-0-1'], '5': ['9639163204147157397-0-0', '9639163204147157397-0-1']}
This is not correct: these alternate shard IDs carry replica ID 0, whereas the instance's replicaId is 1. The validation failed around [2024-02-12 21:52:16,924] (this is a Jenkins timestamp, so the nodes themselves might be in a different timezone).
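The check that failed can be sketched roughly as follows. This is a minimal illustration, not the test framework's actual validation code: it assumes, based only on the metadata shown above, that an alternate shard ID has the form `<slot>-<replica>-<group>`, and the helper names `parse_alternate_shard_id` and `find_replica_mismatches` are hypothetical. The sample input is a trimmed copy of the replica 1 metadata.

```python
from typing import List, Tuple


def parse_alternate_shard_id(shard_id: str) -> Tuple[int, int, int]:
    """Split an ID such as '6842439059863937162-0-1' into its three parts.

    Assumption: the layout is <slot>-<replica>-<group>, inferred from the
    metadata in this ticket rather than from indexer documentation.
    """
    slot, replica, group = shard_id.split("-")
    return int(slot), int(replica), int(group)


def find_replica_mismatches(index_meta: dict) -> List[Tuple[str, str, str]]:
    """Return (host, partition, shard_id) for every alternate shard ID whose
    replica field differs from the instance's 'replicaId'."""
    expected = index_meta["replicaId"]
    mismatches = []
    for host, partitions in index_meta.get("alternateShardIds", {}).items():
        for partition, shard_ids in partitions.items():
            for shard_id in shard_ids:
                _, replica, _ = parse_alternate_shard_id(shard_id)
                if replica != expected:
                    mismatches.append((host, partition, shard_id))
    return mismatches


# Trimmed copy of the replica 1 metadata from this ticket (two hosts only).
replica_1 = {
    "replicaId": 1,
    "alternateShardIds": {
        "172.23.108.117:8091": {
            "8": ["17730230858741553132-1-0", "17730230858741553132-1-1"],
        },
        "172.23.108.31:8091": {
            "1": ["6842439059863937162-0-0", "6842439059863937162-0-1"],
            "5": ["9639163204147157397-0-0", "9639163204147157397-0-1"],
        },
    },
}

for host, partition, shard_id in find_replica_mismatches(replica_1):
    print(f"{host} partition {partition}: {shard_id} does not match replicaId 1")
```

Run against the trimmed sample, only the entries for partitions 1 and 5 on 172.23.108.31:8091 are reported; the partition 8 entries on 172.23.108.117:8091 pass.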
cbcollect ->
s3://cb-customers-secure/mismatchreplicaid/2024-02-13/test3-adcc8f8d1057a612.zip
Test log attached.
Attachments
For Gerrit Dashboard: MB-60771
# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
205709,1 | MB-60771 Reset alternateShardIds in instance defn during restore | unstable | indexing | MERGED | +2 | +1 |
205783,1 | Merging fixes for MB-60771 | master | indexing | MERGED | +2 | +1 |