Couchbase Server / MB-60771

Shard affinity validation fail - replica ID mismatch with that of alternate ShardID


Details

    • Untriaged
    • 0
    • Unknown

    Description

      The test does the following:

      1. Create a 6-node cluster (1 KV + 5 GSI/Query).
      2. Create a number of indexes of all types on both default and non-default collections.
      3. Back up the cluster using cbbackupmgr.
      4. Drop all the existing indexes, then rebalance out one node (in this test, 172.23.108.100, at around 2024-02-12 21:44:19,253).
      5. Create a few indexes and make sure they reside on all the index nodes in the cluster (done by setting the 'indexer.planner.honourNodesInDefn' flag to True).
      6. Restore using cbbackupmgr, then build the indexes.

      For one of the indexes (as seen from the metadata), the replicaId of one index instance does not match the replica ID encoded in its alternate shard IDs.

      replica ID 0
      {'defnId': 828005147595780687, 'instId': 3833718305992959152, 'name': 'hotel74d0378c601442bf87c9635516952245partitioned_index', 'bucket': 'test_bucket', 'scope': 'test_scope_1', 'collection': 'test_collection_1', 'secExprs': ['`name`'], 'indexType': 'plasma', 'status': 'Ready', 'definition': 'CREATE INDEX `hotel74d0378c601442bf87c9635516952245partitioned_index` ON `test_bucket`.`test_scope_1`.`test_collection_1`(`name`) PARTITION BY hash(`name`) WITH {  "defer_build":true, "nodes":[ "172.23.108.102:8091","172.23.108.105:8091","172.23.108.117:8091","172.23.108.31:8091" ], "num_replica":2, "num_partition":8 }', 'hosts': ['172.23.108.117:8091', '172.23.108.31:8091'], 'completion': 100, 'progress': 100, 'scheduled': False, 'partitioned': True, 'numPartition': 8, 'partitionMap': {'172.23.108.117:8091': [6, 1, 7, 5, 3], '172.23.108.31:8091': [4, 2, 8]}, 'numReplica': 2, 'indexName': 'hotel74d0378c601442bf87c9635516952245partitioned_index', 'replicaId': 0, 'stale': False, 'lastScanTime': 'NA', 'alternateShardIds': {'172.23.108.117:8091': {'1': ['6842439059863937162-0-0', '6842439059863937162-0-1'], '3': ['15101210939926496508-0-0', '15101210939926496508-0-1'], '5': ['9639163204147157397-0-0', '9639163204147157397-0-1'], '6': ['15101210939926496508-0-0', '15101210939926496508-0-1'], '7': ['7586457320402690146-0-0', '7586457320402690146-0-1']}, '172.23.108.31:8091': {'2': ['8956814024077876499-0-0', '8956814024077876499-0-1'], '4': ['8956814024077876499-0-0', '8956814024077876499-0-1'], '8': ['17730230858741553132-0-0', '17730230858741553132-0-1']}}}, 
       
      replica ID 1
      {'defnId': 828005147595780687, 'instId': 15996086527567212228, 'name': 'hotel74d0378c601442bf87c9635516952245partitioned_index (replica 1)', 'bucket': 'test_bucket', 'scope': 'test_scope_1', 'collection': 'test_collection_1', 'secExprs': ['`name`'], 'indexType': 'plasma', 'status': 'Ready', 'definition': 'CREATE INDEX `hotel74d0378c601442bf87c9635516952245partitioned_index` ON `test_bucket`.`test_scope_1`.`test_collection_1`(`name`) PARTITION BY hash(`name`) WITH {  "defer_build":true, "nodes":[ "172.23.108.102:8091","172.23.108.105:8091","172.23.108.117:8091","172.23.108.31:8091" ], "num_replica":2, "num_partition":8 }', 'hosts': ['172.23.108.102:8091', '172.23.108.105:8091', '172.23.108.117:8091', '172.23.108.31:8091'], 'completion': 100, 'progress': 100, 'scheduled': False, 'partitioned': True, 'numPartition': 8, 'partitionMap': {'172.23.108.102:8091': [4, 2, 7], '172.23.108.105:8091': [3, 6], '172.23.108.117:8091': [8], '172.23.108.31:8091': [1, 5]}, 'numReplica': 2, 'indexName': 'hotel74d0378c601442bf87c9635516952245partitioned_index', 'replicaId': 1, 'stale': False, 'lastScanTime': 'NA', 'alternateShardIds': {'172.23.108.102:8091': {'2': ['8956814024077876499-1-0', '8956814024077876499-1-1'], '4': ['8956814024077876499-1-0', '8956814024077876499-1-1'], '7': ['7586457320402690146-1-0', '7586457320402690146-1-1']}, '172.23.108.105:8091': {'3': ['15101210939926496508-1-0', '15101210939926496508-1-1'], '6': ['15101210939926496508-1-0', '15101210939926496508-1-1']}, '172.23.108.117:8091': {'8': ['17730230858741553132-1-0', '17730230858741553132-1-1']}, '172.23.108.31:8091': {'1': ['6842439059863937162-0-0', '6842439059863937162-0-1'], '5': ['9639163204147157397-0-0', '9639163204147157397-0-1']}}}, 
       
       
      replica ID 2 
      {'defnId': 828005147595780687, 'instId': 6777168063398122838, 'name': 'hotel74d0378c601442bf87c9635516952245partitioned_index (replica 2)', 'bucket': 'test_bucket', 'scope': 'test_scope_1', 'collection': 'test_collection_1', 'secExprs': ['`name`'], 'indexType': 'plasma', 'status': 'Ready', 'definition': 'CREATE INDEX `hotel74d0378c601442bf87c9635516952245partitioned_index` ON `test_bucket`.`test_scope_1`.`test_collection_1`(`name`) PARTITION BY hash(`name`) WITH {  "defer_build":true, "nodes":[ "172.23.108.102:8091","172.23.108.105:8091","172.23.108.117:8091","172.23.108.31:8091" ], "num_replica":2, "num_partition":8 }', 'hosts': ['172.23.108.102:8091', '172.23.108.105:8091', '172.23.108.31:8091'], 'completion': 100, 'progress': 100, 'scheduled': False, 'partitioned': True, 'numPartition': 8, 'partitionMap': {'172.23.108.102:8091': [6, 1, 3, 8], '172.23.108.105:8091': [2, 5, 4], '172.23.108.31:8091': [7]}, 'numReplica': 2, 'indexName': 'hotel74d0378c601442bf87c9635516952245partitioned_index', 'replicaId': 2, 'stale': False, 'lastScanTime': 'NA', 'alternateShardIds': {'172.23.108.102:8091': {'1': ['6842439059863937162-2-0', '6842439059863937162-2-1'], '3': ['15101210939926496508-2-0', '15101210939926496508-2-1'], '6': ['15101210939926496508-2-0', '15101210939926496508-2-1'], '8': ['17730230858741553132-2-0', '17730230858741553132-2-1']}, '172.23.108.105:8091': {'2': ['8956814024077876499-2-0', '8956814024077876499-2-1'], '4': ['8956814024077876499-2-0', '8956814024077876499-2-1'], '5': ['9639163204147157397-2-0', '9639163204147157397-2-1']}, '172.23.108.31:8091': {'7': ['7586457320402690146-2-0', '7586457320402690146-2-1']}}}, 
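The three instances above can also be cross-checked against each other. In these dumps every alternate shard ID appears to have the layout '<slotId>-<replicaId>-<n>'. This is inferred only from the metadata in this issue, not from Couchbase documentation. The sketch below additionally assumes that each (slotId, replicaId) pair should be hosted on exactly one node. Under that assumption it reports any pair that appears on more than one node. The function name find_shard_node_conflicts is hypothetical and is not part of the test framework or of Couchbase.

```python
# Hypothetical cross-instance check. Assumptions (not confirmed by Couchbase docs):
#   (a) alternate shard IDs look like "<slotId>-<replicaId>-<n>", as in this issue's dumps;
#   (b) each (slotId, replicaId) pair should be hosted on exactly one node.
from collections import defaultdict
from typing import Any


def find_shard_node_conflicts(
    instances: list[dict[str, Any]],
) -> dict[tuple[str, str], set[str]]:
    """Return each (slotId, replicaId) pair seen on more than one node, with those nodes."""
    nodes_by_shard: dict[tuple[str, str], set[str]] = defaultdict(set)
    for instance in instances:
        for node, partitions in instance["alternateShardIds"].items():
            for shard_ids in partitions.values():
                for shard_id in shard_ids:
                    # rsplit from the right so only the last two '-' separate fields
                    slot, replica, _n = shard_id.rsplit("-", 2)
                    nodes_by_shard[(slot, replica)].add(node)
    return {key: nodes for key, nodes in nodes_by_shard.items() if len(nodes) > 1}


# Trimmed excerpts of the replica 0 and replica 1 metadata above.
replica0 = {"alternateShardIds": {
    "172.23.108.117:8091": {"1": ["6842439059863937162-0-0", "6842439059863937162-0-1"]},
    "172.23.108.31:8091": {"2": ["8956814024077876499-0-0", "8956814024077876499-0-1"]},
}}
replica1 = {"alternateShardIds": {
    "172.23.108.102:8091": {"2": ["8956814024077876499-1-0", "8956814024077876499-1-1"]},
    "172.23.108.31:8091": {"1": ["6842439059863937162-0-0", "6842439059863937162-0-1"]},
}}

print(find_shard_node_conflicts([replica0, replica1]))
```

On these excerpts, the only pair reported is slot 6842439059863937162 with replica component 0. Replica 0 places it on 172.23.108.117:8091, and replica 1 claims it on 172.23.108.31:8091.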
      

      For replica 1, the alternate shard IDs for slots 6842439059863937162 and 9639163204147157397 (partitions 1 and 5 on 172.23.108.31:8091) are:

      {'1': ['6842439059863937162-0-0', '6842439059863937162-0-1'], '5': ['9639163204147157397-0-0', '9639163204147157397-0-1']}
      

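      This is not correct. Both entries carry replica ID 0 (the '-0-' component), which is the value replica 0 uses for the same slots on 172.23.108.117:8091. Every other partition of replica 1 uses '-1-'. The validation failed at around [2024-02-12 21:52:16,924]. This is a Jenkins timestamp, so the nodes themselves may be in a different timezone.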
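The failing check can be approximated as follows. This is a minimal sketch, assuming the '<slotId>-<replicaId>-<n>' layout inferred from the metadata dumps in this issue. The function name find_replica_mismatches is hypothetical; this is not the test framework's actual validator. For a single index instance, the sketch flags every alternate shard ID whose replica component differs from the instance's replicaId.

```python
# Hypothetical validator. Assumes alternate shard IDs look like
# "<slotId>-<replicaId>-<n>", as inferred from the metadata dumps in this issue.
from typing import Any


def find_replica_mismatches(instance: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Return (node, partition, shard_id) for every alternate shard ID whose
    replica component differs from the instance's replicaId."""
    expected = instance["replicaId"]
    mismatches = []
    for node, partitions in instance["alternateShardIds"].items():
        for partition, shard_ids in partitions.items():
            for shard_id in shard_ids:
                # rsplit from the right so only the last two '-' separate fields
                _slot, replica, _n = shard_id.rsplit("-", 2)
                if int(replica) != expected:
                    mismatches.append((node, partition, shard_id))
    return mismatches


# Trimmed excerpt of the replica 1 metadata above.
replica1 = {
    "replicaId": 1,
    "alternateShardIds": {
        "172.23.108.102:8091": {"2": ["8956814024077876499-1-0", "8956814024077876499-1-1"]},
        "172.23.108.31:8091": {
            "1": ["6842439059863937162-0-0", "6842439059863937162-0-1"],
            "5": ["9639163204147157397-0-0", "9639163204147157397-0-1"],
        },
    },
}

for node, partition, shard_id in find_replica_mismatches(replica1):
    print(node, partition, shard_id)
```

On this excerpt the sketch reports exactly the four shard IDs for partitions 1 and 5 on 172.23.108.31:8091. The entries on 172.23.108.102:8091 pass.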

      cbcollect:

      s3://cb-customers-secure/mismatchreplicaid/2024-02-13/test3-adcc8f8d1057a612.zip

      Test log attached.


    People: Pavan PB (pavan.pb)