Couchbase Server / MB-46601

[System Test] Storage corruption detected during recovery


Details

    Description

      Build : 7.0.0-5229
      Test : -test tests/2i/cheshirecat/test_idx_clusterops_cheshire_cat_recovery.yml -scope tests/2i/cheshirecat/scope_idx_cheshire_cat_dgm.yml (GSI test with recovery)
      Scale : 2
      Iteration : 2nd

      On 172.23.107.3, just after the indexer process restarted at 2021-05-27T05:56:09, the following appears in the indexer logs:

      2021-05-27T05:57:40.950-07:00 [Info] StorageMgr::openSnapshot IndexInst:4013790605517764671 Partition:1 Attempting to open snapshot (SnapshotInfo: count:175507 committed:false)
      2021-05-27T05:57:40.952-07:00 [Info] Indexer::initPartnInstance Initialized Partition:
               Index: 1064485940493745317 Partition: PartitionId: 2 Endpoints: [:9105]
      2021-05-27T05:57:40.952-07:00 [Error] plasmaSlice:NewplasmaSlice Id 0xfb35c0 IndexInstId 1064485940493745317 fatal error occured: Unable to initialize /data/@2i/bucket4_idx3_JWZCYGVFKJ_idxprefix_1064485940493745317_2.index/mainIndex, err = fatal: Fail to find shard for instance /data/@2i/bucket4_idx3_JWZCYGVFKJ_idxprefix_1064485940493745317_2.index/mainIndex due to corrupted or missing shards
      2021-05-27T05:57:40.952-07:00 [Error] plasmaSlice:NewplasmaSlice Id 0 IndexInstId 1064485940493745317 PartitionId 2 fatal error occured: Storage corrupted and unrecoverable
      2021-05-27T05:57:40.952-07:00 [Error] Indexer:: initPartnInstance storage corruption for indexInst
              InstId: 1064485940493745317
              Defn: DefnId: 7795214278730003651 Name: idx3_JWZCYGVFKJ_idxprefix Using: plasma Bucket: bucket4 Scope/Id: _default/0 Collection/Id: _default/0 IsPrimary: false NumReplica: 3 InstVersion: 0
                      SecExprs: <ud>([`free_breakfast` `free_parking` `country` `city`])</ud>
                      Desc: [false false false false]
                      PartitionScheme: KEY
                      HashScheme: CRC32 PartitionKeys: [(meta().`id`)] WhereExpr: <ud>()</ud> RetainDeletedXATTR: false
              State: INDEX_STATE_CREATED
              RState: RebalActive
              Stream: NIL_STREAM
              Version: 0
              ReplicaId: 1
              PartitionContainer: &{map[2:{2 0 [:9105]}] 1024 2 512 KEY 0} partnDefn {2 0 [:9105]}
      2021-05-27T05:57:40.956-07:00 [Info] Indexer::initFromPersistedState Starting cleanup for PartitionId: 2 Endpoints: [:9105]
      2021-05-27T05:57:40.956-07:00 [Info] Indexer::forceCleanupIndexPartition 1064485940493745317 2 mark metadata as deleted
      2021-05-27T05:57:40.956-07:00 [Info] ClustMgr:handleCleanupPartition
              Message: MsgCleanupPartition
              Type: 67
              Index defn Id: 7795214278730003651
              Index inst Id: 1064485940493745317
              Index partition Id: 2
              Index replica Id: 1
              Update Status Only: true
      2021-05-27T05:57:40.965-07:00 [Info] LifecycleMgr.DeleteOrPruneIndexInstance() : index defnId 7795214278730003651 instance id 1064485940493745317 real instance id 0 partitions [2]
      2021-05-27T05:57:40.965-07:00 [Info] LifecycleMgr.PruneIndexPartition() : index defnId 7795214278730003651 instance 1064485940493745317 partitions [2]
      2021-05-27T05:57:40.965-07:00 [Info] LifecycleMgr.DeleteIndexInstance() : index defnId 7795214278730003651 instance id 1064485940493745317
      2021-05-27T05:57:40.965-07:00 [Info] LifecycleMgr.DeleteIndexInstance() : there is only a single instance.  Delete index 7795214278730003651
      2021-05-27T05:57:40.968-07:00 [Info] lifecycleMgr.dispatchRequest: op OPCODE_CLEANUP_PARTITION elapsed 2.919252ms len(expediates) 4 len(incomings) 5 len(outgoings) 0 error <nil>
      2021-05-27T05:57:40.968-07:00 [Info] Indexer::forceCleanupIndexPartition Cleaning up data files for 1064485940493745317 2
      2021-05-27T05:57:40.968-07:00 [Info] Indexer::backupCorruptIndexDataFiles 1064485940493745317 2 take backup of corrupt data files
      2021-05-27T05:57:40.968-07:00 [Info] Indexer::forceCleanupIndexPartition 1064485940493745317 2 Cleanup partition in-memory data structure
      2021-05-27T05:57:40.968-07:00 [Info] Indexer::forceCleanupIndexPartition 1064485940493745317 2 actually delete metadata
      

      The same storage corruption also occurs on 172.23.107.4:

      2021-05-27T05:15:36.006-07:00 [Info] StorageMgr::openSnapshot IndexInst:6181630441445818924 Partition:2 Attempting to open snapshot (SnapshotInfo: count:47291 committed:false)
      2021-05-27T05:15:36.009-07:00 [Info] Indexer::initPartnInstance Initialized Partition:
               Index: 9758733882117289405 Partition: PartitionId: 0 Endpoints: [:9105]
      2021-05-27T05:15:36.009-07:00 [Error] plasmaSlice:NewplasmaSlice Id 0xfb35c0 IndexInstId 9758733882117289405 fatal error occured: Unable to initialize /data/@2i/bucket4_idx2_S6UPNDSX5E_idxprefix_9758733882117289405_0.index/mainIndex, err = fatal: Fail to find shard for instance /data/@2i/bucket4_idx2_S6UPNDSX5E_idxprefix_9758733882117289405_0.index/mainIndex due to corrupted or missing shards
      2021-05-27T05:15:36.009-07:00 [Error] plasmaSlice:NewplasmaSlice Id 0 IndexInstId 9758733882117289405 PartitionId 0 fatal error occured: Storage corrupted and unrecoverable
      2021-05-27T05:15:36.009-07:00 [Error] Indexer:: initPartnInstance storage corruption for indexInst
              InstId: 9758733882117289405
              Defn: DefnId: 17244479063719464749 Name: idx2_S6UPNDSX5E_idxprefix Using: plasma Bucket: bucket4 Scope/Id: _default/0 Collection/Id: lTGWOUKN/196 IsPrimary: false NumReplica: 1 InstVersion: 0
                      SecExprs: <ud>([`free_breakfast` `type` `free_parking` array_count(`public_likes`) `price` `country`])</ud>
                      Desc: [false false false false false false]
                      PartitionScheme: SINGLE
                      HashScheme: CRC32 PartitionKeys: [] WhereExpr: <ud>()</ud> RetainDeletedXATTR: false
              State: INDEX_STATE_CREATED
              RState: RebalActive
              Stream: NIL_STREAM
              Version: 0
              ReplicaId: 0
              PartitionContainer: &{map[0:{0 0 [:9105]}] 1024 1 1024 SINGLE 0} partnDefn {0 0 [:9105]}
      2021-05-27T05:15:36.013-07:00 [Info] Indexer::initFromPersistedState Starting cleanup for PartitionId: 0 Endpoints: [:9105]
      2021-05-27T05:15:36.013-07:00 [Info] Indexer::forceCleanupIndexPartition 9758733882117289405 0 mark metadata as deleted
      2021-05-27T05:15:36.013-07:00 [Info] ClustMgr:handleCleanupPartition
              Message: MsgCleanupPartition
              Type: 67
              Index defn Id: 17244479063719464749
              Index inst Id: 9758733882117289405
              Index partition Id: 0
              Index replica Id: 0
              Update Status Only: true
      2021-05-27T05:15:36.013-07:00 [Info] LifecycleMgr.DeleteOrPruneIndexInstance() : index defnId 17244479063719464749 instance id 9758733882117289405 real instance id 0 partitions [0]
      2021-05-27T05:15:36.013-07:00 [Info] LifecycleMgr.PruneIndexPartition() : index defnId 17244479063719464749 instance 9758733882117289405 partitions [0]
      


            People

              Sanjit Chauhan (Inactive)
              Mihir Kamdar (Inactive)