Couchbase Server / MB-51025

[System Test] Seeing "Storage corrupted and unrecoverable" error for one index partition


Details

    Description

      Build : 7.1.0-2322
      Test : -test tests/integration/neo/test_neo_magma_milestone4.yml -scope tests/integration/neo/scope_neo_magma.yml
      Scale : 3
      Iteration : 1

      Seeing the following errors in the indexer.log of 172.23.105.39:

      2022-02-16T19:15:01.847-08:00 [Error] plasmaSlice:NewplasmaSlice Id 0x10dd480 IndexInstId 3608192636986289659 fatal error occured: Unable to initialize /data/couchbase/@2i/bucket7_idx8_bSKtY_3608192636986289659_1.index/mainIndex, err = fatal: Fail to find shard for shared instance /data/couchbase/@2i/bucket7_idx8_bSKtY_3608192636986289659_1.index/mainIndex due to instance not present on disk
      2022-02-16T19:15:01.847-08:00 [Error] plasmaSlice:NewplasmaSlice Id 0 IndexInstId 3608192636986289659 PartitionId 1 fatal error occured: Storage corrupted and unrecoverable
      2022-02-16T19:15:01.851-08:00 [Error] Indexer:: initPartnInstance storage corruption for indexInst
              InstId: 8497178209169062924
              Defn: DefnId: 2074639484566203915 Name: idx8_bSKtY Using: plasma Bucket: bucket7 Scope/Id: scope_2/a Collection/Id: coll_4/16 IsPrimary: false NumReplica: 2 InstVersion: 0
                      SecExprs: <ud>([(distinct (array flatten_keys((`r`.`author`), ((`r`.`ratings`).`Cleanliness`)) for `r` in `reviews` when (((`r`.`ratings`).`Cleanliness`) < 4) end)) `country` `email` `free_parking`])</ud>
                      Desc: [false false false false false]
                      PartitionScheme: KEY
                      HashScheme: CRC32 PartitionKeys: [(meta().`id`)] WhereExpr: <ud>()</ud> RetainDeletedXATTR: false
              State: INDEX_STATE_ACTIVE
              RState: RebalPending
              Stream: MAINT_STREAM
              Version: 2
              ReplicaId: 1
              RealInstId: 3608192636986289659
              PartitionContainer: &{map[1:{1 2 [:9105]}] 1024 5 204 KEY 0} partnDefn {1 2 [:9105]}
      2022-02-16T19:15:01.853-08:00 [Info] Indexer::initFromPersistedState Starting cleanup for PartitionId: 1 Endpoints: [:9105]
      2022-02-16T19:15:01.854-08:00 [Info] Indexer::forceCleanupIndexPartition 8497178209169062924 1 mark metadata as deleted
      2022-02-16T19:15:01.854-08:00 [Info] ClustMgr:handleCleanupPartition
              Message: MsgCleanupPartition
              Type: 67
              Index defn Id: 2074639484566203915
              Index inst Id: 8497178209169062924
              Index partition Id: 1
              Index replica Id: 1
              Update Status Only: true
      2022-02-16T19:15:01.856-08:00 [Info] LifecycleMgr.DeleteOrPruneIndexInstance() : index defnId 2074639484566203915 instance id 8497178209169062924 real instance id 0 partitions [1]
      2022-02-16T19:15:01.856-08:00 [Info] LifecycleMgr.PruneIndexPartition() : index defnId 2074639484566203915 instance 8497178209169062924 partitions [1]
      2022-02-16T19:15:01.856-08:00 [Info] LifecycleMgr.DeleteIndexInstance() : index defnId 2074639484566203915 instance id 8497178209169062924
      2022-02-16T19:15:01.866-08:00 [Info] lifecycleMgr.dispatchRequest: op OPCODE_CLEANUP_PARTITION elapsed 10.003335ms len(expediates) 0 len(incomings) 0 len(outgoings) 0 len(parallels) 0 error <nil>
      2022-02-16T19:15:01.866-08:00 [Info] Indexer::forceCleanupIndexPartition Cleaning up data files for 8497178209169062924 1
      Destory instances matching prefix /data/couchbase/@2i/bucket7_idx8_bSKtY_3608192636986289659_1.index in /data/couchbase/@2i ...
      No instances to destroy matching prefix /data/couchbase/@2i/bucket7_idx8_bSKtY_3608192636986289659_1.index
      2022-02-16T19:15:01.866-08:00 [Info] Indexer::forceCleanupIndexPartition 8497178209169062924 1 Cleanup partition in-memory data structure
      2022-02-16T19:15:01.867-08:00 [Info] Indexer::forceCleanupIndexPartition 8497178209169062924 1 actually delete metadata
      2022-02-16T19:15:01.867-08:00 [Info] ClustMgr:handleCleanupPartition
              Message: MsgCleanupPartition
              Type: 67
              Index defn Id: 2074639484566203915
              Index inst Id: 8497178209169062924
              Index partition Id: 1
              Index replica Id: 1
              Update Status Only: false
      2022-02-16T19:15:01.867-08:00 [Info] LifecycleMgr.DeleteOrPruneIndexInstance() : index defnId 2074639484566203915 instance id 8497178209169062924 real instance id 0 partitions [1]
      2022-02-16T19:15:01.867-08:00 [Info] LifecycleMgr.PruneIndexPartition() : index defnId 2074639484566203915 instance 8497178209169062924 partitions [1]
      2022-02-16T19:15:01.867-08:00 [Info] LifecycleMgr.DeleteIndexInstance() : index defnId 2074639484566203915 instance id 8497178209169062924
      2022-02-16T19:15:01.872-08:00 [Info] lifecycleMgr.dispatchRequest: op OPCODE_CLEANUP_PARTITION elapsed 5.794933ms len(expediates) 0 len(incomings) 0 len(outgoings) 0 len(parallels) 0 error <nil>
      2022-02-16T19:15:01.873-08:00 [Info] Indexer::initFromPersistedState Done cleanup for PartitionId: 1 Endpoints: [:9105]
      2022-02-16T19:15:01.877-08:00 [Info] Indexer::initFromPersistedState Skipping index instance 8497178209169062924

      This does not result in a crash. This is the Magma longevity test. Note that we are also seeing multiple occurrences of MB-50006 on this cluster.
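
      For reference, below is a minimal standalone Go sketch of the recovery sequence the log above records: when a partition's slice directory cannot be opened at startup, the partition is force-cleaned (metadata marked as deleted, data files matching the slice prefix destroyed, in-memory state dropped, metadata deleted) and the instance is skipped instead of crashing the indexer. The names used here (openSlice, forceCleanupPartition, errShardMissing) are hypothetical placeholders, not the indexer's actual functions; only the paths and IDs are taken from the log.

      package main

      import (
          "errors"
          "fmt"
          "os"
          "path/filepath"
      )

      // errShardMissing stands in for the "Fail to find shard for shared instance ...
      // due to instance not present on disk" condition from the log.
      var errShardMissing = errors.New("fail to find shard for shared instance: instance not present on disk")

      // openSlice is a placeholder for the slice open path: it fails when the
      // expected slice directory is absent on disk.
      func openSlice(slicePath string) error {
          if _, err := os.Stat(slicePath); os.IsNotExist(err) {
              return errShardMissing
          }
          return nil
      }

      // forceCleanupPartition mirrors the steps logged by forceCleanupIndexPartition:
      // mark metadata as deleted, destroy data files matching the slice prefix,
      // drop in-memory structures, then actually delete the metadata.
      func forceCleanupPartition(storageDir, slicePrefix string, instID, partnID uint64) error {
          fmt.Printf("%d %d mark metadata as deleted\n", instID, partnID)

          fmt.Printf("Destroy instances matching prefix %s in %s ...\n", slicePrefix, storageDir)
          matches, err := filepath.Glob(slicePrefix + "*")
          if err != nil {
              return err
          }
          if len(matches) == 0 {
              fmt.Printf("No instances to destroy matching prefix %s\n", slicePrefix)
          }
          for _, m := range matches {
              if err := os.RemoveAll(m); err != nil {
                  return err
              }
          }

          fmt.Printf("%d %d cleanup partition in-memory data structure\n", instID, partnID)
          fmt.Printf("%d %d actually delete metadata\n", instID, partnID)
          return nil
      }

      func main() {
          // Paths and IDs taken from the log; on a machine without this directory the
          // open fails and the cleanup path runs, as seen in the report.
          storageDir := "/data/couchbase/@2i"
          slicePrefix := filepath.Join(storageDir, "bucket7_idx8_bSKtY_3608192636986289659_1.index")

          if err := openSlice(filepath.Join(slicePrefix, "mainIndex")); errors.Is(err, errShardMissing) {
              fmt.Println("storage corrupted and unrecoverable:", err)
              if cerr := forceCleanupPartition(storageDir, slicePrefix, 8497178209169062924, 1); cerr != nil {
                  fmt.Println("cleanup failed:", cerr)
                  return
              }
              fmt.Println("Skipping index instance 8497178209169062924")
          }
      }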

            People

              amit.kulkarni Amit Kulkarni
              mihir.kamdar Mihir Kamdar (Inactive)