Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-46492

Unable to find recovery point after plasma recovery

    XMLWordPrintable

Details

    • Bug
    • Resolution: Not a Bug
    • Critical
    • 7.0.0
    • Cheshire-Cat
    • storage-engine
    • Untriaged
    • 1
    • Unknown

    Description

      Saw these errors while investigating 46318 (logs - s3://cb-customers-secure/mb-46190/2021-05-21/collectinfo-2021-05-21t045513-ns_1@172.23.97.77.zip)

      1. On node 172.23.97.77, Index gets built:

      2021-05-20T14:07:23.072-07:00 [Info] Indexer::handleBuildIndex Added Index: [12917581944872651272] to Stream: INIT_STREAM State: INDEX_STATE_INITIAL
      2021-05-20T14:07:23.074-07:00 [Info] MutationMgr::handleUpdateIndexInstMap, updated instances:
              InstanceId: 12917581944872651272 Name: idx1_KDve3 Keyspace: bucket2:scope_2:coll_13 State: INDEX_STATE_INITIAL Stream: INIT_STREAM RState: RebalActive Version: 0 ReplicaId: 3
      

      2. Disk snapshot gets created for both partition 2 and partition 4 as part of build done.

      2021-05-20T14:11:25.363-07:00 [Info] plasmaSlice::OpenSnapshot SliceId 0 IndexInstId 12917581944872651272 PartitionId 4 Creating New Snapshot SnapshotInfo: count:236481 committed:true
      2021-05-20T14:11:25.363-07:00 [Info] StorageMgr::handleCreateSnapshot Added New Snapshot Index: 12917581944872651272 PartitionId: 4 SliceId: 0 Crc64: 9528950848570813173 (Sn
      apshotInfo: count:236481 committed:true) SnapType DISK_SNAP SnapAligned true SnapCreateDur 144.248231ms SnapOpenDur 87.709µs
      2021-05-20T14:11:25.363-07:00 [Info] PlasmaSlice Slice Id 0, IndexInstId 12917581944872651272, PartitionId 4 Creating recovery point ...
      2021-05-20T14:11:25.416-07:00 [Info] plasmaSlice::OpenSnapshot SliceId 0 IndexInstId 12917581944872651272 PartitionId 2 Creating New Snapshot SnapshotInfo: count:236459 committed:true
      2021-05-20T14:11:25.416-07:00 [Info] StorageMgr::handleCreateSnapshot Added New Snapshot Index: 12917581944872651272 PartitionId: 2 SliceId: 0 Crc64: 9528950848570813173 (Sn
      apshotInfo: count:236459 committed:true) SnapType DISK_SNAP SnapAligned true SnapCreateDur 53.119628ms SnapOpenDur 82.812µs
      2021-05-20T14:11:25.416-07:00 [Info] PlasmaSlice Slice Id 0, IndexInstId 12917581944872651272, PartitionId 2 Creating recovery point ..
      

      3. Merge happens and index moves to ACTIVE state

      2021-05-20T14:11:25.601-07:00 [Info] Timekeeper::checkInitStreamReadyToMerge Index Ready To Merge. Index: 12917581944872651272 Stream: INIT_STREAM KeyspaceId: bucket2:scope_
      2:coll_13 SessionId: 1 LastFlushTS: bucket: bucket2, scopeId: , collectionId: , vbuckets: 1024 Crc64: 9528950848570813173 snapType DISK_SNAP -
       
      2021-05-20T14:11:25.606-07:00 [Info] MutationMgr::handleUpdateIndexInstMap, updated instances:
              InstanceId: 12917581944872651272 Name: idx1_KDve3 Keyspace: bucket2:scope_2:coll_13 State: INDEX_STATE_ACTIVE Stream: MAINT_STREAM RState: RebalActive Version: 0 ReplicaId: 3
      

      4. Test kills indexer for recovery after 25mins.

      2021-05-20T14:34:41.639-07:00 [Info] Indexer started with command line:
      

      5. Plasma is seen recovering both the partitions:

      2021-05-20T14:34:48.584-07:00 [Info] bucket2/idx1_72qm/Backstore#11860434538765390097:3 Plasma: /data/@2i/bucket2_idx1_KDve3_12917581944872651272_4.index/docIndex: Repair 0 added 0 deleted
      2021-05-20T14:34:49.161-07:00 [Info] bucket2/idx1_72qm/Backstore#11860434538765390097:3 Plasma: /data/@2i/bucket2_idx1_KDve3_12917581944872651272_2.index/docIndex: Repair 0 added 0 deleted
       
      2021-05-20T14:34:57.035-07:00 [Info] bucket2/idx1_72qm/Mainstore#11860434538765390097:3 Plasma: /data/@2i/bucket2_idx1_KDve3_12917581944872651272_4.index/mainIndex: Repair 20 added 0 deleted
      2021-05-20T14:34:57.110-07:00 [Info] bucket2/idx1_72qm/Mainstore#11860434538765390097:3 Plasma: /data/@2i/bucket2_idx1_KDve3_12917581944872651272_2.index/mainIndex: Repair 27 added 0 deleted
      

      6. Then "Unable to find recovery point" message appears for both the partitions and disk snapshot is not available for loading.

      2021-05-20T14:34:57.211-07:00 [Info] plasmaSlice::doRecovery SliceId 0 IndexInstId 12917581944872651272 PartitionId 4 Recovering from recovery point ..
      2021-05-20T14:34:57.211-07:00 [Info] plasmaSlice::doRecovery SliceId 0 IndexInstId 12917581944872651272 PartitionId 4 Unable to find recovery point. Resetting store ..
       
      2021-05-20T14:36:37.598-07:00 [Info] plasmaSlice::doRecovery SliceId 0 IndexInstId 12917581944872651272 PartitionId 2 Recovering from recovery point ..
      2021-05-20T14:36:37.598-07:00 [Info] plasmaSlice::doRecovery SliceId 0 IndexInstId 12917581944872651272 PartitionId 2 Unable to find recovery point. Resetting store ..
       
      2021-05-20T14:36:37.596-07:00 [Info] plasmaSlice:NewplasmaSlice Created New Slice Id 0 IndexInstId 12917581944872651272 partitionId 4 WriterThreads 2
      2021-05-20T14:36:37.596-07:00 [Info] Indexer::initPartnInstance Initialized Slice:
               Index: 12917581944872651272 Slice: SliceId: 0 File: /data/@2i/bucket2_idx1_KDve3_12917581944872651272_4.index Index: 12917581944872651272 Partition: 4
      2021-05-20T14:36:37.596-07:00 [Info] Indexer::initPartnInstance Initialized Partition:
               Index: 12917581944872651272 Partition: PartitionId: 2 Endpoints: [:9105]
               
      2021-05-20T14:36:38.996-07:00 [Info] Indexer::initPartnInstance Initialized Slice:
               Index: 12917581944872651272 Slice: SliceId: 0 File: /data/@2i/bucket2_idx1_KDve3_12917581944872651272_2.index Index: 12917581944872651272 Partition: 2
      2021-05-20T14:36:38.996-07:00 [Info] StorageMgr::updateIndexSnapMapForIndex IndexInst 12917581944872651272 Partitions [4 2]
      2021-05-20T14:36:38.996-07:00 [Info] StorageMgr::openSnapshot IndexInst:12917581944872651272 Partition:4 No Snapshot Found.
      2021-05-20T14:36:38.996-07:00 [Info] StorageMgr::updateIndexSnapMapForIndex IndexInst 12917581944872651272 Adding Nil Snapshot.
      

      7. There are multiple such messages. I have investigated only the last one.

      cbcollect_info_ns_1@172.23.97.77_20210521-045516 > grep "Unable to find recovery point. Resetting store" ns_server.indexer.log | wc -l
            49
      

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            jliang John Liang
            deepkaran.salooja Deepkaran Salooja
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty