Couchbase Server / MB-31592

Fix race condition between drop index and rollback response for index recovery


Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 6.5.0
    • 5.1.2, 5.5.2, 6.0.0, 6.5.0
    • secondary-index
    • None
    • Untriaged
    • Unknown

    Description

      Reported by a user in the forums:
      https://forums.couchbase.com/t/cannot-communicate-with-indexer-process/18601/20

      A race condition is possible when a drop index and an index recovery run in parallel and the recovery results in a rollback response from DCP.
      Although this code has been in place for many years, nobody seems to have hit the race before, probably because rollbacks are not that common.

      1. Indexer starts index recovery and makes request to projector

      2018-10-09T15:43:25.098+00:00 [Info] Indexer::startBucketStream 
      

      2. Drop Index comes in

      2018-10-09T15:43:25.525+00:00 [Info] Indexer::handleDropIndex - IndexInstId 17856673439824853533
      2018-10-09T15:43:25.525+00:00 [Info] MutationMgr::handleUpdateIndexInstMap 
              Message: MsgUpdateInstMap
              InstanceId: 17856673439824853533 Name: index_tx_contract Bucket: tx-history State: INDEX_STATE_DELETED Stream: MAINT_STREAM RState: RebalActive Version: 0 ReplicaId: 0 
      

      3. Index gets cleaned up

      2018-10-09T15:43:25.546+00:00 [Info] Timekeeper::handleRemoveBucketFromStream 
      2018-10-09T15:43:25.558+00:00 [Info] Indexer::cleanupIndexData 17856673439824853533 Close Done
      2018-10-09T15:43:27.712+00:00 [Info] Indexer::cleanupIndexData 17856673439824853533 Destroy Done
      

      4. DCP sends rollback for the projector request

      2018-10-09T15:43:28.494+00:00 [Info] Indexer::startBucketStream Rollback from Projector For Stream MAINT_STREAM Bucket tx-history
      2018-10-09T15:43:28.494+00:00 [Info] Indexer::handleInitPrepRecovery StreamId MAINT_STREAM Bucket tx-history STREAM_PREPARE_RECOVERY
      2018-10-09T15:43:28.494+00:00 [Info] Timekeeper::handleInitPrepRecovery MAINT_STREAM tx-history
      2018-10-09T15:43:28.495+00:00 [Info] Timekeeper::prepareRecovery StreamId MAINT_STREAM Bucket tx-history
      2018-10-09T15:43:28.495+00:00 [Error] Timekeeper::prepareRecovery Invalid Prepare Recovery Request
      

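      The problem is that the stream state is never reset after this point: the dropped index was the last one in the stream, so the prepare-recovery request is rejected as invalid and nothing moves the stream out of STREAM_PREPARE_RECOVERY. This leaves the indexer in a state where it cannot process further create/drop index requests.

      There is already special code that handles this case when recovery completes (handleRecoveryDone). Similar logic needs to be put in place for the rollback path.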

       
          //during recovery, if all indexes of a bucket gets dropped,
          //the stream needs to be stopped for that bucket.
          if !idx.checkBucketExistsInStream(bucket, streamId, true) {
              if idx.getStreamBucketState(streamId, bucket) != STREAM_INACTIVE {
                  logging.Infof("Indexer::handleRecoveryDone StreamId %v Bucket %v State %v. No Index Found."+
                      "Cleaning up.", streamId, bucket, idx.getStreamBucketState(streamId, bucket))
                  idx.stopBucketStream(streamId, bucket)

                  idx.setStreamBucketState(streamId, bucket, STREAM_INACTIVE)
              }
          } else {
              //change status to Active
              idx.setStreamBucketState(streamId, bucket, STREAM_ACTIVE)
          }
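
      As an illustration only, the same check applied on the rollback path might look roughly like the toy model below. This is a minimal sketch, not the actual indexer code or the committed fix: the type toyIndexer, its fields, and the handleRollback function are hypothetical, and only the three state names mirror STREAM_INACTIVE, STREAM_ACTIVE and STREAM_PREPARE_RECOVERY from the logs and snippet above. The idea is that a rollback arriving for a bucket with no remaining indexes resets the stream to inactive instead of starting a recovery that can never complete.

```go
package main

import "fmt"

// StreamState is a toy stand-in for the indexer's per-bucket stream status.
type StreamState int

const (
	StreamInactive        StreamState = iota // mirrors STREAM_INACTIVE
	StreamActive                             // mirrors STREAM_ACTIVE
	StreamPrepareRecovery                    // mirrors STREAM_PREPARE_RECOVERY
)

// toyIndexer (hypothetical) models only the state involved in the race.
type toyIndexer struct {
	state   map[string]StreamState // bucket -> stream state
	indexes map[string]int         // bucket -> number of live indexes
}

// handleRollback is a hypothetical rollback handler. If every index of the
// bucket was dropped while the stream request was in flight, it resets the
// stream to inactive; otherwise it moves the stream into prepare-recovery.
func (idx *toyIndexer) handleRollback(bucket string) StreamState {
	if idx.indexes[bucket] == 0 {
		// No index left in the stream: clean up instead of recovering.
		idx.state[bucket] = StreamInactive
		return idx.state[bucket]
	}
	idx.state[bucket] = StreamPrepareRecovery
	return idx.state[bucket]
}

func main() {
	idx := &toyIndexer{
		state:   map[string]StreamState{"tx-history": StreamActive},
		indexes: map[string]int{"tx-history": 0}, // the only index was dropped
	}
	// The rollback arrives after the drop: the stream ends up inactive.
	fmt.Println(idx.handleRollback("tx-history") == StreamInactive)
}
```

      In this model a later create index request would find the stream inactive and could start it again, which is the behaviour the recovery-done code above already provides.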
      
      


          People

            girish.benakappa Girish Benakappa
            deepkaran.salooja Deepkaran Salooja
