Details
Bug
Resolution: Fixed
Critical
5.1.2, 5.5.2, 6.0.0, 6.5.0
None
Untriaged
Unknown
Description
Reported by a user in forums:
https://forums.couchbase.com/t/cannot-communicate-with-indexer-process/18601/20
A race condition is possible when a drop index request and index recovery run in parallel and the recovery results in a rollback response from DCP.
Although this code has been in place for many years, nobody appears to have hit this before, probably because rollbacks are not that common.
1. Indexer starts index recovery and makes a request to the projector
2018-10-09T15:43:25.098+00:00 [Info] Indexer::startBucketStream
2. A drop index request comes in
2018-10-09T15:43:25.525+00:00 [Info] Indexer::handleDropIndex - IndexInstId 17856673439824853533
2018-10-09T15:43:25.525+00:00 [Info] MutationMgr::handleUpdateIndexInstMap
    Message: MsgUpdateInstMap
    InstanceId: 17856673439824853533 Name: index_tx_contract Bucket: tx-history State: INDEX_STATE_DELETED Stream: MAINT_STREAM RState: RebalActive Version: 0 ReplicaId: 0
3. The index gets cleaned up
2018-10-09T15:43:25.546+00:00 [Info] Timekeeper::handleRemoveBucketFromStream
2018-10-09T15:43:25.558+00:00 [Info] Indexer::cleanupIndexData 17856673439824853533 Close Done
2018-10-09T15:43:27.712+00:00 [Info] Indexer::cleanupIndexData 17856673439824853533 Destroy Done
4. DCP sends a rollback for the projector request
2018-10-09T15:43:28.494+00:00 [Info] Indexer::startBucketStream Rollback from Projector For Stream MAINT_STREAM Bucket tx-history
2018-10-09T15:43:28.494+00:00 [Info] Indexer::handleInitPrepRecovery StreamId MAINT_STREAM Bucket tx-history STREAM_PREPARE_RECOVERY
2018-10-09T15:43:28.494+00:00 [Info] Timekeeper::handleInitPrepRecovery MAINT_STREAM tx-history
2018-10-09T15:43:28.495+00:00 [Info] Timekeeper::prepareRecovery StreamId MAINT_STREAM Bucket tx-history
2018-10-09T15:43:28.495+00:00 [Error] Timekeeper::prepareRecovery Invalid Prepare Recovery Request
The problem is that the stream state is not reset after this point. The stream stays in STREAM_PREPARE_RECOVERY even though the prepare recovery request was rejected, which leaves the indexer unable to process further create/drop index requests.
There is special code to handle this case when recovery is done (Indexer::handleRecoveryDone). Similar logic needs to be put in place for rollback.
//during recovery, if all indexes of a bucket gets dropped,
//the stream needs to be stopped for that bucket.
if !idx.checkBucketExistsInStream(bucket, streamId, true) {
	if idx.getStreamBucketState(streamId, bucket) != STREAM_INACTIVE {
		logging.Infof("Indexer::handleRecoveryDone StreamId %v Bucket %v State %v. No Index Found."+
			"Cleaning up.", streamId, bucket, idx.getStreamBucketState(streamId, bucket))
		idx.stopBucketStream(streamId, bucket)
		idx.setStreamBucketState(streamId, bucket, STREAM_INACTIVE)
	}
} else {
	//change status to Active
	idx.setStreamBucketState(streamId, bucket, STREAM_ACTIVE)
}
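To illustrate the idea, here is a minimal, self-contained sketch of what applying the same check on the rollback path could look like. It is only a toy model and not the actual fix: toyIndexer, rollbackUnguarded, rollbackGuarded and simulate are hypothetical names, the state handling is heavily simplified (a single stream, and the pre-rollback state is just modelled as active), and the real indexer would also stop the bucket stream as in the snippet above.

```go
package main

import "fmt"

// StreamState is a simplified stand-in for the indexer's per-bucket stream state.
type StreamState string

const (
	StreamInactive        StreamState = "STREAM_INACTIVE"
	StreamActive          StreamState = "STREAM_ACTIVE"
	StreamPrepareRecovery StreamState = "STREAM_PREPARE_RECOVERY"
)

// toyIndexer is a hypothetical model; it is not the real indexer type.
type toyIndexer struct {
	state     map[string]StreamState // bucket -> stream state
	liveIndex map[string]int         // bucket -> number of indexes still present
}

// rollbackUnguarded mimics the reported behaviour: the rollback path moves the
// stream to prepare-recovery without checking whether any index is left.
func (idx *toyIndexer) rollbackUnguarded(bucket string) {
	idx.state[bucket] = StreamPrepareRecovery
}

// rollbackGuarded applies the same check as the recovery-done path: if no
// index remains for the bucket, reset the stream to inactive instead.
func (idx *toyIndexer) rollbackGuarded(bucket string) {
	if idx.liveIndex[bucket] == 0 {
		if idx.state[bucket] != StreamInactive {
			// The real code would also stop the bucket stream here.
			idx.state[bucket] = StreamInactive
		}
		return
	}
	idx.state[bucket] = StreamPrepareRecovery
}

// simulate replays the sequence from the logs in simplified form: the stream
// is running, the only index is dropped, then the rollback arrives.
func simulate(guarded bool) StreamState {
	const bucket = "tx-history"
	idx := &toyIndexer{
		state:     map[string]StreamState{bucket: StreamActive},
		liveIndex: map[string]int{bucket: 1},
	}
	idx.liveIndex[bucket]-- // drop index removes the last index
	if guarded {
		idx.rollbackGuarded(bucket)
	} else {
		idx.rollbackUnguarded(bucket)
	}
	return idx.state[bucket]
}

func main() {
	fmt.Println("unguarded:", simulate(false)) // STREAM_PREPARE_RECOVERY (stuck)
	fmt.Println("guarded:  ", simulate(true))  // STREAM_INACTIVE
}
```

In this model the unguarded path leaves the stream in STREAM_PREPARE_RECOVERY with no index left to recover, which corresponds to the stuck state described above, while the guarded path returns it to STREAM_INACTIVE so that later create/drop requests can start from a clean state.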