Details
Bug
Resolution: Fixed
Critical
4.0.0
Security Level: Public
Untriaged
Yes
Description
After the recent change to start repairStream after the MutationTopicRequest (MTR) is done, it is now possible for the indexer to miss a bucket flush that occurs during the initial MTR.
Found while running the following testrunner test:
./testrunner -i b/resources/dev-6-nodes-xdcr_n1ql_2i.ini -t 2i.recovery_2i.SecondaryIndexingRecoveryTests.test_couchbase_bucket_flush,nodes_init=5,nodes_in=1,initial=,before=create_index,after=drop_index,groups=simple,dataset=default,doc-per-day=10,services_init=n1ql:kv-kv-kv-index-index,GROUP=BUCKET-FLUSH,quota_percent=50
Index Build Received:
2015-08-02T01:07:40.54Z+05:30 [Info] Indexer::handleBuildIndex [10444514577939172816]
MTR Starts:
2015-08-02T01:07:40.616Z+05:30 [Info] KVSender::sendMutationTopicRequest Projector 192.168.1.2:10000 Topic INIT_STREAM_TOPIC_60:d7:ae:80:c9:bf:71:8b default
StreamBegins arrive:
2015-08-02T01:07:40.669Z+05:30 [Debug] TK StreamBegin INIT_STREAM default 1 86095240712019 0
Indexer Flush Is Triggered:
2015-08-02T01:07:40.951Z+05:30 [Debug] Flusher::PersistUptoTS INIT_STREAM default
The bucket gets flushed and a StreamEnd is received, but repair is not triggered because the MTR is still in progress:
2015-08-02T01:07:41.232Z+05:30 [Debug] TK StreamEnd INIT_STREAM default 36 195497018796998 0
The MTR keeps retrying:
2015-08-02T01:07:44.436Z+05:30 [Error] KVSender::openMutationStream INIT_STREAM default Error Received feed.feeder
2015-08-02T01:07:44.436Z+05:30 [Error] Indexer::sendStreamUpdateForBuildIndex Stream INIT_STREAM Bucket default.Error from Projector feed.feeder. Retrying.
The MTR then succeeds. Because of the retry, every vbucket has received a StreamBegin again, so the Timekeeper does not trigger repairStream:
2015-08-02T01:07:49.59Z+05:30 [Debug] Indexer::sendStreamUpdateForBuildIndex Stream Request Success For Stream INIT_STREAM Bucket default.
2015-08-02T01:07:49.59Z+05:30 [Debug] Indexer::handleStreamRequestDone StreamId INIT_STREAM Bucket default
2015-08-02T01:07:49.59Z+05:30 [Debug] Timekeeper::handleStreamRequestDone StreamId INIT_STREAM Bucket default
2015-08-02T01:07:49.59Z+05:30 [Debug] Timekeeper::checkInitialBuildDone
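The sequence above can be replayed against a toy model of the Timekeeper's decision. This is a minimal sketch, not the indexer's code: the class `TimekeeperModel`, its methods and the per-vbucket states are hypothetical. It assumes, as described in this report, that a StreamEnd seen during an MTR only defers the repair decision until the MTR is done, and that the deferred decision looks at nothing but the current per-vbucket state. Vbucket 36 is taken from the StreamEnd log line, assuming that field is the vbucket id.

```python
from enum import Enum


# Hypothetical model for illustration; not the actual indexer/Timekeeper code.
class VbStatus(Enum):
    NOT_STARTED = "not_started"
    ACTIVE = "active"  # StreamBegin seen
    ENDED = "ended"    # StreamEnd seen


class TimekeeperModel:
    def __init__(self, num_vbuckets):
        self.vb_status = {vb: VbStatus.NOT_STARTED for vb in range(num_vbuckets)}
        self.mtr_in_progress = False
        self.repairs = []  # vbuckets for which repairStream was issued

    def on_mtr_start(self):
        self.mtr_in_progress = True

    def on_stream_begin(self, vb):
        self.vb_status[vb] = VbStatus.ACTIVE

    def on_stream_end(self, vb):
        self.vb_status[vb] = VbStatus.ENDED
        if not self.mtr_in_progress:
            self.repairs.append(vb)  # outside an MTR: repair right away
        # during an MTR: the decision is deferred to on_mtr_done()

    def on_mtr_done(self):
        self.mtr_in_progress = False
        # The deferred check only sees the *current* per-vbucket state.
        self.repairs.extend(
            vb for vb, status in self.vb_status.items()
            if status is not VbStatus.ACTIVE
        )


# Replay of the timeline from the logs (64 vbuckets chosen arbitrarily).
tk = TimekeeperModel(num_vbuckets=64)
tk.on_mtr_start()
for vb in range(64):
    tk.on_stream_begin(vb)  # StreamBegins from the first MTR attempt
tk.on_stream_end(36)        # bucket flush ends vbucket 36's stream
for vb in range(64):
    tk.on_stream_begin(vb)  # MTR retry reopens every vbucket
tk.on_mtr_done()
print(tk.repairs)  # []
```

In this model the retry's StreamBegin overwrites the StreamEnd for vbucket 36, so the deferred check finds every vbucket active and issues no repair, which matches the logs.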
As a result, INIT_STREAM never sees the rollback message from DCP, because the indexer never attempted repairStream. The bucket now has 0 docs (due to the flush), but the index still holds the mutations whose timestamp (TS) has already been processed, which is inconsistent with KV.
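One possible direction for closing this window, offered only as a hypothetical sketch (the actual fix for this issue is not described here): remember every vbucket that received a StreamEnd while the MTR was in progress, and repair it once the MTR is done even if a retry has since delivered a fresh StreamBegin. That would give repairStream the chance to reach DCP and surface the rollback. The class `TimekeeperWithStickyRepair` and its methods are hypothetical, and the model is deliberately simplified.

```python
# Hypothetical variant for illustration; not the actual indexer code.
# A StreamEnd seen during an MTR is "sticky": a later StreamBegin from an
# MTR retry does not cancel the pending repair.
class TimekeeperWithStickyRepair:
    def __init__(self, num_vbuckets):
        self.active = {vb: False for vb in range(num_vbuckets)}
        self.mtr_in_progress = False
        self.pending_repair = set()  # vbuckets that ended during the MTR
        self.repairs = []            # vbuckets for which repairStream was issued

    def on_mtr_start(self):
        # Not cleared here, so an MTR retry cannot drop a pending repair.
        self.mtr_in_progress = True

    def on_stream_begin(self, vb):
        self.active[vb] = True  # does NOT remove vb from pending_repair

    def on_stream_end(self, vb):
        self.active[vb] = False
        if self.mtr_in_progress:
            self.pending_repair.add(vb)
        else:
            self.repairs.append(vb)

    def on_mtr_done(self):
        self.mtr_in_progress = False
        inactive = {vb for vb, is_active in self.active.items() if not is_active}
        self.repairs.extend(sorted(self.pending_repair | inactive))
        self.pending_repair.clear()


# Same timeline as in the logs: flush on vbucket 36, then an MTR retry.
tk = TimekeeperWithStickyRepair(num_vbuckets=64)
tk.on_mtr_start()
for vb in range(64):
    tk.on_stream_begin(vb)
tk.on_stream_end(36)
for vb in range(64):
    tk.on_stream_begin(vb)  # retry reopens every vbucket
tk.on_mtr_done()
print(tk.repairs)  # [36]
```

The trade-off in this sketch is that a vbucket may get a repairStream even though the retry has already reopened it; the model assumes an extra repair is cheaper than silently missing a flush, which is an assumption rather than something established in this report.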