Details
- Type: Improvement
- Resolution: Unresolved
- Priority: Critical
- Affects Version: 7.1.4
Description
Indexer and projector rely on their local ns_server to fetch the latest active vbucket mappings. During a KV rebalance or failover, active vbucket mappings change, and the updated state can take a while to replicate across all ns_servers. This can lead to a case where the projector node's ns_server receives the update before an indexer node's ns_server does. In general this is not an issue, but a specific chain of events with a race condition can leave the MAINT stream stuck in a repair loop, failing any index builds that depend on the affected MAINT stream.
The chain of events is as follows:
- An event triggers a change in the active vbucket ownership (e.g. KV rebalance or a failover)
- Indexer receives a MAINT StreamEnd for vbucket X from the old KV node (A)
- Indexer attempts stream repair on the new KV node (B). [Ownership of vbucket X is updated in the indexer's local ns_server]
- As the rebalance is not yet complete, the indexer receives feed.invalidBucket from B.
- Indexer initiates stream repair of all vbuckets with an MTR to all KV nodes
- Indexer receives a MAINT StreamBegin for vbucket Y from B [due to the above MTR] before the StreamEnd from A (race condition). B sent the StreamBegin because its ns_server is already updated with the latest ownership of vbucket Y.
- Indexer treats Y's StreamBegin as a duplicate and tries to repair vbucket Y by setting its repair state to RESTART_VB and its vbucket state to CONN_ERR
- Indexer looks up the projector address for vbucket Y using the terse-bucket endpoint (dcp/pool.go, RefreshBucket). This queries the local ns_server, which still returns A's address. [Delay in the ownership update, possibly due to disk latency] (second race condition)
- Indexer sends a vbucket Y shutdown request to A and receives a StreamEnd from A, changing vbucket Y's status to SHUTDOWN_VB.
- Indexer's local ns_server is now updated with Y's ownership.
- Indexer now tries to start vbucket Y, but as B's projector has already sent the StreamBegin, the request is ignored. Y's SHUTDOWN_VB state prevents it from being picked again for shutdown, so the indexer retries the stream start every 1 min and the projector keeps ignoring it (repair loop).
- After 30 min, as there is still no StreamBegin, the indexer falls back to an MTR for stream repair. But Y's state is still >= SHUTDOWN_VB, so no shutdown request is issued and the projector ignores all further restart requests.
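The stuck state above can be illustrated with a toy model. All class names, state names, and transition rules below are simplified assumptions for illustration, not the actual indexer or projector code:

```python
# Toy model of the stuck repair loop (hypothetical, simplified).
SHUTDOWN_VB = "SHUTDOWN_VB"  # state name mirrors the ticket

class Projector:
    """Stands in for B's projector, which already has a live stream for Y."""
    def __init__(self):
        self.active = set()  # vbuckets with an open stream

    def stream_begin(self, vb):
        self.active.add(vb)

    def restart_vb(self, vb):
        # A restart request for an already-active stream is ignored, so no
        # fresh StreamBegin is ever sent back to the indexer.
        return "StreamBegin" if vb not in self.active else "ignored"

class Indexer:
    def __init__(self, projector):
        self.projector = projector
        self.state = {}

    def repair(self, vb):
        # A state >= SHUTDOWN_VB prevents another shutdown request, so the
        # only remaining action is a restart attempt.
        if self.state.get(vb) == SHUTDOWN_VB:
            return self.projector.restart_vb(vb)
        return "shutdown-sent"  # the normal repair path (not the bug)

proj = Projector()
proj.stream_begin("vb_Y")            # B already sent the StreamBegin (race)
idx = Indexer(proj)
idx.state["vb_Y"] = SHUTDOWN_VB      # StreamEnd from A moved Y past shutdown

# Every retry is ignored: the stream is live on the projector, but the
# indexer never re-issues a shutdown that would clear it.
print([idx.repair("vb_Y") for _ in range(3)])  # ['ignored', 'ignored', 'ignored']
```

The model shows why neither side can make progress: the projector considers the stream healthy, while the indexer's state machine has no transition that would tear it down.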
Possible solutions:
- Send the vbucket mapping as seen by the indexer node to the projector during the MTR. The projector then sends StreamBegins only for the intersection of the active vbucket mappings from its local ns_server and the mappings sent by the indexer; missing StreamBegins are retried later.
- Have the indexer change the vbucket status back to CONN_ERR and the repair state to RESTART_VB after a timeout.
- Have chronicle/ns_server support "Read committed" level on the terse bucket endpoint.
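The first solution amounts to intersecting the two views of the vbucket map. The helper name and data shapes below are illustrative assumptions, not the real MTR payload:

```python
# Sketch of the first proposed fix (hypothetical helper, not real code):
# the projector starts streams only for vbuckets where its local view and
# the indexer-supplied view agree on the owning KV node.

def vbuckets_to_start(indexer_map, projector_map):
    """indexer_map / projector_map: dict of vbucket number -> owning KV node."""
    return sorted(
        vb for vb, node in indexer_map.items()
        if projector_map.get(vb) == node
    )

indexer_view   = {1: "kvA", 2: "kvB", 3: "kvB"}  # possibly stale indexer view
projector_view = {1: "kvB", 2: "kvB", 3: "kvB"}  # projector's local ns_server view

# Vbucket 1 is deferred; its StreamBegin would be retried later, once the
# two ns_server views converge.
print(vbuckets_to_start(indexer_view, projector_view))  # [2, 3]
```

Deferring the disagreeing vbuckets avoids the premature StreamBegin that starts the race described above.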
Fixing the repair loop manually:
If this issue occurs, an immediate fix is to restart the projector that the indexer keeps asking for a vbucket restart but which never sends a StreamBegin.
The log line below repeats every minute while the indexer is stuck in the retry loop caused by the state corruption.
indexer.log (repeats every 1 min):

    KVSender::sendRestartVbuckets Projector <projector> Topic MAINT_STREAM_TOPIC_<id> <bucket> <bucket>

projector.log (repeats every 1 min):

    FEED[<=>MAINT_STREAM_TOPIC_<id>(ip)] <> start-timestamp bucket: <bucket>, scope :, collectionIDs: [], vbuckets: 0 - {vbno, vbuuid, manifest, seqno, snapshot-start, snapshot-end}
Restarting the projector resets its state and allows the stream to begin from that projector. Until the projector is restarted, the indexer node cannot build any index on the affected bucket.
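Before restarting anything, the loop can be confirmed by counting how often the restart request repeats in indexer.log. A minimal sketch; the sample lines below are placeholders modeled on the excerpt above, and in practice the real log file would be read instead:

```python
import re

# Placeholder lines modeled on the indexer.log excerpt above; the
# timestamps, host, topic id, and bucket names are invented.
sample_log = """\
2024-01-01T10:00:01 KVSender::sendRestartVbuckets Projector 10.0.0.2:9999 Topic MAINT_STREAM_TOPIC_1 b1 b1
2024-01-01T10:01:01 KVSender::sendRestartVbuckets Projector 10.0.0.2:9999 Topic MAINT_STREAM_TOPIC_1 b1 b1
"""

# A count that grows by one per minute, with no matching StreamBegin in
# projector.log, is the signature of the stuck repair loop.
hits = len(re.findall(r"KVSender::sendRestartVbuckets", sample_log))
print(hits)  # 2
```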
Related MBs:
- https://issues.couchbase.com/browse/MB-54667 -- stream repair stuck due to similar state corruption but with a different chain of events
- https://issues.couchbase.com/browse/MB-51636 -- stream repair stuck due to similar state corruption. In this case, the shutdown is also sent to the old KV node (same as this MB), but a stale cache was the likely cause and disabling it served as the fix. The core issue still remains: the indexer and projector talk to their local ns_servers, which are not in sync.