Cause of the problem identified. The fix is a one-line change to the Flusher (reverting a line incorrectly changed by http://review.couchbase.org/#/c/117261/ - a performance/efficiency improvement for SyncWrites). Read on for details...
—
Summary
Commit 1f64b646719dacba8aa78b1101647a56ae94bbb8 modified the Flusher to use VBReadyQueue to manage the low-priority VBuckets waiting to be
flushed. However, this change introduced a starvation issue for low-priority vBuckets if there are outstanding high-priority vBuckets which are still awaiting seqno_persistence.
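For orientation, the property of that queue which matters here is that each vBucket id is queued at most once and is popped in the order it became ready. The class below is a rough stand-in written for this description - it is not the real VBReadyQueue from kv_engine, and the names and signatures are assumed:

#include <cstdint>
#include <deque>
#include <unordered_set>

using Vbid = uint16_t; // stand-in; kv_engine uses a strong Vbid type

class ReadyQueueSketch {
public:
    // Queue vb for flushing; returns false if it was already queued.
    bool pushUnique(Vbid vb) {
        if (!queued.insert(vb).second) {
            return false;
        }
        fifo.push_back(vb);
        return true;
    }

    // Pop the next ready vBucket into vb; returns false if nothing is ready.
    bool popFront(Vbid& vb) {
        if (fifo.empty()) {
            return false;
        }
        vb = fifo.front();
        fifo.pop_front();
        queued.erase(vb);
        return true;
    }

    bool empty() const {
        return fifo.empty();
    }

private:
    std::deque<Vbid> fifo;
    std::unordered_set<Vbid> queued;
};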
Details
Consider the following scenario:
- At least one SEQNO_PERSISTENCE request is outstanding - VBucket::hpVBReqs is non-empty for at least one vBucket.
- This vBucket does not yet have the requested seqno - for example, it is a replica vBucket, memory usage is high, and replication to it has been throttled.
- At least one other (low priority) vBucket is awaiting flushing.
The consequence is that the low-priority vBucket will never get flushed (at least not until the high-priority vBucket completes its seqno_persistence). This can lead to livelock - if we actually could flush the low-priority vBucket(s), that would allow CheckpointMemory to be freed (by expelling / removing closed, unreferenced checkpoints), reducing memory usage and hence allowing the throttled replication - and ultimately the outstanding seqno_persistence - to make progress.
The actual problem is the logic in Flusher::flushVB(). This is a (needlessly?) complex function, but the high-level logic involves switching between two modes (Flusher::doHighPriority); a simplified sketch follows the list below:
A) While there are no outstanding SEQNO_PERSISTENCE vBuckets for this shard, flush the next vBucket in lpVbs (low priority). Reschedule (return true) if there are any more low-priority VBs to flush.
B) If there are outstanding SEQNO_PERSISTENCE vBuckets for this shard:
   B.1) If hpVbs (high-priority vbs) is empty, populate the hpVbs queue with all vBuckets with outstanding SEQNO_PERSISTENCE requests.
   B.2) Flush the next vb from hpVbs. Reschedule (return true) if there are any more items in hpVbs.
   B.3) Once all outstanding SEQNO_PERSISTENCE vBuckets have been flushed, allow an equal number of low-priority vBuckets to be flushed before retrying the outstanding SEQNO_PERSISTENCE vBuckets.
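To make that concrete, below is a much-simplified model of the mode switching - it is not the real Flusher::flushVB(), and apart from lpVbs / hpVbs / doHighPriority (taken from the description above) the names (pendingSeqnoPersistenceVBs, flushOne, numHighPriority, lowPriorityFlushed) are invented for the sketch:

#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>

using Vbid = uint16_t; // stand-in for kv_engine's strong Vbid type

struct FlusherSketch {
    std::deque<Vbid> lpVbs;             // low-priority vBuckets awaiting flush
    std::deque<Vbid> hpVbs;             // vBuckets with outstanding SEQNO_PERSISTENCE
    bool doHighPriority = false;        // false = mode A, true = mode B
    std::size_t numHighPriority = 0;    // size of hpVbs when entering mode B (for B.3)
    std::size_t lowPriorityFlushed = 0; // low-priority flushes used under B.3

    // Supplied by the caller for the sketch.
    std::function<std::deque<Vbid>()> pendingSeqnoPersistenceVBs;
    std::function<void(Vbid)> flushOne;

    // Returns true if the flusher task should be rescheduled.
    bool flushVB() {
        if (!doHighPriority) {
            auto pending = pendingSeqnoPersistenceVBs();
            if (!pending.empty()) {
                // Entering mode B (step B.1): snapshot the vBuckets with
                // outstanding SEQNO_PERSISTENCE requests.
                hpVbs = std::move(pending);
                numHighPriority = hpVbs.size();
                lowPriorityFlushed = 0;
                doHighPriority = true;
            }
        }

        if (!doHighPriority) {
            // Mode A: nothing high-priority outstanding - just service the
            // low-priority queue, rescheduling while work remains.
            if (!lpVbs.empty()) {
                flushOne(lpVbs.front());
                lpVbs.pop_front();
            }
            return !lpVbs.empty();
        }

        if (!hpVbs.empty()) {
            // Mode B (step B.2): drain the high-priority queue first.
            flushOne(hpVbs.front());
            hpVbs.pop_front();
            return true;
        }

        // Mode B (step B.3): hpVbs is drained; allow up to numHighPriority
        // low-priority flushes before re-checking the (possibly still
        // outstanding) SEQNO_PERSISTENCE requests. This is what stops one
        // slow vBucket starving the rest of the shard.
        if (lowPriorityFlushed < numHighPriority && !lpVbs.empty()) {
            flushOne(lpVbs.front());
            lpVbs.pop_front();
            ++lowPriorityFlushed;
            return true;
        }

        // B.3 quota used up (or no low-priority work left): drop back to
        // mode A and let the next run re-evaluate the high-priority requests.
        doHighPriority = false;
        return !lpVbs.empty() || !pendingSeqnoPersistenceVBs().empty();
    }
};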
Step B.3 is crucial to avoid starvation of the low-priority queue - without it, a single slow vBucket with an outstanding SEQNO_PERSISTENCE request can prevent all other vBuckets in the shard from ever being flushed.
However, when the aforementioned patch to use VBReadyQueue was introduced, it inadvertently prevented step B.3 from actually occurring.
This is because when re-entering Flusher::flushVB after flushing the last high-priority VB from hpVbs (i.e. after step B.2), we incorrectly check only whether hpVbs.empty() is true, and if so set doHighPriority to false - i.e. go straight back to mode A.
The fix is to restore the logic from before the patch - only switch back to mode A (doHighPriority=false) when both the low- and high-priority VB queues are empty.
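In terms of the names used above, the change amounts to roughly the following (illustrative only, not a literal diff of the kv_engine code):

// Before the fix: after draining hpVbs we drop straight back to mode A. On
// the next run the still-outstanding SEQNO_PERSISTENCE request re-populates
// hpVbs immediately, so step B.3 never runs and lpVbs is starved.
if (hpVbs.empty()) {
    doHighPriority = false;
}

// After the fix (restoring the pre-change behaviour): stay in mode B until
// the low-priority queue has also been given its turn.
if (hpVbs.empty() && lpVbs.empty()) {
    doHighPriority = false;
}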
Looking at the memcached.log for the node being gracefully failed over (.125), I see the following repeated log messages for vb:60 and vb:61:
2019-12-17T13:27:13.081769+00:00 INFO 57: (default) DCP (Producer) eq_dcpq:replication:ns_1@172.28.128.125->ns_1@172.28.128.126:default - (vb:61) DcpProducer::addTakeoverStats empty streams list found
2019-12-17T13:27:13.081908+00:00 WARNING 55: (default) DCP (Producer) eq_dcpq:replication:ns_1@172.28.128.125->ns_1@172.28.128.128:default - (vb:61) ActiveStream::addTakeoverStats: Stream has status StreamDead
2019-12-17T13:27:13.082702+00:00 INFO 57: (default) DCP (Producer) eq_dcpq:replication:ns_1@172.28.128.125->ns_1@172.28.128.126:default - (vb:60) DcpProducer::addTakeoverStats empty streams list found
2019-12-17T13:27:13.082812+00:00 WARNING 55: (default) DCP (Producer) eq_dcpq:replication:ns_1@172.28.128.125->ns_1@172.28.128.128:default - (vb:60) ActiveStream::addTakeoverStats: Stream has status StreamDead
These messages repeat every 5s.
Note that those warnings are printed (by ActiveStream::addTakeoverStats) when takeover stats are requested for an Active (producer) stream which is dead; a status is still returned to ns_server in that case.
It's not clear to me why ns_server is repeatedly requesting these stats.
ns_server - could you please take a look and see why these vBuckets are getting repeatedly polled?