Details
-
Bug
-
Resolution: Fixed
-
Major
-
7.6.0, 7.0.0-Beta1, 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.1.4, 7.0.5, 7.1.0, 7.1.1, 7.1.2, 7.2.0, 7.1.3, 7.2.1, 7.1.5, 7.2.4, 7.0.6, 7.1.7, 7.2.2, 7.1.6, 7.2.3, 7.2.5, 7.6.2, 7.2.6, 7.6.1, 7.6.4
-
Untriaged
-
0
-
Unknown
-
March-June 24
Description
Within the run function is a loop which looks approximately like:
size_t iterations = 0;
do {
    auto streams = queuePop();
    if (streams.empty()) {
        break;
    }

    // Now process each ActiveStream
    for (const auto& stream : streams) {
        stream->nextCheckpointItemTask();
    }
    iterations++;
} while (!queueEmpty() && iterations < iterationsBeforeYield);
The function nextCheckpointItemTask has a cost of O(n), where n is the checkpoint backlog (this cost varies by release, as n was capped to prevent readyQ memory spiking). Very slow runtimes for this task were observed in a large customer environment (lots of memory, and thus the ability to queue lots of items). The environment had CBAS configured, which was using DCP stream-ID, so the loop around nextCheckpointItemTask steps many times per vbucket: the O(n) cost becomes O(m*n), where m is the number of times the loop steps per vbucket. In the linked case m was 116.
Thus, even with the task trying to yield after some amount of work, the amount of work done before it yields can be huge, pinning a NONIO task for some time.
Note that the slow runtime is typically seen as new DCP connections come online and the current in-memory backlog is copied into each stream; once a stream has caught up, the n cost is reduced.
A solution is needed that yields the task earlier but still ensures that every stream is visited, i.e. we need a time-based yield plus resumable iteration.
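As a rough illustration of the "time-based yield + resumable iteration" idea, here is a minimal C++ sketch. It is not the actual fix: Stream, ResumableProcessor, the budget parameter and the flat stream vector are all hypothetical stand-ins invented for this example, and only the name nextCheckpointItemTask comes from the loop above. The sketch keeps a cursor across runs, so that a run which yields on its time budget resumes at the next unvisited stream rather than starting again from the front.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for ActiveStream: it only exposes the call named in
// the ticket, and the per-stream work is supplied by the caller.
struct Stream {
    std::function<void()> work;
    void nextCheckpointItemTask() const { work(); }
};

// Hypothetical processor (an assumption of this sketch, not the real task).
class ResumableProcessor {
public:
    using Clock = std::chrono::steady_clock;

    ResumableProcessor(std::vector<Stream> streams, Clock::duration budget)
        : streams_(std::move(streams)), budget_(budget) {}

    // Processes streams until the time budget is spent or all are visited.
    // Returns true if streams remain (the caller should reschedule the task),
    // false once every stream has been visited in this pass.
    bool run() {
        const auto deadline = Clock::now() + budget_;
        while (next_ < streams_.size()) {
            // At least one stream is always processed per run, so repeated
            // runs make progress even with a very small budget.
            streams_[next_].nextCheckpointItemTask();
            ++next_;
            if (next_ < streams_.size() && Clock::now() >= deadline) {
                return true; // yield; next_ remembers where to resume
            }
        }
        next_ = 0; // full pass completed; the next pass starts from the front
        return false;
    }

private:
    std::vector<Stream> streams_;
    Clock::duration budget_;
    std::size_t next_ = 0; // resume cursor, preserved across run() calls
};
```

With this shape, the longest a single run can hold the thread is roughly the budget plus the cost of one nextCheckpointItemTask call, instead of the cost of every stream popped in that iteration. A caller would keep rescheduling while run() returns true.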
Issue Links
- relates to MB-62120 ActiveStreamCheckpointProcessorTask::run keeps readLock on StreamContainer (Closed)