Details

Type: Bug
Resolution: Fixed
Priority: Major
Fix Version/s: 7.2.6, 7.6.4
Affects Version/s: 7.0.0-Beta1, 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.6.0, 7.6.1, 7.6.2, 7.6.4
Component/s: couchbase-bucket
Labels: approved-for-7.2.6
Triage: Untriaged
Story Points: 0
Is this a Regression?: Unknown
Sprint: March-June 24

Description

Within the run function is a loop which looks approximately like:

    size_t iterations = 0;
    do {
        auto streams = queuePop();
        if (streams.empty()) {
            break;
        }

        // Now process each ActiveStream
        for (const auto& stream : streams) {
            stream->nextCheckpointItemTask();
        }
        iterations++;
    } while (!queueEmpty() && iterations < iterationsBeforeYield);

The function nextCheckpointItemTask has a cost of O(n), where n is the checkpoint backlog (the cost varies by release, as n was capped to prevent readyQ memory spiking). Some very slow runtimes for this task were observed in a large customer environment (lots of memory, and thus the ability to queue lots of items). The environment had CBAS configured, which uses DCP stream-ID, so the loop around nextCheckpointItemTask steps many times per vbucket: the O(n) cost becomes O(m*n), where m is the number of steps per vbucket. In the linked case m was 116.

Thus, even though the task tries to yield after some amount of work, the amount of work it does before yielding can be huge, pinning a NONIO task for some time.

Note that the slow runtime is typically seen as new DCP connections come online and the current in-memory backlog is copied into each stream; once a stream has caught up, the n cost is reduced.

A solution is needed that yields the task earlier but still ensures that every stream is visited, i.e. we need a time-based yield plus resumable iteration.
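One possible shape for "time-based yield + resumable iteration" is sketched below. This is only an illustration under stated assumptions, not the fix that was actually made: `Stream`, `ResumableProcessor`, `schedule`, `remaining` and the `budget` parameter are hypothetical names, and only `nextCheckpointItemTask` comes from the description above. The idea is that streams not yet visited stay in a pending queue when the time budget expires, so the next run resumes with them instead of starting over.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for ActiveStream; only the method named in the
// description is modelled, and its body is supplied by the caller.
struct Stream {
    std::function<void()> work;
    void nextCheckpointItemTask() { work(); }
};

// Hypothetical task: unvisited streams stay in `pending` across yields,
// so a later run() resumes where the previous one stopped.
class ResumableProcessor {
public:
    using Clock = std::chrono::steady_clock;

    explicit ResumableProcessor(Clock::duration budget) : budget_(budget) {}

    void schedule(std::shared_ptr<Stream> stream) {
        assert(stream != nullptr);
        pending.push_back(std::move(stream));
    }

    // Processes streams until the queue is empty or the time budget is
    // spent. At least one stream is processed per call, so progress is
    // always made. Returns true if streams remain (caller should re-run).
    bool run() {
        const auto deadline = Clock::now() + budget_;
        while (!pending.empty()) {
            auto stream = std::move(pending.front());
            pending.pop_front();
            stream->nextCheckpointItemTask();
            if (Clock::now() >= deadline) {
                break; // yield; the rest of `pending` is kept for next run
            }
        }
        return !pending.empty();
    }

    std::size_t remaining() const { return pending.size(); }

private:
    Clock::duration budget_;
    std::deque<std::shared_ptr<Stream>> pending;
};
```

Note that the time check sits between streams, so a single very slow nextCheckpointItemTask call can still overrun the budget; bounding that would need a check inside the per-stream work as well.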