  1. Couchbase Server
  2. MB-62141

ActiveStreamCheckpointProcessorTask long runtimes possible when using stream-id

Details

    • Bug
    • Resolution: Fixed
    • Major
    • 7.2.6, 7.6.4
    • 7.6.0, 7.0.0-Beta1, 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.1.4, 7.0.5, 7.1.0, 7.1.1, 7.1.2, 7.2.0, 7.1.3, 7.2.1, 7.1.5, 7.2.4, 7.0.6, 7.1.7, 7.2.2, 7.1.6, 7.2.3, 7.2.5, 7.6.2, 7.2.6, 7.6.1, 7.6.4
    • couchbase-bucket
    • Untriaged
    • 0
    • Unknown
    • March-June 24

    Description

      Within the run function is a loop which looks approximately like:

          size_t iterations = 0;
          do {
              auto streams = queuePop();
              if (streams.empty()) {
                  break;
              }
       
              // Now process each ActiveStream
              for (const auto& stream : streams) {
                  stream->nextCheckpointItemTask();
              }
              iterations++;
          } while (!queueEmpty() && iterations < iterationsBeforeYield);
      

      The function nextCheckpointItemTask has a cost of O(n), where n is the checkpoint backlog (the cost varies by release, as n was capped to prevent readyQ memory spiking). A large customer environment (lots of memory, and thus the ability to queue lots of items) showed some very slow runtimes for this task. The environment had CBAS configured, which uses DCP stream-ID, so the loop around nextCheckpointItemTask steps many times per vbucket: the O(n) cost becomes O(m*n), where m was 116 in the linked case.

      Thus even with the task trying to yield after some amount of work, the amount of work done before it yields can be huge, pinning a NONIO task for some time.

      Note the slow runtime is typically seen as new DCP connections come online, and the current in-memory backlog is copied into each stream - once a stream is caught up the n cost is reduced.

      A solution is needed that yields the task earlier but still ensures that every stream is visited, i.e. we need a time-based yield plus resumable iteration.
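
      As a rough illustration of "time-based yield + resumable iteration", the sketch below is one possible shape and not the actual fix. FakeStream, ResumableProcessor, schedule() and the maxRuntime budget are hypothetical names invented here; process() stands in for nextCheckpointItemTask(). Streams not reached before the deadline stay queued, so the next run resumes where the previous one stopped, and at least one stream is processed per run so progress is guaranteed.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for an ActiveStream. process() plays the role of
// nextCheckpointItemTask() and is assumed to be the expensive O(n) call.
struct FakeStream {
    int id;
    std::function<void()> process;
};

// Hypothetical processor: yields once a time budget is spent, and keeps
// unvisited streams queued so the next run() resumes from them.
class ResumableProcessor {
public:
    using Clock = std::chrono::steady_clock;

    explicit ResumableProcessor(Clock::duration maxRuntime)
        : maxRuntime_(maxRuntime) {
        assert(maxRuntime > Clock::duration::zero());
    }

    void schedule(FakeStream stream) {
        pending_.push_back(std::move(stream));
    }

    // Processes queued streams until the queue is empty or the budget is
    // spent. Returns true if work remains and the task should run again.
    bool run() {
        const auto deadline = Clock::now() + maxRuntime_;
        while (!pending_.empty()) {
            FakeStream stream = std::move(pending_.front());
            pending_.pop_front();
            stream.process();
            ++processed_;
            if (Clock::now() >= deadline) {
                break; // time-based yield; remaining streams stay queued
            }
        }
        return !pending_.empty();
    }

    std::size_t pendingCount() const { return pending_.size(); }
    std::size_t processedCount() const { return processed_; }

private:
    Clock::duration maxRuntime_;
    std::deque<FakeStream> pending_;
    std::size_t processed_ = 0;
};
```

      In this sketch the deadline is only checked between streams, so a single very expensive process() call can still overrun the budget; bounding the cost of one call (as the cap on n does) would remain a separate concern.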

            People

              raghav.sk Raghav S K
              jwalker Jim Walker