Couchbase Server / MB-49685

Seqno-ordered collection filtered backfills monopolise task/thread


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 7.0.0, 7.1.0, 7.0.2, 7.0.1
    • Fix Version/s: Morpheus
    • Component/s: couchbase-bucket
    • Labels: None
    • Triage: Untriaged
    • 1
    • No

    Description

      The amount of time a backfill may run for before yielding (allowing other backfills to run) is indirectly limited by:

      • a per-scan byte limit
      • an outstanding-bytes limit (bytes read by the backfill but not yet sent by the stream)

      However, for a collection-filtered stream, these limits are only applied to items which actually match the filter. A single collection may only be a small fraction of the data on disk - a seqno-ordered backfill may read a significant amount of data (taking a significant amount of time) before enough items for the desired collection have been read.

      bool ActiveStream::backfillReceived(std::unique_ptr<Item> itm,
                                          backfill_source_t backfill_source) {
      ...
          // Is the item accepted by the stream filter (e.g matching collection?)
          if (!filter.checkAndUpdate(*itm)) {
              // Skip this item, but continue backfill at next item.
              return true;
          }
      ...
          if (!producer->recordBackfillManagerBytesRead(
                      resp->getApproximateSize())) {
              return false;
          }
      

      This may adversely affect other backfills for the same backfill manager, and other tasks running on AuxIO threads.

      Investigate altering the scan limit to account for bytes read even if they do not match the filter, or adding an additional limit that solely tracks bytes or items read.
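
      A minimal sketch of the second option (not existing kv_engine code; BackfillScanBudget, recordExamined and the 4MB figure are illustrative assumptions): the idea is to charge every item the scan examines against a per-scan budget, so a collection-filtered backfill still yields after a bounded amount of disk work even when almost nothing matches the filter.

      // Illustrative sketch only - names and the limit value are assumptions,
      // not existing kv_engine code.
      #include <cstddef>

      struct BackfillScanBudget {
          std::size_t examinedByteLimit = 4 * 1024 * 1024; // hypothetical 4MB budget
          std::size_t examinedBytes = 0;

          // Record an item the scan has examined, whether or not it matches the
          // stream's collection filter. Returns false once the budget is spent
          // and the backfill should yield.
          bool recordExamined(std::size_t approxItemSize) {
              examinedBytes += approxItemSize;
              return examinedBytes < examinedByteLimit;
          }
      };

      // In backfillReceived() this check would sit before the existing
      // filter.checkAndUpdate() test, e.g. (pseudo-code):
      //
      //     if (!scanBudget.recordExamined(/* approx size of item read */)) {
      //         return false; // yield: budget spent, even on filtered-out items
      //     }
      //     if (!filter.checkAndUpdate(*itm)) {
      //         return true;  // skip item, but its scan cost was still counted
      //     }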

      Uncovered during investigation for MB-48569


          Activity

            drigby Dave Rigby added a comment -

            Linking MB-49702 - an issue where backfills are slow and ultimately result in failed rebalance. This issue is contributing to the slowness seen there.


            james.harrison James Harrison added a comment -

            Filtering to the correct collection actually occurs earlier, before the value is read from disk: http://src.couchbase.org/source/xref/trunk/kv_engine/engines/ep/src/dcp/backfill_disk.cc#65-71

            As such, this is likely more efficient than initially suspected (also suggesting other issues may be present in the originating MB-48569).
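
            A rough sketch of that pattern (not the actual backfill_disk.cc code; DocMeta, keyCallback and inCollection are illustrative names): the scan presents only the key/metadata first, and the document value is fetched from disk only when the key belongs to a streamed collection.

            // Illustrative sketch of a key-first scan callback; names are
            // assumptions, not the real backfill_disk.cc interfaces.
            #include <cstdint>
            #include <functional>
            #include <string>

            struct DocMeta {
                std::string key;
                uint64_t bySeqno;
            };

            enum class ScanAction { LoadValue, SkipValue };

            // Invoked once per document, *before* the value is read from disk.
            ScanAction keyCallback(
                    const DocMeta& meta,
                    const std::function<bool(const std::string&)>& inCollection) {
                if (!inCollection(meta.key)) {
                    // Not part of the streamed collection: skip without ever
                    // fetching the (potentially large) value.
                    return ScanAction::SkipValue;
                }
                return ScanAction::LoadValue;
            }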

            drigby Dave Rigby added a comment -

            As such, this is likely more efficient than initially suspected (also suggesting other issues may be present in the originating MB-48569).

            I think it might be a bit more complicated than that. While your observation is correct, there's still a cost in reading the B-Tree (Couchstore) or SST files (Magma) for documents which are not ultimately used.

            • For Couchstore, when traversing the seqno tree we read (and decompress) seqno B-Tree nodes (mapping seqno to key) for every seqno in the range, then only perform a second lookup on the document_id (inside the seqno tree "value") for documents in the collection. Essentially, any B-Tree nodes read which do not contain any documents for the given collection are "wasted" overhead.
            • Hand-waving somewhat, but for example with Magma, when traversing the seqno tree we are still reading from disk the key and value of every document in the range (document meta & values are stored inline with the key in the seqnoTree if they are >32 bytes).

            As such, I think there's still value in adjusting the scan limits to account for actual bytes read from disk during a scan, not just those bytes which contribute to the backfilled documents.
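
            To put rough numbers on that overhead (the figures below are assumed for illustration, not measured): if the filtered collection makes up only a small fraction of the seqno range, the bulk of the seqno-index bytes read never contribute anything to the stream, yet today they are not counted towards the scan limits.

            // Back-of-the-envelope only: the sizes and counts below are assumed
            // values chosen to illustrate the point, not measured data.
            #include <cstdio>

            int main() {
                const double itemsInRange = 10e6;       // seqnos scanned (assumed)
                const double collectionFraction = 0.01; // collection is 1% of items (assumed)
                const double indexBytesPerItem = 64;    // seqno-index bytes per item (assumed)

                // Seqno-index bytes read from disk regardless of the filter.
                const double indexBytesRead = itemsInRange * indexBytesPerItem;
                // Items that actually count towards today's backfill byte limits.
                const double matchingItems = itemsInRange * collectionFraction;

                std::printf("seqno-index bytes read: %.0f MB\n",
                            indexBytesRead / (1024 * 1024));
                std::printf("items reaching the stream: %.0f (%.0f%% of items are "
                            "filtered out but still cost disk reads)\n",
                            matchingItems, (1 - collectionFraction) * 100);
                return 0;
            }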


            People

              Assignee: owend Daniel Owen
              Reporter: james.harrison James Harrison