  1. Couchbase Server
  2. MB-50589

Warmup scan of large range of deleted items can hang warmup indefinitely

Details

    • Triaged
    • 1
    • Yes
    • KV 2022-Jan

    Description

      I believe this was introduced by MB-47267.

      Warmup skips deleted items when scanning disk - https://github.com/couchbase/kv_engine/blob/a6acea19e938412df114fe77dfa6a408c2d92424/engines/ep/src/warmup.cc#L517-L524.

      The crux of the problem is that ScanContext::lastReadSeqno is not advanced when the scan sees deleted items. CouchKVStore passes this filter down to couchstore, so the LoadStorageKVPairCallback is not invoked until a non-deleted item is found. MagmaKVStore filters out the deletes itself and moves on to the next item. For both KVStores, a resumed scan starts from lastReadSeqno + 1 if lastReadSeqno != 0. During warmup we pause a scan if it has been running for more than some fixed amount of time. That time, for Backfill tasks, is set to 10 milliseconds.
      https://github.com/couchbase/kv_engine/blob/a6acea19e938412df114fe77dfa6a408c2d92424/engines/ep/src/warmup.cc#L969-L974
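
      The resume rule and the delete filter described above can be sketched in a few lines of Python. This is a simplified model, not the kv_engine code: the names ScanContext.start_seqno, resume_start and scan_skipping_deletes are hypothetical, and the "callback" is reduced to a single assignment.

```python
from dataclasses import dataclass


@dataclass
class ScanContext:
    start_seqno: int = 0       # hypothetical: where a fresh scan begins
    last_read_seqno: int = 0   # models ScanContext::lastReadSeqno


def resume_start(ctx):
    """Resume rule from the description: lastReadSeqno + 1 if it is non-zero."""
    if ctx.last_read_seqno != 0:
        return ctx.last_read_seqno + 1
    return ctx.start_seqno


def scan_skipping_deletes(disk, ctx):
    """Visit (seqno, is_deleted) pairs in seqno order, skipping deleted items."""
    first = resume_start(ctx)
    for seqno, is_deleted in disk:
        if seqno < first or is_deleted:
            continue  # deleted items never reach the callback
        ctx.last_read_seqno = seqno  # only non-deleted items advance this


disk = [(1, False), (2, True), (3, True), (4, True)]
ctx = ScanContext()
scan_skipping_deletes(disk, ctx)
print(ctx.last_read_seqno, resume_start(ctx))  # prints: 1 2
```

      Although the scan walked past seqnos 2 to 4, a resumed scan would start again at 2, because only the non-deleted item at seqno 1 moved last_read_seqno.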

      If we have an on disk structure as follows:

      [1:alive, 2:deleted, 3:deleted, ..., n:deleted, n+1:alive]

      Then we can end up in a scenario where lastReadSeqno is set to 1 for the first item read, and that item is warmed up. If the scan of items 2 to n takes more than 10 milliseconds, then when we reach the item at n+1 Warmup decides to pause the scan. Because lastReadSeqno is not updated during the scan of 2 to n, the scan restarts from 2 rather than from n+1. If disk is consistently slow, warmup could hang indefinitely as scans repeat over the same range of deleted items.
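
      To make the scenario concrete, here is a small simulation with a fake clock. It is a sketch under stated assumptions, not the real Warmup code: the per-item read cost, the max_passes cap and the function names are all hypothetical, and time is counted in integer microseconds. Only the 10 millisecond budget and the resume rule come from the description above.

```python
from dataclasses import dataclass, field

PAUSE_AFTER_US = 10_000  # the 10 millisecond budget quoted in the description


@dataclass
class ScanContext:
    last_read_seqno: int = 0
    loaded: list = field(default_factory=list)


def run_scan_pass(disk, ctx, us_per_item):
    """Run one scan pass; return True if it reached the end of disk.

    disk maps seqno -> True (alive) or False (deleted).
    us_per_item is a hypothetical, constant cost of reading one item.
    """
    start = ctx.last_read_seqno + 1 if ctx.last_read_seqno != 0 else 0
    elapsed_us = 0
    for seqno in sorted(s for s in disk if s >= start):
        elapsed_us += us_per_item  # simulated disk read
        if not disk[seqno]:
            continue  # deleted: filtered out, last_read_seqno does not move
        if elapsed_us > PAUSE_AFTER_US:
            return False  # over budget: pause before loading this item
        ctx.loaded.append(seqno)
        ctx.last_read_seqno = seqno
    return True


def warmup(disk, us_per_item, max_passes=5):
    """Repeat scan passes until one completes or max_passes is exhausted."""
    ctx = ScanContext()
    for passes in range(1, max_passes + 1):
        if run_scan_pass(disk, ctx, us_per_item):
            return ctx, passes, True
    return ctx, max_passes, False


# [1:alive, 2:deleted, ..., 100:deleted, 101:alive]
disk = {1: True, **{s: False for s in range(2, 101)}, 101: True}

slow_ctx, slow_passes, slow_done = warmup(disk, us_per_item=500)
fast_ctx, fast_passes, fast_done = warmup(disk, us_per_item=50)
print(slow_done, slow_ctx.last_read_seqno, slow_ctx.loaded)
print(fast_done, fast_ctx.last_read_seqno, fast_ctx.loaded)
```

      With the slow setting every pass spends about 50 milliseconds on the deleted range, pauses at seqno 101, and restarts from 2, so no pass ever completes. With the fast setting the whole scan fits inside the budget and finishes in one pass.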

          People

            ritesh.agarwal Ritesh Agarwal
            ben.huddleston Ben Huddleston