  1. Couchbase Server
  2. MB-56970

DeleteWithMeta endlessly attempts to bgfetch tempNonExistent items

Details

    • Untriaged
    • 0
    • No

    Description

      delWithMeta (used internally by XDCR, backup and replication) may get stuck in a loop, repeatedly attempting to bgfetch a value for a non-existent item in order to preserve xattrs.

      else if (cachedVbState == vbucket_state_active &&
               mcbp::datatype::is_xattr(v->getDatatype()) && !v->isResident()) {
          // MB-25671: A temp deleted xattr with no value must be fetched before
          // the deleteWithMeta can be applied.
          // MB-36087: Any non-resident value
          delrv = MutationStatus::NeedBgFetch;
          metaBgFetch = false;
      }
      

      See the two noted MBs for context on why this is here.

      Scenario as follows:

      • An item with system xattrs is deleted; the pruned value (retaining the system xattrs) is persisted and the value is removed from the HashTable
      • Incoming request queues a meta bgfetch, adds a tempInitial item in the HashTable
      • meta bgfetch finds the delete, calls restoreMeta which will set the datatype and some other meta fields, changing the item from tempInitial->tempDeleted. This is still not considered resident.
      • Time passes, and the delete is purged from disk
      • A non-meta (i.e., wants the value too) bgfetch is requested
      • bgfetch finds nothing on disk, changes the item from tempDeleted->tempNonExistent

      Now we have a temp item, with datatype set (could be xattrs), but nothing on disk for that key.
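
      The sequence above can be replayed against a small stand-alone model. This is only a sketch: ModelItem, TempState, completeMetaBgFetch, completeFullBgFetch and replayScenario are hypothetical names invented for illustration and are not the kv_engine types. The one point it tries to show is that the datatype restored by the meta bgfetch is still set after the item becomes tempNonExistent.

```cpp
#include <cassert>

// Hypothetical, simplified model of the temp-item states named in the scenario.
enum class TempState { None, TempInitial, TempDeleted, TempNonExistent };

struct ModelItem {
    TempState state = TempState::None;
    bool xattrDatatype = false; // datatype bit restored from disk metadata
    bool resident = false;      // no value is ever loaded in this scenario
};

// Models a meta-only bgfetch completing. If a persisted delete is found, the
// metadata (including the datatype) is restored: tempInitial -> tempDeleted.
void completeMetaBgFetch(ModelItem& item, bool deleteOnDisk, bool diskHasXattr) {
    if (item.state != TempState::TempInitial) {
        return;
    }
    if (deleteOnDisk) {
        item.xattrDatatype = diskHasXattr;
        item.state = TempState::TempDeleted;
    } else {
        item.state = TempState::TempNonExistent;
    }
}

// Models a full (value) bgfetch completing. If nothing is on disk, only the
// state changes; the previously restored datatype is left untouched.
void completeFullBgFetch(ModelItem& item, bool foundOnDisk) {
    if (foundOnDisk) {
        item.resident = true;
    } else {
        item.state = TempState::TempNonExistent;
    }
}

// Replays the bullet points: meta bgfetch finds the delete, the delete is
// purged, then a full bgfetch finds nothing.
ModelItem replayScenario() {
    ModelItem item;
    item.state = TempState::TempInitial;   // request queued a meta bgfetch
    completeMetaBgFetch(item, true, true); // delete with xattrs found on disk
    // ... time passes, the delete is purged from disk ...
    completeFullBgFetch(item, false);      // nothing left to fetch
    return item;
}
```

      In this model, replayScenario() ends with an item that is tempNonExistent, not resident, and still flagged as carrying xattrs.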

      Such an item matches the condition in the snippet above, so a bgfetch is requested, but no further progress can be made: there is nothing on disk to fetch, and the item already records this by being tempNonExistent. The fetcher reads from disk and notifies the op that the fetch has completed; the frontend then retries the delWithMeta, which queues another bgfetch, and so on.
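
      The lack of progress can be illustrated with a toy retry loop. This is a sketch under stated assumptions, not the engine code: ModelItem, Status, tryDelete, runBgFetch and countBgFetches are hypothetical, the vbucket-state check is omitted, and the loop is capped at maxAttempts only so that the example terminates.

```cpp
#include <cassert>

// Hypothetical stand-in for the HashTable entry; not a kv_engine type.
struct ModelItem {
    bool xattrDatatype;
    bool resident;
    bool tempNonExistent;
};

enum class Status { Done, NeedBgFetch };

// Mirrors the condition in the quoted snippet (vbucket state check omitted):
// an xattr datatype plus a non-resident value asks for a bgfetch.
Status tryDelete(const ModelItem& v) {
    if (v.xattrDatatype && !v.resident) {
        return Status::NeedBgFetch;
    }
    return Status::Done;
}

// Models the fetch finding nothing on disk: the item stays tempNonExistent
// and non-resident, so nothing that tryDelete() inspects has changed.
void runBgFetch(ModelItem& v) {
    v.tempNonExistent = true;
}

// Returns how many bgfetches were issued before the delete could proceed,
// giving up after maxAttempts (an artificial cap for this example).
int countBgFetches(ModelItem v, int maxAttempts) {
    int fetches = 0;
    while (fetches < maxAttempts && tryDelete(v) == Status::NeedBgFetch) {
        runBgFetch(v);
        ++fetches;
    }
    return fetches;
}
```

      With an item that has the xattr datatype, is not resident and is tempNonExistent, countBgFetches always runs into the cap, whatever value is chosen for it; without the cap the loop would never exit.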

      This most obviously manifests as one (or more) front-end memcached threads spinning at 100% CPU.

       

      Issue Resolution
      XDCR or restore from backup could enter an endless loop when attempting to overwrite, with a deleteWithMeta operation, a document that had been deleted or expired some time ago. This was caused by a specific unanticipated in-memory state; CPU usage increased and the connection became unusable for further operations. deleteWithMeta is now resilient to temporary non-existent values with xattr datatype.
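
      The actual patch is not shown in this ticket. As an assumption about what such resilience might look like, the sketch below adds a guard that never requests a value bgfetch for an item already known to be tempNonExistent. ModelItem and needsValueBgFetch are hypothetical names, not the kv_engine API.

```cpp
#include <cassert>

// Hypothetical stand-in for the HashTable entry; not a kv_engine type.
struct ModelItem {
    bool xattrDatatype;
    bool resident;
    bool tempNonExistent;
};

// One possible guard (an assumption, not the real fix): an item already
// marked tempNonExistent has nothing on disk, so fetching again is pointless.
bool needsValueBgFetch(bool vbActive, const ModelItem& v) {
    if (v.tempNonExistent) {
        return false;
    }
    return vbActive && v.xattrDatatype && !v.resident;
}
```

      Under this guard a tempDeleted item with xattrs would still trigger the fetch described in MB-25671, while a tempNonExistent item would let the delete proceed without another round trip to disk.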

            People

              ashwin.govindarajulu Ashwin Govindarajulu
              james.harrison James Harrison (Inactive)