delWithMeta (used internally by XDCR, backup and replication) may get stuck in a loop of attempting to bgfetch a value for a non-existent item in order to preserve xattrs.
See the two noted MBs for context on why this is here.
Scenario as follows:
- Item with system xattrs is deleted, pruned value (retaining the system xattrs) is persisted, value is removed from the HashTable
- Incoming request queues a meta bgfetch, adds a tempInitial item in the HashTable
- meta bgfetch finds the delete, calls restoreMeta which will set the datatype and some other meta fields, changing the item from tempInitial->tempDeleted. This is still not considered resident.
- Time passes, and the delete is purged from disk
- A non-meta (i.e., wants the value too) bgfetch is requested
- bgfetch finds nothing on disk, changes the item from tempDeleted->tempNonExistent
Now we have a temp item, with datatype set (could be xattrs), but nothing on disk for that key.
The above snippet would request a bgfetch for that item, but no further progress would be made: there is nothing on disk to fetch, and the item already indicates this because it is tempNonExistent. The fetcher would read from disk and notify the op that it has completed; the frontend would then attempt the delWithMeta again, queuing another bgfetch, and so on.
This most obviously manifests as one (or more) front-end memcached threads spinning at 100% CPU.
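
The scenario can be modelled with a small state machine. The following is a minimal sketch only: TempState, TempItem and every function name are hypothetical and do not mirror the real engine types, and the guarded check is an assumption about the kind of condition that breaks the loop, not the actual fix.

```cpp
#include <cassert>

// Hypothetical, simplified model of a temp item in the HashTable.
enum class TempState { Initial, Deleted, NonExistent };

struct TempItem {
    TempState state = TempState::Initial;
    bool datatypeXattr = false; // restored from the persisted delete's metadata
};

// Meta bgfetch finds the persisted delete: metadata is restored and the
// item moves tempInitial -> tempDeleted.
void metaFetchFoundDelete(TempItem& item, bool hasXattrs) {
    item.datatypeXattr = hasXattrs;
    item.state = TempState::Deleted;
}

// Full bgfetch finds nothing on disk (the delete was purged): the item
// moves to tempNonExistent, but the datatype is left as it was.
void fullFetchFoundNothing(TempItem& item) {
    item.state = TempState::NonExistent;
}

// Assumed shape of the problematic check: datatype alone decides.
bool needsValueFetchUnguarded(const TempItem& item) {
    return item.datatypeXattr;
}

// Assumed shape of a guarded check: a tempNonExistent item has nothing
// on disk, so fetching again cannot help.
bool needsValueFetchGuarded(const TempItem& item) {
    return item.datatypeXattr && item.state != TempState::NonExistent;
}

// Counts how many bgfetches a retried delWithMeta would queue, capped at
// maxAttempts so the unguarded case terminates in this model.
template <typename Pred>
int countFetches(TempItem item, Pred needsFetch, int maxAttempts) {
    assert(maxAttempts > 0);
    int fetches = 0;
    while (fetches < maxAttempts && needsFetch(item)) {
        ++fetches;
        fullFetchFoundNothing(item); // disk is empty every time
    }
    return fetches;
}
```

With an item driven through metaFetchFoundDelete and then fullFetchFoundNothing, the unguarded check keeps requesting fetches until the cap is reached, while the guarded check requests none.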
|XDCR or restore from backup could enter an endless loop when attempting to overwrite a document that was deleted or expired some time ago with a deleteWithMeta operation. This was caused by a specific, unanticipated in-memory state; it increased CPU usage and left the connection unusable for further operations.
|deleteWithMeta is now resilient to temporary non-existent values with xattr datatype.