Fixed
Details
Assignee: Ashwin Govindarajulu
Reporter: James Harrison (Deactivated)
Is this a Regression?: No
Triage: Untriaged
Due date: May 22, 2023
Story Points: 0
Priority: Major
Created May 17, 2023 at 1:21 PM
Updated September 2, 2024 at 11:13 AM
Resolved May 25, 2023 at 3:31 PM
delWithMeta (used internally by XDCR, backup, and replication) may get stuck in a loop of attempting to bgfetch a value for a non-existent item in order to preserve xattrs. See the two noted MBs for context on why this is here.
Scenario as follows:
Item with system xattrs is deleted, pruned value (retaining the system attrs) is persisted, value removed from the HashTable
Incoming request queues a meta bgfetch, adds a tempInitial item in the HashTable
meta bgfetch finds the delete, calls restoreMeta which will set the datatype and some other meta fields, changing the item from tempInitial->tempDeleted. This is still not considered resident.
Time passes, and the delete is purged from disk
A non-meta (i.e., wants the value too) bgfetch is requested
bgfetch finds nothing on disk, changes the item from tempDeleted->tempNonExistent
Now we have a temp item, with datatype set (could be xattrs), but nothing on disk for that key.
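The state transitions in the scenario above can be traced with a small model. This is only a sketch in Python, not kv_engine code: `TempItem`, `meta_bgfetch`, `full_bgfetch`, the dict standing in for the disk, and the `XATTR` bit value are all hypothetical names chosen for illustration.

```python
from enum import Enum, auto

XATTR = 0x04  # illustrative datatype bit meaning "value carries xattrs"


class State(Enum):
    TEMP_INITIAL = auto()
    TEMP_DELETED = auto()
    TEMP_NON_EXISTENT = auto()


class TempItem:
    """Hypothetical stand-in for a temp item in the HashTable."""

    def __init__(self):
        self.state = State.TEMP_INITIAL
        self.datatype = 0


def meta_bgfetch(item, disk, key):
    """Meta-only fetch: restore metadata if the delete is still on disk."""
    meta = disk.get(key)
    if meta is None:
        item.state = State.TEMP_NON_EXISTENT
        return
    item.datatype = meta["datatype"]  # what restoreMeta is described as doing
    item.state = State.TEMP_DELETED


def full_bgfetch(item, disk, key):
    """Value fetch: a miss flips the state but leaves datatype untouched."""
    if key not in disk:
        item.state = State.TEMP_NON_EXISTENT


def run_scenario():
    disk = {"k": {"datatype": XATTR}}  # persisted delete with system xattrs
    item = TempItem()                  # incoming request adds a tempInitial item
    meta_bgfetch(item, disk, "k")      # tempInitial -> tempDeleted
    del disk["k"]                      # time passes, the delete is purged
    full_bgfetch(item, disk, "k")      # tempDeleted -> tempNonExistent
    return item
```

After `run_scenario()`, the item is tempNonExistent yet still has the xattr datatype bit set, which is the combination the rest of this description depends on.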
The above snippet would request a bgfetch for that item, but no further progress would be made: there is nothing on disk to fetch, and the item already indicates this because it is tempNonExistent. The fetcher reads from disk and notifies the op that it has completed; the frontend then attempts the delWithMeta again, queuing another bgfetch, and so on.
This most obviously manifests as one (or more) front-end memcached threads spinning at 100% CPU.
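The retry cycle can be illustrated with a bounded simulation. This is a minimal sketch under assumptions: `needs_value_bgfetch` and `run_frontend` are hypothetical helpers, the real check lives in the snippet referenced above, and the attempt cap exists only so the example terminates.

```python
XATTR = 0x04  # illustrative datatype bit meaning "value carries xattrs"


def needs_value_bgfetch(datatype, resident):
    """Hypothetical check: fetch the value when the item may carry xattrs
    but its value is not in memory."""
    return bool(datatype & XATTR) and not resident


def run_frontend(disk, key, datatype, max_attempts):
    """Retry delWithMeta until it completes or the attempt cap is hit."""
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        if not needs_value_bgfetch(datatype, resident=False):
            return ("done", attempts)
        # The bgfetch reads from disk and notifies the op.
        if disk.get(key) is not None:
            return ("done", attempts)
        # Nothing on disk: the item stays tempNonExistent with its datatype
        # still set, so the next attempt takes exactly the same path.
    return ("stuck", attempts)
```

With an empty disk and the xattr bit set, every attempt takes the same path and the cap is always reached. Without a cap, this is the spin described above.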
Issue: XDCR or restore from backup entered an endless loop when attempting to overwrite, with a deleteWithMeta operation, a document that had been deleted or expired some time ago. This was caused by a specific unanticipated state in memory; it increased CPU usage and left the connection unusable for further operations.
Resolution: deleteWithMeta is now resilient to temporary non-existent values with xattr datatype.
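One way such resilience might look is a guard that treats a tempNonExistent item as having nothing to fetch, regardless of its datatype. The sketch below is an assumption about the shape of the fix, not the actual change: `del_with_meta_step` and the string state names are hypothetical.

```python
XATTR = 0x04  # illustrative datatype bit meaning "value carries xattrs"


def del_with_meta_step(state, datatype):
    """Decide whether a delWithMeta attempt should bgfetch or proceed."""
    if state == "tempNonExistent":
        # A previous bgfetch already found nothing on disk, so there are
        # no xattrs to preserve, whatever the stale datatype says.
        return "proceed"
    if datatype & XATTR:
        return "bgfetch"  # value is needed to preserve the xattrs
    return "proceed"
```

Checking the item state before the datatype means the stale xattr bit no longer triggers a fetch that cannot succeed.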