Details
- Bug
- Resolution: Fixed
- Major
- 7.2.0
- Untriaged
- 0
- No
Description
What's the issue?
`cannot close stat file of ongoing collection`
There appears to be an issue in the graceful teardown of the `cbbackupmgr` internal stat collection, due to our mixed use of `context.Context` and `atomic.LoadUint64`. The teardown involves these steps:
- Stat collection stopped
- Cancel context
- Close stat gatherers
- Handle context cancellation and atomically set the `done` variable
- Check the value of `done`
The error suggests that `done` can be checked before the cancellation has been handled and `done` has been set.
We've only seen this issue occur twice in our Capella environments to date (based on our historical logging), so it appears to be fairly rare.
I suspect it is unlikely to happen in the common case, as we've observed, but when a stat collection is actually running (this occurs periodically, >30s), it prevents the context cancellation from being handled until the collection finishes, as the thread is busy collecting stats.
Steps to Reproduce (Hypothesis)
- Start a backup that is large enough for stat collection to trigger
- The backup completes while a stat collection is in progress
What's the fix?
We shouldn't close the workers until the context cancellation has propagated.
Issue Links
- causes: CCBSE-1925