Couchbase Server / MB-62608

[CBM] [Stats] Error when closing internal stat collectors

Details

    • Untriaged
    • 0
    • No

    Description

      What's the issue?

      cannot close stat file of ongoing collection
      

      There appears to be an issue in graceful teardown of the 'cbbackupmgr' internal stat collection due to our mixed use of 'context.Context' and 'atomic.LoadUint64'.

      1. Stat collection stopped
      2. Cancel context
      3. Close stat gatherers
      4. Handle context cancellation and atomically set `done` variable.
      5. Check value of `done`

      We've only seen this issue occur twice in our Capella environments to date (based on our historical logging), so it seems to be fairly rare.

      I suspect this is unlikely to happen in the common case, which matches what we've observed. However, when a stat collection is actually running (collections occur periodically, >30s), it blocks the context cancellation from propagating, because the thread is busy collecting stats and is not checking the context.
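
      To illustrate the suspected ordering, here is a minimal sketch. It is not the actual 'cbbackupmgr' code: the `collector` type, its `run` and `close` methods, and the `tick`/`started`/`release` channels are all hypothetical names chosen for this example. Only the `done` flag, the use of `context.Context` with `atomic.LoadUint64`, and the error text come from this ticket.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync/atomic"
)

// errOngoing mirrors the error text reported in this ticket.
var errOngoing = errors.New("cannot close stat file of ongoing collection")

// collector is a hypothetical stand-in for an internal stat collector.
type collector struct {
	done uint64 // set to 1 once the loop has observed context cancellation
}

// run performs a collection on every tick. Cancellation is only noticed
// between collections, when the goroutine is back in the select.
func (c *collector) run(ctx context.Context, tick <-chan struct{}, started chan<- struct{}, collect func()) {
	for {
		select {
		case <-ctx.Done():
			atomic.StoreUint64(&c.done, 1)
			return
		case <-tick:
			started <- struct{}{}
			collect() // ctx.Done() is not being checked while this runs
		}
	}
}

// close refuses to close while the loop has not yet acknowledged cancellation.
func (c *collector) close() error {
	if atomic.LoadUint64(&c.done) == 0 {
		return errOngoing
	}
	return nil
}

// closeDuringCollection cancels the context and closes the collector while a
// collection is still in flight, reproducing the suspected ordering.
func closeDuringCollection() error {
	ctx, cancel := context.WithCancel(context.Background())
	c := &collector{}
	tick := make(chan struct{})
	started := make(chan struct{})
	release := make(chan struct{})

	go c.run(ctx, tick, started, func() { <-release })

	tick <- struct{}{} // trigger a collection
	<-started          // the collection is now in progress
	cancel()           // cancellation cannot be observed yet
	err := c.close()   // done is still 0, so this fails
	close(release)     // let the collection finish so the goroutine exits
	return err
}

func main() {
	fmt.Println(closeDuringCollection())
}
```

      In this sketch the close always fails, because the in-flight collection is held open until after `close` has run. In the real tool the window would only be hit when teardown happens to coincide with a collection.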

      Steps to Reproduce (Hypothesis)

      1. Start a backup that is large enough for a stat collection to trigger
      2. The backup completes while a stat collection is in progress

      What's the fix?
      We shouldn't close the workers until the context cancellation has propagated.
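
      One possible shape for this, sketched under assumptions rather than taken from the real code: have the teardown path wait for the collection goroutine to exit (here via a `sync.WaitGroup`) before closing anything. The `collector` type, the `collecting` flag, and the `stop`, `closeFile` and `stopDuringCollection` names are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// collector is a hypothetical stand-in for an internal stat collector.
type collector struct {
	cancel     context.CancelFunc
	wg         sync.WaitGroup
	collecting uint64 // 1 while a collection is in flight
}

// run performs a collection on every tick until the context is cancelled.
func (c *collector) run(ctx context.Context, tick <-chan struct{}, collect func()) {
	defer c.wg.Done()
	for {
		select {
		case <-ctx.Done():
			return
		case <-tick:
			atomic.StoreUint64(&c.collecting, 1)
			collect()
			atomic.StoreUint64(&c.collecting, 0)
		}
	}
}

// closeFile fails if a collection is still in flight.
func (c *collector) closeFile() error {
	if atomic.LoadUint64(&c.collecting) == 1 {
		return errors.New("cannot close stat file of ongoing collection")
	}
	return nil
}

// stop cancels the context, waits for the goroutine to observe the
// cancellation and exit, and only then closes the file.
func (c *collector) stop() error {
	c.cancel()
	c.wg.Wait()
	return c.closeFile()
}

// stopDuringCollection calls stop while a collection is in flight.
func stopDuringCollection() error {
	ctx, cancel := context.WithCancel(context.Background())
	c := &collector{cancel: cancel}
	tick := make(chan struct{})
	started := make(chan struct{})
	release := make(chan struct{})

	c.wg.Add(1)
	go c.run(ctx, tick, func() {
		close(started)
		<-release
	})

	tick <- struct{}{} // trigger a collection
	<-started          // the collection is now in progress

	// Let the collection finish shortly after stop has begun waiting.
	time.AfterFunc(10*time.Millisecond, func() { close(release) })
	return c.stop()
}

func main() {
	fmt.Println("stop error:", stopDuringCollection())
}
```

      The trade-off is that teardown now blocks for as long as the in-flight collection takes, so a real implementation might also want a timeout on the wait.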

            People

              james.lee James Lee