Couchbase Server / MB-53964

Backport for MB-53662 / Process crashes during delta recovery

    Description

      In 7.1 we implemented a task cancellation mechanism to improve shutdown times for Magma databases. It introduced a taskGroupID that is used to cancel a group of tasks, and each database is meant to have its own unique taskGroupID. However, a bug in the initialization of the taskGroupID caused each database to be assigned a random value instead. Most of the time this works, because the random IDs rarely collide. When two or more databases do end up with the same ID, we run into problems.
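
      To illustrate the intended property (one taskGroupID per database, never shared), here is a minimal Python sketch that hands out IDs from a process-wide counter. It is not Magma's actual code or the actual fix: `TaskGroupIdAllocator`, `Database`, and `task_group_id` are hypothetical names used only for illustration.

```python
import itertools
import threading


class TaskGroupIdAllocator:
    """Hands out process-wide unique task group IDs (hypothetical helper)."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._lock = threading.Lock()

    def next_id(self):
        # The lock keeps allocation safe when databases are opened concurrently.
        with self._lock:
            return next(self._counter)


allocator = TaskGroupIdAllocator()


class Database:
    """Stand-in for a per-vbucket database (hypothetical)."""

    def __init__(self, name):
        self.name = name
        # Assigned exactly once, at construction, from the shared allocator,
        # so no two databases can ever hold the same ID.
        self.task_group_id = allocator.next_id()


# Two buckets with four databases each: every ID must be distinct.
dbs = [Database(f"bucket{b}/vb{v}") for b in range(2) for v in range(4)]
ids = [db.task_group_id for db in dbs]
```

      A counter guarantees uniqueness by construction, whereas a random or uninitialized value only makes collisions unlikely.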

      The problem occurs when there are multiple buckets and one bucket is warming up while another is shutting down. If a database in one bucket has the same taskGroupID as a database in the other, the task cancellation request from the bucket that is shutting down can cancel tasks that belong to the bucket that is warming up. Because SSTable Open is executed as a task, canceling it leaves some vbuckets initialized without their SSTables actually being opened. The process then crashes when we try to read from such an SSTable.
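
      The failure sequence can be reproduced in a small Python model. This is a sketch under stated assumptions, not Magma's implementation: `TaskQueue`, `Database`, and `simulate` are hypothetical, the shared queue is single-threaded, and the crash is modeled as a `RuntimeError` rather than a real process crash.

```python
class TaskQueue:
    """Shared queue where each pending task is tagged with a task group ID."""

    def __init__(self):
        self._pending = []  # list of (group_id, callable)

    def submit(self, group_id, fn):
        self._pending.append((group_id, fn))

    def cancel_group(self, group_id):
        # Drops every pending task carrying this ID, whoever submitted it.
        before = len(self._pending)
        self._pending = [(g, f) for g, f in self._pending if g != group_id]
        return before - len(self._pending)

    def run_all(self):
        pending, self._pending = self._pending, []
        for _, fn in pending:
            fn()


class Database:
    """Stand-in for a per-vbucket database (hypothetical)."""

    def __init__(self, name, task_group_id, queue):
        self.name = name
        self.task_group_id = task_group_id
        self.queue = queue
        self.sstable = None  # stays None until the open task has run

    def warm_up(self):
        # SSTable open is scheduled as a task, as in the description above.
        self.queue.submit(self.task_group_id, self._open_sstable)

    def _open_sstable(self):
        self.sstable = {"key": "value"}

    def shut_down(self):
        return self.queue.cancel_group(self.task_group_id)

    def read(self, key):
        if self.sstable is None:
            raise RuntimeError(f"{self.name}: SSTable was never opened")
        return self.sstable[key]


def simulate(id_a, id_b):
    """Warm up one database while another, in a different bucket, shuts down."""
    queue = TaskQueue()
    warming = Database("bucketA/vb0", id_a, queue)
    stopping = Database("bucketB/vb0", id_b, queue)
    warming.warm_up()
    stopping.shut_down()  # cancels by ID only
    queue.run_all()
    try:
        return warming.read("key")
    except RuntimeError as exc:
        return f"error: {exc}"
```

      With distinct IDs, `simulate(1, 2)` returns the stored value. With a shared ID, `simulate(7, 7)` returns an error string, because the shutdown canceled the other bucket's SSTable open task before it ran.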

          People

            Balakumaran.Gopal Balakumaran Gopal
            sarath Sarath Lakshman