Couchbase Server / MB-63261

Race condition between updateRPSns and gcSn leading to inconsistency between mainstore/backstore and difference in item count

Details

    • Untriaged
    • 0
    • Unknown
    • Critical

    Description

      For a recovery point (RP) snapshot, GSI holds an additional refcount and passes it to plasma. To handle a corner case (recovery points must not be considered in compaction until the recovery point block is persisted), the change made in MB-46756 updates rpSns (the list of recovery points) only after the snapshot is closed. This created a race condition: gcSn can advance while recovery point creation is still in progress. As a result, the snIntervals we compute may transiently omit the RP snapshot number, because the rpSns list has not yet been updated, and the gcFilter used to compact records may not see the latest recovery point snapshot. Pages for mainStore and backStore can therefore be compacted differently and go out of sync for a while.
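
      The effect of a stale rpSns list on compaction can be sketched with a toy model. This is not plasma's code: the record type, snIntervals, and canCompact below are hypothetical names, and the rule assumed here (an insert/delete pair may be dropped only if no retained snapshot boundary falls between the insert and the delete) is a simplification of whatever gcFilter actually does.

```go
package main

import (
	"fmt"
	"sort"
)

// record is a hypothetical insert/delete pair for one item.
type record struct {
	insertSn, deleteSn uint64
}

// snIntervals returns the sorted boundaries compaction must respect in this
// toy model: every recovery point sn below gcSn, followed by gcSn itself.
func snIntervals(rpSns []uint64, gcSn uint64) []uint64 {
	bounds := []uint64{}
	for _, sn := range rpSns {
		if sn < gcSn {
			bounds = append(bounds, sn)
		}
	}
	sort.Slice(bounds, func(i, j int) bool { return bounds[i] < bounds[j] })
	return append(bounds, gcSn)
}

// canCompact reports whether the pair is invisible to every retained
// snapshot: the delete is at or below gcSn, and no boundary b satisfies
// insertSn <= b < deleteSn.
func canCompact(r record, bounds []uint64) bool {
	gcSn := bounds[len(bounds)-1]
	if r.deleteSn > gcSn {
		return false
	}
	for _, b := range bounds {
		if r.insertSn <= b && b < r.deleteSn {
			return false
		}
	}
	return true
}

func main() {
	rec := record{insertSn: 8, deleteSn: 12}
	gcSn := uint64(15)

	// rpSns already contains the new recovery point at sn 10.
	fresh := snIntervals([]uint64{5, 10}, gcSn)
	// rpSns not yet updated: the recovery point at sn 10 is missing.
	stale := snIntervals([]uint64{5}, gcSn)

	fmt.Println(canCompact(rec, fresh)) // false: the pair spans the RP boundary
	fmt.Println(canCompact(rec, stale)) // true: the pair is wrongly collectible
}
```

      If one store compacts a page using the stale boundaries and the other uses the updated ones, the same insert/delete pair survives in one store and disappears from the other, which is the divergence described above.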

      For example, suppose that for a given docId the backstore has compacted away all the insert/delete pairs because it did not have the latest RP snapshot boundary, whereas the mainstore still has insert/delete records across an RP snapshot boundary. On a crash, we recover from a common recovery point and then roll back. If the mainstore delete record is pruned because its snapshot number is >= rollback start, we need a delete mutation from DCP to bring the item count back to normal. The delete mutation arrives at the backstore, but the lookup fails because the item was compacted away, so the mainstore never gets the required delete mutation. This leads to a duplicate in the item count.
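
      The delete path in this example can be illustrated with a minimal sketch. It assumes a simplified model in which the backstore maps docId to the indexed key and the mainstore holds one entry per (key, docId); the index type, applyDelete, and itemCounts are hypothetical names, not GSI or plasma APIs.

```go
package main

import "fmt"

// index is a toy model of the two stores.
type index struct {
	backStore map[string]string // docId -> indexed key
	mainStore map[string]bool   // "key/docId" -> present
}

func newIndex() *index {
	return &index{
		backStore: map[string]string{},
		mainStore: map[string]bool{},
	}
}

// applyDelete handles a delete mutation for docID. The mainStore entry is
// removed only if the backStore lookup succeeds.
func (ix *index) applyDelete(docID string) bool {
	key, ok := ix.backStore[docID]
	if !ok {
		return false // lookup failed: mainStore never sees the delete
	}
	delete(ix.backStore, docID)
	delete(ix.mainStore, key+"/"+docID)
	return true
}

// itemCounts returns the number of entries in each store.
func (ix *index) itemCounts() (mainCount, backCount int) {
	return len(ix.mainStore), len(ix.backStore)
}

func main() {
	ix := newIndex()

	// Assumed state after recovery and rollback in the scenario above:
	// the mainStore delete record was pruned, so the entry is live again,
	// while the backStore pair was compacted away, so it has no entry.
	ix.mainStore["k1/doc1"] = true

	applied := ix.applyDelete("doc1") // the delete mutation from DCP
	mainCount, backCount := ix.itemCounts()
	fmt.Println(applied, mainCount, backCount) // false 1 0
}
```

      In this model the stores end with different item counts (1 in the mainstore, 0 in the backstore), matching the stale mainstore entry described in the example.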

      The item count issue can be more visible after recovery, because of the rollback. (Note that even without recovery, mainstore/backstore pages can remain out of sync due to this issue.)

      Please refer to the comments in the CBSE:

      https://issues.couchbase.com/browse/CBSE-17770?focusedId=793704&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-793704

      https://issues.couchbase.com/browse/CBSE-17770?focusedId=794452&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-794452

       

            People

              hemant.rajput Hemant Rajput
              saptarshi.sen Saptarshi Sen