Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-41283

Crash in checkpoint code due to keyIndexes pointing to freed queued_items

    XMLWordPrintable

Details

    • Untriaged
    • 1
    • Unknown

    Description

      Since commit 2bd86cd710b5f8f07feb908b29b3f4a48b383b8c for MB-35889, we no longer add key for disk checkpoints into the keyIndexes (committedKeyIndex and metaKeyIndex). This is to decrease memory usage when receiving disk snapshots as keys within the disk snapshot should have been deduped already and assumes that the disk checkpoint will always set the MARKER_FLAG_CHK flat in the DCP snapshot marks flags. However, this was only introduced in 6.5.0 which means when a 6.6.0 node has a DCP Consumer we might receive a snapshot marker with MARKER_FLAG_DISK set but not MARKER_FLAG_CHK. This then will allow the consumer to update an open memory checkpoint to a disk checkpoint if the snapshot marker's start_seqno > 0. Effectively merging the disk and memory checkpoint ranges (as can be seen below).

      void PassiveStream::processMarker(SnapshotMarker* marker) {
      ...
              // We could be connected to a non sync-repl, so if the max-visible is
              // not transmitted (optional is false), set visible to snap-end
              auto visibleSeq =
                      marker->getMaxVisibleSeqno().value_or(marker->getEndSeqno());
              if (marker->getFlags() & MARKER_FLAG_DISK && vb->getHighSeqno() == 0) {
                  vb->setReceivingInitialDiskSnapshot(true);
                  ckptMgr.createSnapshot(cur_snapshot_start.load(),
                                         cur_snapshot_end.load(),
                                         hcs,
                                         checkpointType,
                                         visibleSeq);
              } else {
                  if (marker->getFlags() & MARKER_FLAG_CHK ||
                      vb->checkpointManager->getOpenCheckpointId() == 0) {
                      ckptMgr.createSnapshot(cur_snapshot_start.load(),
                                             cur_snapshot_end.load(),
                                             hcs,
                                             checkpointType,
                                             visibleSeq);
                  } else {
                      // If we are reconnecting then we need to update the snap end
                      // and potentially the checkpoint type as We do not send the
                      // CHK snapshot marker flag for disk snapshots.
                      ckptMgr.updateCurrentSnapshot(
                              cur_snapshot_end.load(), visibleSeq, checkpointType);
                  }
              }
      ....
      

      Then when processing the queued_items this disk checkpoint we call Checkpoint::queueDirty(), we might find the key in the committedKeyIndex:

      QueueDirtyStatus Checkpoint::queueDirty(const queued_item& qi,
                                              CheckpointManager* checkpointManager) {
      ..
              // Check in the appropriate key index if an item already exists.
              auto& keyIndex =
                      qi->isCommitted() ? committedKeyIndex : preparedKeyIndex;
              auto it = keyIndex.find(makeIndexKey(qi));
       
              // Before de-duplication could discard a delete, store the largest
              // "rev-seqno" encountered
              if (qi->isDeleted() &&
                  qi->getRevSeqno() > maxDeletedRevSeqno.value_or(0)) {
                  maxDeletedRevSeqno = qi->getRevSeqno();
              }
       
              // Check if this checkpoint already has an item for the same key
              // and the item has not been expelled.
              if (it != keyIndex.end()) {
                  if (it->second.mutation_id > highestExpelledSeqno) {
                      // Normal path - we haven't expelled the item. We have a valid
                      // cursor position to read the item and make our de-dupe checks.
                      const auto currPos = it->second.position;
                      if (!(canDedup(*currPos, qi))) {
                          return QueueDirtyStatus::FailureDuplicateItem;
                      }
      ....
                      addItemToCheckpoint(qi);
       
                      // Reduce the size of the checkpoint by the size of the
                      // item being removed.
                      queuedItemsMemUsage -= ((*currPos)->size());
                      // Remove the existing item for the same key from the list.
                      toWrite.erase(currPos);
                  } else {
      

      Then the key we find in the commitedKeyIndex (currPos, is erased from the toWrite CheckpointQueue. However, this still leaves a value in the commitKeyIndex. As we don't update the committedKeyIndex 's entry to point to the new queued_item and leave it pointing to the freed queued_item that was in the toWrite queue. Due to the fact that isDiskCheckpoint() will return true.

          if (qi->getKey().size() > 0 && !isDiskCheckpoint()) {
              ChkptQueueIterator last = end();
              // --last is okay as the list is not empty now.
              index_entry entry = {--last, qi->getBySeqno()};
              // Set the index of the key to the new item that is pushed back into
              // the list.
              if (qi->isCheckPointMetaItem()) {
                  // Insert the new entry into the metaKeyIndex
                  auto result = metaKeyIndex.emplace(makeIndexKey(qi), entry);
                  if (!result.second) {
                      // Did not manage to insert - so update the value directly
                      result.first->second = entry;
                  }
              } else {
                  // Insert the new entry into the keyIndex
                  auto& keyIndex =
                          qi->isCommitted() ? committedKeyIndex : preparedKeyIndex;
                  auto result = keyIndex.emplace(makeIndexKey(qi), entry);
                  if (!result.second) {
                      // Did not manage to insert - so update the value directly
                      result.first->second = entry;
                  }
              }
      

      Fix:
      To fix this we need to ensure that we continue adding values to the keyIndexes if the checkpoint was a CheckpointType::Memory and then is updated to CheckpointType::Disk by CheckpointManager::updateCurrentSnapshot().

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            People

              ashwin.govindarajulu Ashwin Govindarajulu
              richard.demellow Richard deMellow
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty