Details
-
Bug
-
Resolution: Cannot Reproduce
-
Minor
-
None
-
5.5.0
-
Triaged
-
No
Description
During investigation of MB-28431, it was found that CheckpointManager::getNumItemsForCursor return 1 when in fact there are no items available for the cursor. That had the result of causing the flusher to spin forever on the same vBucket; as we would incorrectly believe there were still items to flush.
The investigation under MB-28431 proved that in some as-yet unknown scenario, a cursor was positioned at the end of a single open checkpoint; yet the Cursor::offset was such that it was one less than CheckpointManager::numItems. This resulted in the value 1 being returned:
size_t CheckpointManager::getNumItemsForCursor_UNLOCKED( |
const std::string &name) const { |
size_t remains = 0; |
cursor_index::const_iterator it = connCursors.find(name);
|
if (it != connCursors.end()) { |
size_t offset = it->second.offset + getNumOfMetaItemsFromCursor(it->second); |
if (numItems >= offset) { |
remains = numItems - offset;
|
} else { |
LOG(EXTENSION_LOG_WARNING, ...
|
}
|
} else { |
LOG(EXTENSION_LOG_WARNING, ...
|
}
|
return remains; |
}
|
(Suspect this also affects older versions as well; not yet confirmed).
I've already tried to reproduce the problem (making it explicit by adding the following assert); but neither the unit tests, run-mats nor a manual local rebalance test demonstrate the issue, so the cause is not clear. (When investigating the live cluster, while the effect was seen the cause happened long previously).
Assert to make the problem explicit when occurs:
diff --git a/engines/ep/src/vbucket.cc b/engines/ep/src/vbucket.cc
|
index aad9cba7f..c7cc25c8a 100644
|
--- a/engines/ep/src/vbucket.cc
|
+++ b/engines/ep/src/vbucket.cc
|
@@ -343,6 +343,7 @@ VBucket::ItemsToFlush VBucket::getItemsToPersist(size_t approxLimit) {
|
// cursor, if we haven't yet hit the limit.
|
// Note that it is only valid to queue a complete checkpoint - this is where
|
// the "approx" in the limit comes from.
|
+ const auto itemsPreCkptMgr = result.items.size();
|
const auto ckptMgrLimit = approxLimit - result.items.size();
|
CheckpointManager::ItemsForCursor ckptItems;
|
if (ckptMgrLimit > 0) {
|
@@ -353,6 +354,14 @@ VBucket::ItemsToFlush VBucket::getItemsToPersist(size_t approxLimit) {
|
stats.persistenceCursorGetItemsHisto.add(
|
std::chrono::duration_cast<std::chrono::microseconds>(
|
ProcessClock::now() - _begin_));
|
+
|
+ // We were permitted at least one item for this cursor - therefore
|
+ // if zero items were returned then numItemsForCursor should return
|
+ // zero.
|
+ if (result.items.size() == itemsPreCkptMgr) {
|
+ cb_assert(checkpointManager->getNumItemsForCursor(
|
+ CheckpointManager::pCursorName) == 0);
|
+ }
|
} else {
|
// We haven't got sufficient remaining capacity to read items from
|
// CheckpoitnManager, therefore we must assume that there /could/ |
Attachments
Issue Links
- is triggering
-
MB-28431 rebalance stuck for more than 8 hours in ubuntu longevity
- Closed