Details

Type: Bug
Resolution: Fixed
Priority: Major
Fix Version/s: 7.2.6
Affects Version/s: 7.2.5, 7.6.1
Component/s: storage-engine
Labels:

Triage: Triaged
Story Points: 0
Is this a Regression?: No

Description

Flush stats (fdSz and hdrSz) are under-accounted in newPgOperator for complex pages with merge deltas. This discrepancy leads to inaccurate (over-counted) values of FlushDataSz and FlushHdrSz in memory.

Consequently, LSS cleaners compute fragmentation incorrectly and run less frequently, so stale data accumulates on disk and causes disk bloat. The effect is most evident in workloads with frequent merges (e.g., time-series data), and a slow mutation rate aggravates it.
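To illustrate why over-counted flush stats make the cleaner run less often, here is a minimal sketch. It assumes the cleaner estimates fragmentation as the share of on-disk bytes not covered by the in-memory flush stats; the ticket does not state the real formula. The names `fragPercent` and `shouldClean`, the threshold, and all byte counts are hypothetical.

```go
package main

import "fmt"

// fragPercent is a hypothetical fragmentation estimate: the percentage of
// on-disk bytes (usedSz) not accounted for as live flushed bytes (liveSz).
// The real LSS cleaner formula is not given in this ticket.
func fragPercent(liveSz, usedSz int64) int64 {
	if usedSz <= 0 || liveSz >= usedSz {
		return 0
	}
	return (usedSz - liveSz) * 100 / usedSz
}

// shouldClean reports whether the estimated fragmentation reaches the
// (hypothetical) cleaning threshold.
func shouldClean(liveSz, usedSz, thresholdPct int64) bool {
	return fragPercent(liveSz, usedSz) >= thresholdPct
}

func main() {
	const usedSz, thresholdPct = 1000, 30

	// Accurate stats: 600 live bytes out of 1000 on disk.
	fmt.Println(fragPercent(600, usedSz), shouldClean(600, usedSz, thresholdPct)) // 40 true

	// Over-counted stats: the same log, but memory believes 800 bytes are live.
	fmt.Println(fragPercent(800, usedSz), shouldClean(800, usedSz, thresholdPct)) // 20 false
}
```

Under this assumption, the same log that should trigger cleaning at 40% fragmentation looks only 20% fragmented once the live size is over-counted, so the cleaner stays idle.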

Example:

After the merge delta is added, if we have a parent page like:

"low:":         <ud>(key-       401, sn:2, insert:true)</ud> (len:44),
"high:":        maxItem (len:7),
"chainLen:":    3,
"numItems:":    0,
"state:":       8006,
"version:":     6,
"flushed:":     true,
"evicted:":     false,
"compressed:":  false
 0 merge: op compress[false]purge[false]empty[false]op[opPageMergeDelta]
     0 delta: op compress[false]purge[false]empty[false]op[opPageRemoveDelta] ptr[0x10eb4c000]
     1 flush: op compress[false]purge[false]empty[false]op[opRelocPageDelta] NumRecords 0 NumSegments 1 bloomFilter: <nil> flushDataSz: 43, flushHdrSz:80
     2 base:
 1 flush: op compress[false]purge[false]empty[false]op[opRelocPageDelta] NumRecords 0 NumSegments 1 bloomFilter: <nil> flushDataSz: 53, flushHdrSz:124
 2 base:

After compaction, we would expect staleDataSz to be 43 + 53 = 96 and staleHdrSz to be 80 + 124 = 204.
After compaction, the page becomes:
Plasma:

"low:":         <ud>(key-       401, sn:2, insert:true)</ud> (len:44),
"high:":        maxItem (len:7),
"chainLen:":    0,
"numItems:":    0,
"state:":       7,
"version:":     7,
"flushed:":     false,
"evicted:":     false,
"compressed:":  false
 0 base:

But the staleDataSz returned is 43 and the staleHdrSz returned is 80.
This causes us to under-subtract the flush stats, which eventually leads to an over-counting of these stats in memory.
The stats persisted on disk are correct. Because of this, recovery is able to correct the situation.
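The expected totals in this example can be reproduced with a small sketch that sums the flush sizes of every flush delta in the chain, including those nested under a merge delta. The `delta` type, the `opKind` constants, and `staleFlushSizes` are hypothetical simplifications for illustration; they are not Plasma's actual types or the actual fix.

```go
package main

import "fmt"

// opKind is a hypothetical tag for the delta kinds seen in the page dump.
type opKind int

const (
	opBase   opKind = iota
	opFlush         // stands in for the "flush" entries (opRelocPageDelta)
	opRemove        // stands in for opPageRemoveDelta
	opMerge         // stands in for opPageMergeDelta
)

// delta is a simplified, hypothetical chain node, not Plasma's real type.
type delta struct {
	op          opKind
	flushDataSz int
	flushHdrSz  int
	merged      *delta // for opMerge: head of the merged page's chain
	next        *delta
}

// staleFlushSizes walks the chain, descends into merge deltas, and sums the
// sizes of every flush delta it finds.
func staleFlushSizes(head *delta) (dataSz, hdrSz int) {
	for d := head; d != nil; d = d.next {
		switch d.op {
		case opFlush:
			dataSz += d.flushDataSz
			hdrSz += d.flushHdrSz
		case opMerge:
			md, mh := staleFlushSizes(d.merged)
			dataSz += md
			hdrSz += mh
		}
	}
	return dataSz, hdrSz
}

// exampleChain mirrors the parent page from the dump: a merge delta whose
// sub-chain holds one flush delta (43/80), followed by the parent's own
// flush delta (53/124) and base.
func exampleChain() *delta {
	merged := &delta{op: opRemove, next: &delta{
		op: opFlush, flushDataSz: 43, flushHdrSz: 80,
		next: &delta{op: opBase},
	}}
	return &delta{op: opMerge, merged: merged, next: &delta{
		op: opFlush, flushDataSz: 53, flushHdrSz: 124,
		next: &delta{op: opBase},
	}}
}

func main() {
	dataSz, hdrSz := staleFlushSizes(exampleChain())
	fmt.Println(dataSz, hdrSz) // 96 204
}
```

Compared with the returned values of 43 and 80, this leaves 96 - 43 = 53 bytes of data and 204 - 80 = 124 bytes of header that are never subtracted for this one page.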

Workaround:
Recovery log blocks persist flushDataSz and flushHdrSz correctly. During recovery, the in-memory stats FlushDataSz and FlushHdrSz are recomputed, so restarting the indexer process serves as a temporary fix.

Issue Links

Clones: MB-63065 [BP-7.6.3] Under-accounting of flush stats in case complex page iterators (Resolved)
Is a backport of: MB-62837 Under-accounting of flush stats in case complex page iterators (Resolved)

People

Assignee: Jinesh Parakh
Reporter: Jinesh Parakh
Votes: 0
Watchers: 4

Dates

Created: 12/Aug/24 8:45 AM
Updated: 29/Aug/24 9:22 AM
Resolved: 12/Aug/24 11:05 AM

Gerrit Reviews

There are no open Gerrit changes and 1 closed Gerrit change:

MB-63130 [BP 7.2.6]: Avoid under-counting flush stats in complex page iterator

[BP-7.2.6] Under-accounting of flush stats in case complex page iterators
