Couchbase Server / MB-61394

Magma::getStats observed taking a long time



    Description

      Spinning out magma specific issue from MB-61376 (KV will still move stat "all" processing off worker thread)

      After adding some trace spans and forcing long checkpoints (big flush times), I can see that the "all" stats group spends most of its time in MagmaKVStore::getAggrDbFileInfo, which uses Magma::getStats.

      E.g. here's a slow op. I've formatted the trace output and also (ab)used some existing trace identifiers just for debug purposes; thus in the following output disassociate_bucket is wrapped around the call to kvBucket->getFileStats, which just gets the file stats via getAggrDbFileInfo.

      2024-04-03T11:07:15.136579+01:00 WARNING 282: Slow operation: {"bucket":"default","cid":"[::1]:59885/5c1496f8","command":"STAT","duration":"1476 ms","packet":{"bodylen":0,"cas":0,"datatype":"raw","extlen":0,"keylen":0,"magic":"ClientRequest","opaque":1544853240,"opcode":"STAT","vbucket":0},"peer":{"ip":"::1","port":59885},
       
      "trace":
        "throttled=1108939726321923:38
         sync_write.prepare=1108939726360892:747
         associate_bucket=1108939727109441:8
         disassociate_bucket=1108939727118119:1473652 <- timing of kvBucket->getFileStats(collector);
         storage_engine_stats=1108941201052251:929
         update_privilege_context=1108939726321847:1475662 <- timing of EventuallyPersistentEngine::doEngineStatsLowCardinality
         set.with.meta=1108941201985030:542
         get.stats=1108939726317598:1476210
         execute=1108939726308747:1476390
         request=1108939726308747:1476390",
       
      "worker_tid":4867}
      

      To reproduce this issue I used the following approach:

      • Single node (cluster_run) with a small number of vbuckets (e.g. 16) and a 6 GB quota (lots of queueing space)
      • Add a sleep to the ep-engine flusher (so that we gather lots of data in memory before flushing)
        • 20s sleep for example
      • Run pillowfight with a 100% write workload
      • In a loop, run cbstats all

      After a few minutes, plenty of slow STAT calls were logged, and with the extra tracing all of them showed the time being spent in getAggrDbFileInfo.
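The "run cbstats all in a loop" step can be scripted so that slow calls are flagged automatically. A rough driver sketch, under assumptions not stated in the report: cbstats is on PATH, the first cluster_run node's memcached port is 12000, and 1 s is the "slow" threshold.

```python
import subprocess
import time

def time_command(cmd: list[str]) -> float:
    """Run a command, discarding its output, and return wall-clock seconds."""
    start = time.monotonic()
    subprocess.run(cmd, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=False)
    return time.monotonic() - start

def watch_stats(iterations: int, threshold_s: float = 1.0) -> list[float]:
    """Repeatedly run 'cbstats ... all' and collect timings over the threshold."""
    # Assumed defaults: cbstats on PATH, cluster_run node 0 on localhost:12000.
    cmd = ["cbstats", "localhost:12000", "all", "-b", "default"]
    slow = []
    for _ in range(iterations):
        elapsed = time_command(cmd)
        if elapsed > threshold_s:
            print(f"slow 'stats all': {elapsed * 1000:.0f} ms")
            slow.append(elapsed)
    return slow
```

Run alongside pillowfight; any call that exceeds the threshold should correspond to a "Slow operation" warning like the one above in the memcached log.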

      People

        ankush.sharma Ankush Sharma
        jwalker Jim Walker
