Details
Bug
Resolution: Fixed
Major
Goldfish Private Preview
Untriaged
0
Unknown
Analytics Sprint 31
Description
After changing the I/O scheduler to AsynchronousScheduler as in MB-59661, we are still hitting the following issue during the merge operation:
2023-11-16T13:04:50.031+00:00 ERRO CBAS.impls.LSMHarness [Executor-1422:418f33f67f1db6c11b01f09689a9950e] MERGE operation failed on {"class" : "LSMColumnBTree", "dir" : "/cache/data/@analytics/v_iodevice_1/storage/partition_49/Default/Default/WwyyPKKuZ5poJ/0/WwyyPKKuZ5poJ", "memory" : [{"class":"LSMBTreeMemoryComponent", "state":"INACTIVE", "writers":0, "readers":0, "pendingFlushes":0, "id":"[47,47]"}, {"class":"LSMBTreeMemoryComponent", "state":"READABLE_WRITABLE", "writers":0, "readers":0, "pendingFlushes":0, "id":"[48,48]"}], "disk" : 5, "num-scheduled-flushes":0, "current-memory-component":1}
org.apache.hyracks.api.exceptions.HyracksDataException: Unable to find free page in buffer cache after 1000 cycles (buffer cache undersized?)
The buffer cache came under pressure while performing concurrent merges. Ideally, we should reduce the resources (more specifically, the number of pages) that merge operations require from the buffer cache, which is tracked in MB-59664. In the meantime, we need to tune the merge policy parameters, namely the number of concurrent merges (STORAGE_MAX_CONCURRENT_MERGES_PER_PARTITION) and/or the number of components merged in a single merge operation (MAX_MERGE_COMPONENT_COUNT), to relieve the pressure on the buffer cache.
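To illustrate why lowering either parameter relieves the pressure, here is a rough back-of-envelope sketch in Python. It is not the actual Hyracks accounting: the model (each merge pins a fixed number of pages per input component plus one output component), the function names, and all numeric values are assumptions made purely for illustration.

```python
def merge_pages_needed(concurrent_merges, components_per_merge, pages_per_component):
    """Hypothetical estimate of buffer cache pages pinned by merges in one partition.

    Assumption: every merge holds `pages_per_component` pages for each input
    component it reads and for the single output component it writes.
    """
    streams_per_merge = components_per_merge + 1  # inputs + one output
    return concurrent_merges * streams_per_merge * pages_per_component


def fits_in_cache(cache_pages, partitions, concurrent_merges,
                  components_per_merge, pages_per_component):
    """Return True if the estimated demand across all partitions fits in the cache."""
    needed = partitions * merge_pages_needed(
        concurrent_merges, components_per_merge, pages_per_component
    )
    return needed <= cache_pages


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the failing cluster.
    print(fits_in_cache(1000, 4, 4, 10, 10))  # aggressive settings
    print(fits_in_cache(1000, 4, 2, 5, 10))   # reduced settings
```

Under this simplified model the demand grows with the product of the two parameters, so halving both cuts the estimated page demand by well over half. The real requirement depends on how the columnar merge cursors pin pages, which is what MB-59664 aims to reduce.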