Details
Bug
Resolution: Unresolved
Major
7.6.0, 7.0.0, 7.1.0, 7.2.0
None
Untriaged
0
No
Description
Performing a SET of an 18-20 MiB document on a bucket with a 100 MiB quota can fail repeatedly with a tmpOOM (temporary failure) status, with every retry failing the same way.
This is caused by the mutation_mem_ratio parameter (renamed to mutation_mem_threshold in 7.6.0) and its effect on tmpOOMs.
The mutation_mem_ratio is set to 93% of the bucket quota. We use it to decide whether the bucket is "too full" and a mutation should be rejected with a temporary failure (tmpOOM).
The logic in KV on the SET path is:
    if (memUsed + itemSize > mutation_mem_ratio * quota)
        return temporary_failure;
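A minimal Python sketch of that check, assuming mem_used, item size and quota are plain byte counts and the ratio is expressed as an integer percentage so the arithmetic stays exact. The function name is_tmp_oom is hypothetical; this is a simplified model of the condition above, not the KV-Engine code.

```python
MIB = 1024 * 1024


def is_tmp_oom(mem_used: int, item_size: int, quota: int,
               threshold_pct: int = 93) -> bool:
    """Return True if storing item_size more bytes would push mem_used
    past threshold_pct percent of the quota (hypothetical helper)."""
    limit = quota * threshold_pct // 100  # integer bytes, no float rounding
    return mem_used + item_size > limit


quota = 100 * MIB
mem_used = 75 * MIB  # memory usage sitting at the low watermark
print(is_tmp_oom(mem_used, 17 * MIB, quota))  # False: 92 MiB <= 93 MiB
print(is_tmp_oom(mem_used, 19 * MIB, quota))  # True: 94 MiB > 93 MiB
```

Under this model a document of exactly 18 MiB still fits, which matches the "larger than 18 MiB" boundary described below.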
When memory usage is at its typical level, between the low and high watermarks, a sufficiently large document will always fail to store on a bucket with a low quota. Here's why:
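KV stops freeing memory once mem_used reaches the low watermark, so in steady state mem_used does not drop below it.
The difference between the low watermark (75%) and the mutation_mem_ratio (93%) is 18% of the bucket quota. With a 100 MiB quota, that is 18 MiB. Any operation on a document larger than 18 MiB will therefore fail, and keep failing, because no further memory is freed to make room for it.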
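The headroom arithmetic can be checked with a small Python sketch. It assumes mem_used sits exactly at the low watermark and takes the 75% and 93% figures above as integer percentages; headroom_at_low_watermark is a hypothetical helper name.

```python
MIB = 1024 * 1024


def headroom_at_low_watermark(quota: int, low_wm_pct: int = 75,
                              threshold_pct: int = 93) -> int:
    """Largest item (bytes) that still fits under the mutation threshold
    when mem_used sits exactly at the low watermark."""
    # Integer percentages keep the result exact in bytes.
    return quota * threshold_pct // 100 - quota * low_wm_pct // 100


room = headroom_at_low_watermark(100 * MIB)
print(room // MIB)  # 18
```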
We can easily trigger a failure loop:
1. Load the travel-sample bucket
2. Set the bucket quota to 100 MiB (mem_used should settle around 75 MiB, the low watermark)
3. Try to store a 19 MiB document
4. Observe the temporary failure and the kv_ep_tmp_oom_errors stat increasing
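The reproduction steps can be modelled as a retry loop. This is a simplified sketch, not KV-Engine behaviour: it assumes a hypothetical run_pager that can only bring mem_used down to the low watermark, and counts each rejected attempt the way kv_ep_tmp_oom_errors would. A 19 MiB SET never succeeds because nothing changes between attempts.

```python
MIB = 1024 * 1024

QUOTA = 100 * MIB
LOW_WM = QUOTA * 75 // 100     # assumed: memory is not freed below this
THRESHOLD = QUOTA * 93 // 100  # mutation threshold from the ticket


def run_pager(mem_used: int) -> int:
    """Hypothetical pager model: frees memory only down to the low watermark."""
    return min(mem_used, LOW_WM)


def retry_set(item_size: int, mem_used: int, attempts: int) -> tuple:
    """Retry a SET up to `attempts` times.

    Returns (tmp_oom_errors, final mem_used) under the simplified model.
    """
    tmp_oom_errors = 0
    for _ in range(attempts):
        if mem_used + item_size > THRESHOLD:
            tmp_oom_errors += 1             # analogue of kv_ep_tmp_oom_errors
            mem_used = run_pager(mem_used)  # never goes below LOW_WM
        else:
            mem_used += item_size           # the SET is accepted
            break
    return tmp_oom_errors, mem_used


errors, mem = retry_set(19 * MIB, 75 * MIB, attempts=10)
print(errors, mem // MIB)  # 10 75
```

With a 17 MiB document the same loop succeeds on the first attempt, since 92 MiB stays under the 93 MiB threshold.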
Workaround
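Increase the bucket quota to 120 MiB so the operation can succeed. At 120 MiB, the 18% gap between the low watermark and the mutation_mem_ratio is 21.6 MiB, which is enough for a 19 MiB document.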
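To pick a quota for a given document size, one can search for the smallest quota whose low-watermark headroom fits the document. This Python sketch assumes the worst case, where mem_used has grown to the new low watermark; min_quota_mib and headroom are hypothetical helpers. With less data loaded, mem_used may stay below the watermark and a smaller quota could work.

```python
MIB = 1024 * 1024


def headroom(quota: int, low_wm_pct: int = 75, threshold_pct: int = 93) -> int:
    """Bytes between the low watermark and the mutation threshold."""
    return quota * threshold_pct // 100 - quota * low_wm_pct // 100


def min_quota_mib(item_size: int) -> int:
    """Smallest whole-MiB quota whose low-watermark headroom fits item_size."""
    quota_mib = 1
    while headroom(quota_mib * MIB) < item_size:
        quota_mib += 1
    return quota_mib


print(min_quota_mib(19 * MIB))          # 106
print(headroom(120 * MIB) >= 19 * MIB)  # True
```

Under these assumptions a 19 MiB document needs at least about 106 MiB of quota, so 120 MiB leaves some margin.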