Details
- Type: Bug
- Resolution: Fixed
- Priority: Critical
- Affects Version: 7.1.0
- Triage: Untriaged
Description
Build : 7.1.0-2147
Test : -test tests/integration/neo/test_neo_magma_milestone4.yml -scope tests/integration/neo/scope_neo_magma.yml
Scale : 3
Iteration : 1st
- In this system test, we have 4 magma buckets with 31 collections each.
- We run a workload that inserts JSON docs into all of these buckets sequentially, across all collections.
- bucket6 and bucket7 have a memory quota of 1197MB each.
- The doc loader is designed to keep loading docs in batches and poll the active items resident ratio; once it drops below the target (80% in this case), doc loading should stop (a minimal sketch of this loop is shown after the timing details below).
- For bucket6, 1,050,000 docs were loaded per collection, i.e. ~32.55M for the whole bucket, before hitting 80% RR.
- The expectation is that for bucket7, which has the same memory quota, roughly the same number of docs should be loaded by the time it hits 80% RR.
- For bucket7, however, 4,038,000 docs had been loaded per collection by the time the active resident ratio hit 80%, i.e. ~125M for the bucket.
Could this be because of an inaccurate computation of the RR? Due to this excessive loading, we see memcached OOM kills and general instability in the cluster due to sizing issues.
Doc loading for bucket6 started at around 2022-01-25T13:28:43-08:00 and went on till 2022-01-25T14:38:21-08:00
Doc loading for bucket7 started at around 2022-01-25T14:38:21-08:00 and went on till 2022-01-25T20:29:01-08:00
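For reference, here is a minimal sketch of the load-and-poll pattern described above. This is not the actual sequoiatools/catapult_dgm implementation; load_batch and get_active_resident_ratio are hypothetical helpers, and the batch size is illustrative.

# Sketch of the batch-load / poll-RR loop described above (illustrative only).
# Assumptions: load_batch() inserts one batch of JSON docs into every collection of
# the bucket, get_active_resident_ratio() returns the active resident ratio in percent.

TARGET_RR = 80       # stop once the active resident ratio drops below this
COLLECTIONS = 31     # collections per bucket in this test
BATCH_SIZE = 6000    # docs per collection per batch (illustrative)

def load_until_dgm(bucket, load_batch, get_active_resident_ratio):
    docs_per_collection = 0
    while True:
        load_batch(bucket, BATCH_SIZE)          # insert one batch into all collections
        docs_per_collection += BATCH_SIZE
        rr = get_active_resident_ratio(bucket)  # poll after each batch
        if rr < TARGET_RR:                      # target reached: stop loading
            break
    total_docs = docs_per_collection * COLLECTIONS
    # bucket6: 1,050,000 * 31 ≈ 32.55M docs before RR < 80%
    # bucket7: 4,038,000 * 31 ≈ 125M docs, roughly 3.8x more despite the same quota
    return docs_per_collection, total_docs

If the reported RR stays above 80% longer than it should, this loop keeps loading, which matches the ~3.8x difference observed between bucket6 and bucket7.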
From the test console:
[2022-01-25T13:28:43-08:00, sequoiatools/catapult_dgm:d9a36a] -i 172.23.97.74 -r 80 -u Administrator -p password -n 6000 -b bucket6 -dt Hotel -pc 100 -ds 1000 -ac True --num_threads 4
[2022-01-25T14:38:21-08:00, sequoiatools/catapult_dgm:723a04] -i 172.23.97.74 -r 80 -u Administrator -p password -n 6000 -b bucket7 -dt Hotel -pc 100 -ds 1000 -ac True --num_threads 4
[2022-01-25T20:29:01-08:00, sequoiatools/couchbase-cli:30e82c] bucket-edit -c 172.23.97.74 -u Administrator -p password --bucket bucket7 --max-ttl 3600
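As a cross-check, the active resident ratio the loader polls can also be read directly from ns_server. Below is a minimal sketch assuming the standard bucket stats REST endpoint and the vb_active_resident_items_ratio sample name; adjust the stat name if it differs on this build.

# Sketch: read a bucket's active resident ratio from the ns_server stats REST API.
import requests

def active_resident_ratio(node, bucket, user="Administrator", password="password"):
    url = f"http://{node}:8091/pools/default/buckets/{bucket}/stats"
    resp = requests.get(url, auth=(user, password), timeout=30)
    resp.raise_for_status()
    samples = resp.json()["op"]["samples"]
    # The endpoint returns a window of recent samples; take the latest one.
    return samples["vb_active_resident_items_ratio"][-1]

# Example: print(active_resident_ratio("172.23.97.74", "bucket7"))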
In between, we also see the memcached process getting OOM killed on various KV nodes:
On 172.23.97.241
[user:info,2022-01-25T19:00:16.581-08:00,ns_1@172.23.97.241:<0.25496.0>:ns_log:consume_log:76]Service 'memcached' exited with status 137. Restarting. Messages:
On 172.23.97.74
[user:info,2022-01-25T19:55:09.470-08:00,ns_1@172.23.97.74:<0.25013.0>:ns_log:consume_log:76]Service 'memcached' exited with status 137. Restarting. Messages:
On 172.23.96.122
[user:info,2022-01-25T20:27:25.611-08:00,ns_1@172.23.96.122:<0.25938.0>:ns_log:consume_log:76]Service 'memcached' exited with status 137. Restarting. Messages:
On 172.23.96.48
[user:info,2022-01-25T21:02:25.342-08:00,ns_1@172.23.96.48:<0.25739.0>:ns_log:consume_log:76]Service 'memcached' exited with status 137. Restarting. Messages:
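For completeness, a small sketch of how these events can be pulled out of the collected logs (exit status 137 is SIGKILL, typically the kernel OOM killer). The assumption here is that you point it at the ns_server info/debug logs from the cbcollect bundles; file names and layout are not guaranteed.

# Sketch: scan ns_server logs from cbcollect bundles for memcached OOM kills.
import re
import sys

PATTERN = re.compile(r"Service 'memcached' exited with status 137")

def find_oom_kills(log_path):
    hits = []
    with open(log_path, errors="replace") as f:
        for line in f:
            if PATTERN.search(line):
                hits.append(line.strip())
    return hits

if __name__ == "__main__":
    for path in sys.argv[1:]:
        for hit in find_oom_kills(path):
            print(f"{path}: {hit}")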
The attached logs were collected around 2022-01-25 21:45 PST. Let me know if you need logs from an earlier timestamp.
Nodes with KV service : 172.23.120.73, 172.23.120.74, 172.23.120.77, 172.23.120.86, 172.23.121.77, 172.23.123.25, 172.23.123.26, 172.23.96.122, 172.23.96.14, 172.23.96.48, 172.23.97.241, 172.23.97.74
Attachments
Issue Links
- is caused by MB-50546: Folly::UMPMCQueue leads to incorrect bucket memory usage tracking (Closed)