Loading...

XML

Word

Printable

Steps to reproduce:

Create 4 buckets
Create indexes with replicas on each of the 4 buckets.
Run pillowfight to continuously load data ((buckets have 1M, 1M , 1M and 3M items). The bucket RR needs to be under 10%. Load until then
Run a shell script that runs the request_plus scans continuously.
Run stress-ng with the params -> stress-ng --vm 4 --vm-bytes 1G --metrics-brief --vm-keep --vm-locked -m 4 --aggressive --vm-populate. (Adjust the --vm-bytes param depending upon the VM resources)
Once you run enough stress-ng processes, OOM kill will kick in. This can be verified by checking the dmesg ( dmesg -T | egrep -i 'killed process' )
There's a possibility that stress-ng gets spawned and killed since OOM kill is determined by a oom_score_adj factor. In order to make sure that memcached gets killed run this

echo 1000 > /proc/<memcached PID>/oom_score_adj

Logs and pcaps are attached. There were 2 instances of rollbacks observed. Please use these timestamps for analysis

Instance 1

Index node1 ( 172.23.106.159)

2022-07-26T03:26:54.738-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test8

Index node2 ( 172.23.106.163)

2022-07-26T03:26:58.186-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test8

Instance 2

Index node1 ( 172.23.106.159)

2022-07-26T05:06:12.658-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test8

 Index node2 ( 172.23.106.163)

2022-07-26T05:06:10.805-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test8

Log bundles ->

s3://cb-customers-secure/cbse12279oomkil2/2022-07-26/collectinfo-2022-07-26t125548-ns_1@172.23.105.36.zip

s3://cb-customers-secure/cbse12279oomkil2/2022-07-26/collectinfo-2022-07-26t125548-ns_1@172.23.105.37.zip

s3://cb-customers-secure/cbse12279oomkil2/2022-07-26/collectinfo-2022-07-26t125548-ns_1@172.23.106.156.zip

s3://cb-customers-secure/cbse12279oomkil2/2022-07-26/collectinfo-2022-07-26t125548-ns_1@172.23.106.159.zip

s3://cb-customers-secure/cbse12279oomkil2/2022-07-26/collectinfo-2022-07-26t125548-ns_1@172.23.106.163.zip

s3://cb-customers-secure/cbse12279oomkil2/2022-07-26/collectinfo-2022-07-26t125548-ns_1@172.23.106.204.zip

I have the packet capture files too and can attach them if necessary.

No reviews matched the request. Check your Options in the drop-down menu of this sections header.

There are no open Gerrit changes

Index rollbackAllToZero messages seen