Details
- Type: Bug
- Resolution: Duplicate
- Priority: Critical
- Version: 6.6.2
- Triage: Untriaged
- Is this a Regression?: Unknown
Description
Steps to reproduce:
- Create 4 buckets
- Create indexes with replicas on each of the 4 buckets.
- Run pillowfight to continuously load data (the buckets have 1M, 1M, 1M and 3M items). The bucket resident ratio (RR) needs to be under 10%; keep loading until it gets there (see the load sketch after these steps).
- Run a shell script that issues request_plus scans continuously (see the scan-loop sketch after these steps).
- Run stress-ng with the params:
stress-ng --vm 4 --vm-bytes 1G --metrics-brief --vm-keep --vm-locked -m 4 --aggressive --vm-populate
(Adjust the --vm-bytes param depending upon the VM resources)
- Once enough stress-ng processes are running, the OOM killer will kick in. This can be verified by checking dmesg ( dmesg -T | egrep -i 'killed process' ).
- There's a possibility that stress-ng itself gets spawned and killed, since the OOM-kill victim is chosen using the oom_score_adj factor. To make sure that memcached is the process that gets killed, run this (a combined helper sketch follows these steps):
echo 1000 > /proc/<memcached PID>/oom_score_adj
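A minimal sketch of the load step, assuming buckets named test1-test4 (test3 and test4 appear in the rollback logs below; the other bucket names, the node address, and the credentials are placeholders) and standard cbc-pillowfight options:

# run one pillowfight per bucket; with no cycle limit it keeps mutating the
# working set, so the resident ratio keeps dropping as the data set grows
HOST=172.23.100.15                   # placeholder: any data-service node
CRED="-u Administrator -P password"  # placeholder credentials
cbc-pillowfight -U couchbase://$HOST/test1 $CRED -I 1000000 -t 4 &
cbc-pillowfight -U couchbase://$HOST/test2 $CRED -I 1000000 -t 4 &
cbc-pillowfight -U couchbase://$HOST/test3 $CRED -I 1000000 -t 4 &
cbc-pillowfight -U couchbase://$HOST/test4 $CRED -I 3000000 -t 4 &
wait

Keep this running and watch the buckets' resident ratio (UI or bucket stats) until it drops below 10%.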
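A sketch of the continuous request_plus scan script. The query node address, credentials, bucket names, and the field1 predicate are placeholders for whatever the indexes actually cover; the point is that scan_consistency=request_plus forces the indexer to catch up with KV before each scan is answered:

#!/bin/bash
QUERY_NODE=172.23.100.22:8093        # placeholder: any query-service node
while true; do
  for b in test1 test2 test3 test4; do
    # field1 is a placeholder for a field covered by the secondary indexes
    curl -s -u Administrator:password http://$QUERY_NODE/query/service \
      --data-urlencode "statement=SELECT COUNT(1) FROM \`$b\` WHERE field1 IS NOT MISSING" \
      --data-urlencode "scan_consistency=request_plus" > /dev/null
  done
done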
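And a small helper for the last two steps, assuming pgrep is available: it raises oom_score_adj for every memcached process (run as root) and then checks which process the OOM killer actually took out.

# bias the OOM killer towards memcached on this node
for pid in $(pgrep memcached); do
  echo 1000 > /proc/$pid/oom_score_adj
done
# after stress-ng has pushed the node over the edge, confirm the victim
dmesg -T | egrep -i 'killed process'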
Rollbacks to zero seen on 172.23.100.19 (index node):
2022-07-28T06:01:49.498-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test4
2022-07-28T06:01:52.344-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test3
cbcollect logs ->
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659015082/collectinfo-2022-07-28T133123-ns_1%40172.23.100.15.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659015082/collectinfo-2022-07-28T133123-ns_1%40172.23.100.16.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659015082/collectinfo-2022-07-28T133123-ns_1%40172.23.100.17.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659015082/collectinfo-2022-07-28T133123-ns_1%40172.23.100.19.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659015082/collectinfo-2022-07-28T133123-ns_1%40172.23.100.22.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659015082/collectinfo-2022-07-28T133123-ns_1%40172.23.121.215.zip
The cluster wasn't in a healthy state, as one of the nodes had very high memory consumption and a rebalance did not go through. Please look around the timestamp 2022-07-28T06:01:49.498-07:00 (this is the only occurrence, so it shouldn't be confusing).
Not sure whether it helps, but some of the older logs are here ->
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659012574/collectinfo-2022-07-28T124935-ns_1%40172.23.100.15.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659012574/collectinfo-2022-07-28T124935-ns_1%40172.23.100.16.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659012574/collectinfo-2022-07-28T124935-ns_1%40172.23.100.17.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659012574/collectinfo-2022-07-28T124935-ns_1%40172.23.100.19.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659012574/collectinfo-2022-07-28T124935-ns_1%40172.23.100.22.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659012574/collectinfo-2022-07-28T124935-ns_1%40172.23.121.215.zip