Details
- Type: Bug
- Resolution: Duplicate
- Priority: Critical
- Version: 6.6.2
- Triage: Untriaged
- Is this a Regression?: Unknown
Description
Steps to reproduce:
- Create 4 buckets
- Create indexes with replicas on each of the 4 buckets.
- Run pillowfight to continuously load data (the buckets have 1M, 1M, 1M and 3M items). The bucket resident ratio (RR) needs to be under 10%; keep loading until it gets there (see the load sketch after these steps).
- Run a shell script that issues request_plus scans continuously (see the scan-loop sketch after these steps).
- Run stress-ng with the params:
stress-ng --vm 4 --vm-bytes 1G --metrics-brief --vm-keep --vm-locked -m 4 --aggressive --vm-populate
(Adjust the --vm-bytes param depending upon the VM resources)
- Once enough stress-ng processes are running, the OOM killer will kick in. This can be verified by checking dmesg ( dmesg -T | egrep -i 'killed process' ).
- There's a possibility that stress-ng itself gets spawned and killed, since the OOM-kill victim is chosen using the oom_score_adj factor. To make sure that memcached is the process that gets killed, run this (a combined helper sketch follows these steps):
echo 1000 > /proc/<memcached PID>/oom_score_adj
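A minimal sketch of the load step, assuming buckets named test1-test4 (test3 and test4 appear in the rollback logs below; the other bucket names, the node address, and the credentials are placeholders) and standard cbc-pillowfight options:

# run one pillowfight per bucket; with no cycle limit it keeps mutating the
# working set, so the resident ratio keeps dropping as the data set grows
HOST=172.23.100.15                   # placeholder: any data-service node
CRED="-u Administrator -P password"  # placeholder credentials
cbc-pillowfight -U couchbase://$HOST/test1 $CRED -I 1000000 -t 4 &
cbc-pillowfight -U couchbase://$HOST/test2 $CRED -I 1000000 -t 4 &
cbc-pillowfight -U couchbase://$HOST/test3 $CRED -I 1000000 -t 4 &
cbc-pillowfight -U couchbase://$HOST/test4 $CRED -I 3000000 -t 4 &
wait

Keep this running and watch the buckets' resident ratio (UI or bucket stats) until it drops below 10%.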
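A sketch of the continuous request_plus scan script. The query node address, credentials, bucket names, and the field1 predicate are placeholders for whatever the indexes actually cover; the point is that scan_consistency=request_plus forces the indexer to catch up with KV before each scan is answered:

#!/bin/bash
QUERY_NODE=172.23.100.22:8093        # placeholder: any query-service node
while true; do
  for b in test1 test2 test3 test4; do
    # field1 is a placeholder for a field covered by the secondary indexes
    curl -s -u Administrator:password http://$QUERY_NODE/query/service \
      --data-urlencode "statement=SELECT COUNT(1) FROM \`$b\` WHERE field1 IS NOT MISSING" \
      --data-urlencode "scan_consistency=request_plus" > /dev/null
  done
done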
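And a small helper for the last two steps, assuming pgrep is available: it raises oom_score_adj for every memcached process (run as root) and then checks which process the OOM killer actually took out.

# bias the OOM killer towards memcached on this node
for pid in $(pgrep memcached); do
  echo 1000 > /proc/$pid/oom_score_adj
done
# after stress-ng has pushed the node over the edge, confirm the victim
dmesg -T | egrep -i 'killed process'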
Rollbacks to zero seen on 172.23.100.19 (index node):
2022-07-28T06:01:49.498-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test4
2022-07-28T06:01:52.344-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test3
cbcollect logs ->
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659015082/collectinfo-2022-07-28T133123-ns_1%40172.23.100.15.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659015082/collectinfo-2022-07-28T133123-ns_1%40172.23.100.16.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659015082/collectinfo-2022-07-28T133123-ns_1%40172.23.100.17.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659015082/collectinfo-2022-07-28T133123-ns_1%40172.23.100.19.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659015082/collectinfo-2022-07-28T133123-ns_1%40172.23.100.22.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659015082/collectinfo-2022-07-28T133123-ns_1%40172.23.121.215.zip
The cluster wasn't in a healthy state, as one of the nodes had very high memory consumption and a rebalance did not go through. Please look around the timestamp 2022-07-28T06:01:49.498-07:00 (this is the only occurrence, so it shouldn't be confusing).
Not sure whether it helps, but some of the older logs are here ->
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659012574/collectinfo-2022-07-28T124935-ns_1%40172.23.100.15.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659012574/collectinfo-2022-07-28T124935-ns_1%40172.23.100.16.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659012574/collectinfo-2022-07-28T124935-ns_1%40172.23.100.17.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659012574/collectinfo-2022-07-28T124935-ns_1%40172.23.100.19.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659012574/collectinfo-2022-07-28T124935-ns_1%40172.23.100.22.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659012574/collectinfo-2022-07-28T124935-ns_1%40172.23.121.215.zip