
MB-21552: MOI memory steadily increasing with constant number of items


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: 5.0.0
    • Affects Version/s: 4.5.0, 4.5.1, 4.6.0
    • Component/s: secondary-index
    • Labels: None

    Description

      On day 4 of the longevity test, the indexer warned that 92% of memory was in use and all indexes entered the Paused state. This could be a leak somewhere, because the usage increased each day and never seems to have gone down. Also, the item count levels out at ~15M on the bucket under a mixed workload of sets/gets/deletes, so I would expect indexer memory to also be fairly constant.

      The node has 30GB of memory, and the index RAM alert threshold was 75%. The 92% warning:

      [user:info,2016-10-28T18:05:07.684-07:00,ns_1@172.23.105.60:<0.7983.373>:menelaus_web_alerts_srv:global_alert:89]Warning: approaching max index RAM. Indexer RAM on node "172.23.105.60" is 92%, which is at or above the threshold of 75%.
      [ns_server:info,2016-10-28T18:05:07.685-07:00,ns_1@172.23.105.60:ns_log<0.1821.0>:ns_log:handle_cast:188]suppressing duplicate log menelaus_web_alerts_srv:undefined([<<"Warning: approaching max index RAM. Indexer RAM on node \"172.23.105.60\" is 92%, which is at or above the threshold of 75%.">>]) because it's been seen 17 times in the past 50.99999 secs (last seen 2.995559 secs ago
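
      For reference, the 92% in the alert appears to be indexer memory usage relative to the configured index RAM quota. A quick way to track that ratio during the run is a small poller like the sketch below; it assumes the indexer stats endpoint on port 9102 and the memory_used / memory_quota stat names, both of which may differ by build.

      // Sketch only: poll the indexer stats endpoint and report memory_used vs memory_quota.
      // The port (9102) and the stat names are assumptions and may differ by build.
      package main

      import (
          "encoding/json"
          "fmt"
          "net/http"
      )

      func main() {
          resp, err := http.Get("http://172.23.105.60:9102/stats")
          if err != nil {
              panic(err)
          }
          defer resp.Body.Close()

          var stats map[string]interface{}
          if err := json.NewDecoder(resp.Body).Decode(&stats); err != nil {
              panic(err)
          }

          // JSON numbers decode as float64 when the target is interface{}.
          used, _ := stats["memory_used"].(float64)
          quota, _ := stats["memory_quota"].(float64)
          if quota > 0 {
              fmt.Printf("indexer memory_used=%.0f quota=%.0f (%.1f%%)\n",
                  used, quota, 100*used/quota)
          }
      }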
      

      Then the indexes are paused:

      2016-10-28T16:06:30.180-07:00 [Info] Indexer::monitorMemUsage ManualGC Time Taken 967.896727ms
      2016-10-28T16:06:30.229-07:00 [Info] Indexer::ReadMemstats Time Taken 5.223638ms
      2016-10-28T16:06:30.229-07:00 [Info] Indexer::monitorMemUsage MemoryUsed Total 15062880256 Idle 65536
      2016-10-28T16:06:30.229-07:00 [Info] Indexer::handleIndexerPause
      2016-10-28T16:06:30.230-07:00 [Info] ClustMgr:handleSetLocalValue Key IndexerState Value Paused
      2016-10-28T16:06:30.236-07:00 [Info] Indexer::handleIndexerPause Indexer State Changed to Paused
      2016-10-28T16:06:30.236-07:00 [Info] Timekeeper::handleIndexerPause
      2016-10-28T16:06:30.237-07:00 [Info] MutationStreamReader::handleIndexerPause
      2016-10-28T16:06:30.237-07:00 [Info] MutationMgr::handleIndexerPause Stream MAINT_STREAM Paused
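
      For context, the pause above is driven by the indexer's periodic memory check (the ManualGC / ReadMemstats / monitorMemUsage lines). The sketch below shows the general shape of such a check; it is not the actual indexer code, and the quota value, the pause threshold, and the way "used" memory is derived from runtime.MemStats are all assumptions.

      // Simplified sketch of a periodic memory monitor in the spirit of the
      // Indexer::monitorMemUsage log lines above; NOT the actual indexer implementation.
      package main

      import (
          "fmt"
          "runtime"
          "runtime/debug"
          "time"
      )

      const (
          memQuota      = 16 * 1024 * 1024 * 1024 // assumed index RAM quota, in bytes
          pauseFraction = 0.95                    // assumed high-water mark for pausing
      )

      func monitorMemUsage(pause func()) {
          for range time.Tick(10 * time.Second) {
              start := time.Now()
              debug.FreeOSMemory() // force a GC and return idle heap to the OS ("ManualGC")
              fmt.Printf("ManualGC Time Taken %v\n", time.Since(start))

              var ms runtime.MemStats
              runtime.ReadMemStats(&ms)
              // One plausible definition of "used": memory obtained from the OS
              // minus heap memory already released back to it.
              used := ms.Sys - ms.HeapReleased
              idle := ms.HeapIdle - ms.HeapReleased
              fmt.Printf("MemoryUsed Total %d Idle %d\n", used, idle)

              if float64(used) > pauseFraction*memQuota {
                  pause() // move the indexer (and its mutation streams) to the Paused state
              }
          }
      }

      func main() {
          monitorMemUsage(func() { fmt.Println("Indexer State Changed to Paused") })
      }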
      

      I restarted the indexer and memory usage went down to 7GB and all indexes were active again.

      Snippets are from: https://s3.amazonaws.com/scalability-mcafee/collectinfo-2016-10-29T210217-ns_1%40172.23.105.60.zip
      I have also attached logs from the other nodes and a history of indexer collectinfos taken throughout the run, for tracing memory usage.

      *Regression unknown, as this is the first time MOI has been run in a longevity test.

      Attachments

        Issue Links


          Activity

            People

              Assignee: Sarath Lakshman
              Reporter: Tommie McAfee (Inactive)

