Details
- Issue Type: Bug
- Resolution: Fixed
- Priority: Major
- Fix Version: Cheshire-Cat
- Environment: CentOS 7 64-bit; Couchbase EE 7.0.0-4532
- Triage: Untriaged
- 1
- No
Description
Summary:
This may or may not be a bug, but it would be good to understand why RAM usage on the .74 node remained high (after waiting sufficiently long) even after scrape_interval and scrape_timeout were increased back to 10s (from the 1s they had earlier been set to).
The cluster contains 30 buckets, 500 indexes x 2 replicas, close to 1000 collections, approximately 1000 scopes, and 15 XDCR replications. The scrape interval was set to 1s at some point during the volume test; after we set it back to 10s, RAM usage on the node remained at 95%.
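For reference, in a standalone Prometheus deployment these two settings control how often a target is scraped and when an in-flight scrape is abandoned. A minimal sketch of the relevant config keys follows; the job name and target address are illustrative placeholders, and note that Couchbase's embedded Prometheus generates and manages its own configuration, so this is only to show what the 1s-vs-10s settings mean:

```yaml
# prometheus.yml (sketch) - scrape every 10s, abandon a scrape after 10s
global:
  scrape_interval: 10s   # was 1s during the volume test
  scrape_timeout: 10s

scrape_configs:
  - job_name: couchbase                  # illustrative job name
    static_configs:
      - targets: ["10.0.0.74:8091"]      # illustrative target
```

At a 1s interval, every target is scraped ten times as often as at 10s, which multiplies both the collection work on each node and the samples Prometheus must ingest and retain.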
Some likely consequences of the high RAM usage on the .74 node:
1. XDCR on the .74 node shows up as "No XDCR setup".
2. The _prometheusMetrics endpoint on the .74 node does not return any metrics and hangs when called; this can be seen on the Prometheus targets page. (On the other nodes the endpoint returns XDCR metrics correctly.)
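One way to make the hang in point 2 visible from a shell is to call the endpoint with an explicit client-side timeout; the credentials and node address below are placeholders, not values from this cluster:

```shell
# Probe the metrics endpoint on the affected node; a hang shows up as
# curl exit code 28 (operation timed out) instead of a metrics payload.
curl -sS --max-time 15 -o /dev/null -u Administrator:password \
  http://10.0.0.74:8091/_prometheusMetrics
echo "curl exit code: $?"
```

Running the same probe against a healthy node should return quickly with exit code 0, which matches the observation that the other nodes serve metrics normally.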
See the attachments for RAM usage before resetting the Prometheus settings.
Also noticed that a lot of REST calls were timing out on the UI (see the attachment), because of which XDCR replications were not visible on the UI.
Also, here are the logs from when the scrape_interval was at 1s (i.e., before it was increased back to 10s):
http://supportal.couchbase.com/snapshot/4114f3509fbd5226659d2bdc64c2c61a::3