Details

Type: Bug
Resolution: Not a Bug
Priority: Critical
Fix Version/s: None
Affects Version/s: 7.0.0, 7.0.1, 7.0.2
Component/s: ns_server
Labels:
- community-edition
- prometheus
Environment:
OS - Debian GNU/Linux 10
Couchbase Server 7.0.0-5302 (CE)

Operating System:
Ubuntu 64-bit
Flagged:

Impediment
Story Points:
1
Is this a Regression?:
Unknown

Description

We recently noticed excessive resource consumption by some subsystems within Couchbase.

As the attached images show, the Prometheus processes are consuming a large amount of RAM and CPU. Prometheus is the process that consumes the most resources on the virtual machine.

While researching this, I noticed that the Prometheus bundled with Couchbase 7.0 is version 2.22.0 (branch: HEAD, revision: a6239a377d49104ac7253a99aef8feb8dee0a7c2).

There are upstream bug reports describing high resource consumption and limit parameters that are not respected, for example: https://github.com/prometheus/prometheus/issues/9744

Our first approach, as that issue suggests, was to update to version 2.22.1, where the bug is reported as fixed. However, Couchbase uses a custom build of Prometheus, and the Couchbase parent process starts it with a custom flag. Replacing the binary with a different Prometheus version produces the following error:

Error parsing commandline arguments: unknown long flag '--storage.tsdb.no-lockfile'
prometheus: error: unknown long flag '--storage.tsdb.no-lockfile'

The Prometheus build shipped with Couchbase differs from the release in the official Prometheus repository:

prometheus, version 2.22.0 (branch: HEAD, revision: a6239a377d49104ac7253a99aef8feb8dee0a7c2) is the custom Couchbase build.

prometheus, version 2.22.0 (branch: HEAD, revision: 0a7fdd3b76960808c3a91d92267c3d815c1bc354) is the upstream Prometheus release of the same version, without the custom flags.
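For anyone who wants to check which build is installed on a node, the comparison above can be sketched in Python. This is only an illustrative sketch: it assumes the version banner has exactly the shape quoted in this report, and the names `PromBuild`, `parse_banner`, and `same_version_different_build` are made up for this example.

```python
import re
from typing import NamedTuple, Optional


class PromBuild(NamedTuple):
    version: str
    branch: str
    revision: str


# Matches banners of the form quoted in this report, e.g.
# "prometheus, version 2.22.0 (branch: HEAD, revision: <hex>)".
_BANNER = re.compile(
    r"prometheus, version (?P<version>\S+) "
    r"\(branch: (?P<branch>[^,]+), revision: (?P<revision>[0-9a-f]+)\)"
)

# The two banners quoted in the description.
COUCHBASE_BANNER = (
    "prometheus, version 2.22.0 (branch: HEAD, "
    "revision: a6239a377d49104ac7253a99aef8feb8dee0a7c2)"
)
UPSTREAM_BANNER = (
    "prometheus, version 2.22.0 (branch: HEAD, "
    "revision: 0a7fdd3b76960808c3a91d92267c3d815c1bc354)"
)


def parse_banner(line: str) -> Optional[PromBuild]:
    """Extract version, branch, and revision; return None if the line does not match."""
    m = _BANNER.search(line)
    if m is None:
        return None
    return PromBuild(m["version"], m["branch"], m["revision"])


def same_version_different_build(a: PromBuild, b: PromBuild) -> bool:
    """True when two builds report the same version but come from different revisions."""
    return a.version == b.version and a.revision != b.revision


if __name__ == "__main__":
    couchbase = parse_banner(COUCHBASE_BANNER)
    upstream = parse_banner(UPSTREAM_BANNER)
    print(same_version_different_build(couchbase, upstream))
```

Comparing the revision rather than the version number is the point here, since both builds report 2.22.0.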

The workaround we found is to remove the Prometheus binary and restart the child process. Prometheus then no longer starts and no longer overloads the cluster, but we lose all visibility into query, index, and cluster activity, which is important for operations.

Also attached are screenshots of the process monitors showing the high RAM and CPU usage of the process over time, which is causing unavailability in our environment.
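In addition to screenshots, the same readings could be captured as text so they are easier to compare over time. The sketch below is one possible way to do that, assuming the input is the output of `ps -eo pid,rss,pcpu,comm` on Linux (RSS reported in KiB). The names `ProcSample`, `parse_ps`, and `top_by_rss` are hypothetical, and the sample values are invented for illustration only; they are not measurements from our environment.

```python
from typing import List, NamedTuple


class ProcSample(NamedTuple):
    pid: int
    rss_kib: int        # resident set size in KiB, as printed by ps
    cpu_percent: float
    command: str


def parse_ps(ps_output: str) -> List[ProcSample]:
    """Parse the text printed by `ps -eo pid,rss,pcpu,comm` (header line included)."""
    samples = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 3)
        if len(parts) != 4:
            continue  # ignore malformed rows
        pid, rss, pcpu, comm = parts
        samples.append(ProcSample(int(pid), int(rss), float(pcpu), comm.strip()))
    return samples


def top_by_rss(samples: List[ProcSample], n: int = 3) -> List[ProcSample]:
    """Return the n processes with the largest resident memory, largest first."""
    return sorted(samples, key=lambda s: s.rss_kib, reverse=True)[:n]


# Invented example input, NOT real measurements.
SAMPLE = """\
  PID    RSS %CPU COMMAND
  101  52000  1.5 other-a
  202 910000 85.0 prometheus
  303 120000  4.0 other-b
"""

if __name__ == "__main__":
    for s in top_by_rss(parse_ps(SAMPLE), n=2):
        print(s.pid, s.command, s.rss_kib, s.cpu_percent)
```

Running this periodically and keeping the output would give a text record of which process is the top consumer at each point in time.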

Attachments

- 141854716-1c8ba08b-7af4-49b2-9128-7094e6680cc5.png (647 kB, 13/Dec/21 4:41 PM)
- Captura de tela de 2021-12-13 21-15-46.png (14 kB, 13/Dec/21 4:40 PM)
- Captura de tela de 2021-12-13 21-16-03.png (38 kB, 13/Dec/21 4:40 PM)
- Captura de tela de 2021-12-13 21-43-26.png (74 kB, 13/Dec/21 4:43 PM)

Activity

People

Assignee: Guilherme Saueressig

Reporter: Diego Frazatto Pedroso

Votes: 0

Watchers: 6

Dates

Created: 13/Dec/21 4:43 PM

Updated: 07/Mar/22 4:33 AM

Resolved: 15/Dec/21 8:12 PM

Time Tracking

Estimated:

32h

Remaining:

32h

Logged:

Not Specified

Gerrit Reviews

There are no open Gerrit changes

[bug] Prometheus metrics using excess RAM from node
