Error after header write of chunked response on prometheus fetch results > 4K

Description

Observed in logs.

In a recent system test run, CBAS generated a WARN on every Prometheus fetch whose result was larger than 4K. There are over 9900 instances of this warning in the logs.
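A count like the one above can be reproduced with a short script. This is a minimal sketch, assuming the warning lines look like the sample quoted later in this ticket; the function name and the second (INFO) sample line are made up for illustration.

```python
import re

# Matches the warning text quoted in this ticket. The timestamp and thread
# name vary per line, so only the level, logger, and message are anchored.
WARNING_PATTERN = re.compile(
    r"\bWARN\s+CBAS\.server\.ChunkedResponse\b.*"
    r"Error after header write of chunked response"
)


def count_chunked_response_warnings(lines):
    """Return how many log lines contain the ChunkedResponse warning."""
    return sum(1 for line in lines if WARNING_PATTERN.search(line))


sample = [
    # Warning line as quoted in this ticket.
    "2023-06-23T03:50:37.288-07:00 WARN CBAS.server.ChunkedResponse "
    "[HttpExecutor(port:8095)-0] Error after header write of chunked response",
    # Invented unrelated line, only here to show it is not counted.
    "2023-06-23T03:50:38.001-07:00 INFO CBAS.server.Other [main] unrelated",
]
print(count_chunked_response_warnings(sample))
```

Because the function accepts any iterable of lines, an open log file object can be passed to it directly.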

Also, the request is returning a 500 to ns_server. I do not know whether ns_server still scrapes otherwise well-formed results that carry a 500 status. If it does not scrape metrics on a 500 error, the issue is even more severe than the WARN spam in the log.

EDIT: I can confirm from promtimer that all analytics stats are missing when the 500 is returned to ns_server, so this is a pretty severe issue.

Issue

When the Prometheus stats returned from Analytics exceeded 4KiB, the status code was inadvertently set to 500 (Internal Server Error), and this resulted in a large number of warnings in the Analytics warning log. Couchbase Server discarded these statistics.

Resolution

This has been fixed to properly return a 200 (OK) status code when the size of Prometheus stats exceeds 4KiB, allowing these stats to be recorded properly. The warning is no longer logged.
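The failure mode can be illustrated with a toy model of a buffered chunked response. This is not the CBAS implementation. It is one plausible reading of the commit message ("ensure 200 status prior to writing header on stats > 4KiB"), and the class `ToyChunkedResponse`, the 4096-byte buffer, the default status of 500, and the function `serve_stats` are all assumptions made for illustration. The warning text is reused only to mark the point where a late status change is rejected.

```python
class ToyChunkedResponse:
    """Toy model: the header is committed on the first buffer flush."""

    def __init__(self, buffer_size=4096, default_status=500):
        self.buffer_size = buffer_size
        self.status = default_status   # used if nothing sets a status in time
        self.committed_status = None   # status actually sent with the header
        self.buffer = bytearray()
        self.warnings = []

    def set_status(self, status):
        if self.committed_status is not None:
            # Too late: the header already went out with the earlier status.
            self.warnings.append("Error after header write of chunked response")
            return
        self.status = status

    def write(self, data):
        self.buffer.extend(data)
        if len(self.buffer) > self.buffer_size:
            self._flush()

    def _flush(self):
        # The first flush commits the header with whatever status is current.
        if self.committed_status is None:
            self.committed_status = self.status
        self.buffer.clear()

    def close(self):
        self._flush()
        return self.committed_status


def serve_stats(body, status_first):
    """Write a stats body, setting 200 either before or after the write."""
    response = ToyChunkedResponse()
    if status_first:
        response.set_status(200)
    response.write(body)
    if not status_first:
        response.set_status(200)
    return response.close(), response.warnings
```

In this model, a small body works either way because nothing is flushed before the status is set. A body larger than the buffer commits the header early, so the status has to be set before the first write.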

Environment

None

Link to Log File, atop/blg, CBCollectInfo, Core dump

None

Release Notes Description

None

Activity

CB robot August 1, 2023 at 6:25 PM

Build couchbase-server-8.0.0-1363 contains cbas-core commit 14be6bc with commit message:
: ensure 200 status prior to writing header on stats > 4KiB

CB robot August 1, 2023 at 6:25 PM

Build couchbase-server-8.0.0-1363 contains cbas-core commit 78e404b with commit message:
: merge branch '7.1.x' into 'neo'

Michael Blow July 18, 2023 at 1:06 PM

For release notes:

Issue:
The Prometheus stats returned from Analytics grow in size in proportion to the number of scopes and the length of the scope names. If the total size of these stats exceeded 4KiB, the status code was inadvertently set to 500 (Internal Server Error), which caused Couchbase Server to discard these statistics and produced warnings such as the following in the Analytics warning log:

2023-06-23T03:50:37.288-07:00 WARN CBAS.server.ChunkedResponse [HttpExecutor(port:8095)-0] Error after header write of chunked response

Fix:
The implementation was fixed in 7.1.5 to properly return a 200 (OK) status code when the size of Prometheus stats exceeds 4KiB, so that these stats are recorded properly and the warning is no longer emitted.
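The size relationship described in the Issue paragraph above (payload size growing with the number of scopes and the length of their names) can be explored with a rough estimate. This is a minimal sketch: the metric name `example_analytics_metric_N` and the `scope` label are hypothetical and do not reflect the real Analytics stats, so the resulting counts are only indicative.

```python
LIMIT_BYTES = 4096  # 4KiB threshold named in this ticket


def exposition_size(scope_names, metrics_per_scope=1):
    """Byte size of a made-up Prometheus text payload, one sample per scope."""
    lines = []
    for name in scope_names:
        for i in range(metrics_per_scope):
            # Hypothetical metric name and label, not the real Analytics stats.
            lines.append(f'example_analytics_metric_{i}{{scope="{name}"}} 0\n')
    return len("".join(lines).encode("utf-8"))


def scopes_needed_to_exceed(limit, name_length, metrics_per_scope=1):
    """Smallest number of equal-length scope names whose payload exceeds limit."""
    count = 0
    while exposition_size(["s" * name_length] * count, metrics_per_scope) <= limit:
        count += 1
    return count


# Longer scope names reach the threshold with fewer scopes.
print(scopes_needed_to_exceed(LIMIT_BYTES, name_length=8))
print(scopes_needed_to_exceed(LIMIT_BYTES, name_length=40))
```

With more metrics per scope, as a real deployment would have, the threshold is reached with correspondingly fewer scopes.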

Balakumaran Gopal July 17, 2023 at 7:03 AM

We have longevity running for almost 4.5 days now on 7.1.5-3876. Marking this closed.

Michael Blow July 11, 2023 at 12:54 PM

Do we have a run of 7.1.5 system tests for the test case reported in on 7.1.5-3862 or later? We can use this to verify this fix.

Fixed

Details

Is this a Regression?

Unknown

Triage

Untriaged

Created June 26, 2023 at 10:19 PM
Updated September 18, 2023 at 1:50 PM
Resolved June 27, 2023 at 12:11 AM