Details
Bug
Resolution: Fixed
Critical
Cheshire-Cat
Untriaged
1
Unknown
Description
It looks like when the system time changes abruptly under Prometheus (this includes OS suspend/resume and laptop sleep/wake-up), it starts reporting "out of bounds" errors for all targets:
level=debug ts=2021-02-17T16:58:00.252Z caller=scrape.go:1127 component="scrape manager" scrape_pool=general target=http://127.0.0.1:8094/_prometheusMetrics msg="Append failed" err="out of bounds"
level=warn ts=2021-02-17T16:58:00.252Z caller=scrape.go:1133 component="scrape manager" scrape_pool=general target=http://127.0.0.1:8094/_prometheusMetrics msg="Append failed" err="out of bounds"
level=warn ts=2021-02-17T16:58:00.252Z caller=scrape.go:1082 component="scrape manager" scrape_pool=general target=http://127.0.0.1:8094/_prometheusMetrics msg="Appending scrape report failed" err="out of bounds"
level=debug ts=2021-02-17T16:58:01.779Z caller=scrape.go:1412 component="scrape manager" scrape_pool=general target=http://127.0.0.1:8091/_prometheusMetrics msg="Out of bounds metric" series="audit_queue_length{category=\"audit\"}"
level=debug ts=2021-02-17T16:58:01.779Z caller=scrape.go:1412 component="scrape manager" scrape_pool=general target=http://127.0.0.1:8091/_prometheusMetrics msg="Out of bounds metric" series="audit_unsuccessful_retries{category=\"audit\"}"
In my case, not every scrape fails; roughly every other one does.
This definitely leaves holes in the stats data and possibly increases CPU load (my guess).
There is a GitHub issue for this: https://github.com/prometheus/prometheus/issues/8243
It is very unlikely that the Prometheus team will fix it, so we should probably handle it ourselves.
The most obvious fix is to detect time changes and restart the Prometheus process. That should at least help with forward time jumps.
Possibly related: https://github.com/golang/go/issues/35012
For Gerrit Dashboard: MB-44510

# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
148889,2 | MB-44510: Update prometheus to 2.22-2 | master | tlm | MERGED | +2 | +1 |