Details
Bug
Status: Closed
Critical
Resolution: Fixed
7.0.0
Untriaged
1
Unknown
Description
On a test installation in AWS, Prometheus consumed ~25G of memory and was OOM-killed:
level=info ts=2021-07-15T10:18:46.826Z caller=compact.go:494 component=tsdb msg="write block" mint=1626307200000 maxt=1626328800000 ulid=01FAMTT0VNVMV2149RSNW0M4GW duration=821.47751ms
level=info ts=2021-07-15T10:18:47.407Z caller=compact.go:494 component=tsdb msg="write block" mint=1626328800000 maxt=1626336000000 ulid=01FAMTT1NAT3G7FPGE729R1PBR duration=580.965225ms
level=info ts=2021-07-15T10:18:47.416Z caller=db.go:1152 component=tsdb msg="Deleting obsolete block" block=01FAMTR51NKY7WPP56W2201MMV
level=info ts=2021-07-15T10:18:47.420Z caller=db.go:1152 component=tsdb msg="Deleting obsolete block" block=01FAMTR5VBSYVN4G5XEWFRXZPZ
level=info ts=2021-07-15T10:18:47.436Z caller=db.go:1152 component=tsdb msg="Deleting obsolete block" block=01FAMTR3GW7WWJGKNF37FDXK60
fatal error: runtime: out of memory
runtime stack:
runtime.throw(0x28d6767, 0x16)
    /home/couchbase/jenkins/workspace/cbdeps-platform-build/deps/go1.14.2/src/runtime/panic.go:1116 +0x72
runtime.sysMap(0xc6cc000000, 0x4000000, 0x45aa2d8)
    /home/couchbase/jenkins/workspace/cbdeps-platform-build/deps/go1.14.2/src/runtime/mem_linux.go:169 +0xc5
runtime.(*mheap).sysAlloc(0x45954a0, 0x400000, 0x45954a8, 0xb9)
    /home/couchbase/jenkins/workspace/cbdeps-platform-build/deps/go1.14.2/src/runtime/malloc.go:715 +0x1cd
runtime.(*mheap).grow(0x45954a0, 0xb9, 0x0)
    /home/couchbase/jenkins/workspace/cbdeps-platform-build/deps/go1.14.2/src/runtime/mheap.go:1286 +0x11c
runtime.(*mheap).allocSpan(0x45954a0, 0xb9, 0xfc10100, 0x45aa2e8, 0xc004f6bf28)
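For anyone reading the log above: the mint and maxt fields in the "write block" lines are Unix timestamps in milliseconds, so the time range of each compacted block can be decoded directly. The following is a minimal sketch for doing that by hand; the helper names (ms_to_utc, block_span) are illustrative only and are not part of Prometheus or Couchbase.

```python
from datetime import datetime, timedelta, timezone

# (mint, maxt) pairs copied from the two "write block" log lines above.
BLOCKS = [
    (1626307200000, 1626328800000),
    (1626328800000, 1626336000000),
]


def ms_to_utc(ms):
    """Convert a Unix timestamp in milliseconds to an aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def block_span(mint, maxt):
    """Return (start, end, length) for a block given its mint/maxt in ms."""
    return ms_to_utc(mint), ms_to_utc(maxt), timedelta(milliseconds=maxt - mint)


for mint, maxt in BLOCKS:
    start, end, length = block_span(mint, maxt)
    print(f"{start.isoformat()} -> {end.isoformat()} ({length})")
```

Under these assumptions, the two blocks written just before the crash cover 00:00 to 06:00 UTC and 06:00 to 08:00 UTC on 2021-07-15, that is, a 6-hour block and a 2-hour block.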
|
In the originally reported issue, memory grew to very high levels (25G) after a relatively short duration; disabling pruning appears to have contributed to a significant improvement. Customers upgrading from 6.6, or evaluating 7.x for the first time, should see much better resource consumption than in 6.6, where stats caused very high memory consumption.
This is not to say that we are satisfied with the current state. We will continue to dig in and work to eliminate Prometheus's continual memory growth so that its memory use is capped. At this point the issue is clearly identified and acknowledged by Prometheus engineering. They have been working on it and have provided a few patches, but have not merged them. We have experimented with these patches and determined that they are still insufficient.
Given the above status, and per the maintenance meeting, we recommend not delaying 7.0.1 and releasing it as scheduled (first week of September).