Details
-
Bug
-
Resolution: Fixed
-
Major
-
5.0.0
-
Hera cluster
-
Untriaged
-
Centos 64-bit
-
Yes
Description
Let me start with a brief summary of the results for the Q1 and Q2 not_bounded queries. Note that I was using MOI for the Q2 tests.
Q1 | 1.4.2 GOGC=100 | 1.4.2 GOGC=200 | 1.7.1 GOGC=100 | 1.7.1 GOGC=200 | 1.7.3 GOGC=100 | Master GOGC=100 | Master GOGC=200 |
---|---|---|---|---|---|---|---|
Throughput, queries/sec | 25K | 28K | 8K | 22K | 8K | 13K | 26K |
RSS, MB | 145 | 200 | 65 | 160 | 65 | 115 | 170 |

Q2 | 1.4.2 GOGC=100 | 1.4.2 GOGC=200 | 1.7.1 GOGC=100 | 1.7.1 GOGC=200 | 1.7.3 GOGC=100 | Master GOGC=100 | Master GOGC=200 |
---|---|---|---|---|---|---|---|
Throughput, queries/sec | 22K | 24K | 19K | 22K | 19K | 30K | 34K |
RSS, MB | 350 | 475 | 300 | 375 | 315 | 275 | 395 |
I will focus on the most severe regression: Q1 after the upgrade to Go 1.7.
Go 1.7 significantly reduces the memory footprint. Unfortunately, that improvement has a very negative impact on query engine performance:
- Live heap size is very small now - 14-16MB.
- The corresponding goal heap size is 30-35MB (GOGC=100).
- A small live heap and a huge amount of garbage result in extremely frequent garbage collection events. According to my measurements, ~175 GC events per second happen during the Q1 workload.
- Sweep termination and mark termination are still stop-the-world (STW) phases. The query engine ends up spending about 20% of its time in STW GC. That is a huge overhead.
- The throughput is just 7-8K queries per second.
Now the funny part. What if I run Q2 workload right before Q1 workload?
- Q2 increases live heap size to ~150MB.
- GC frequency decreases to ~20 events per second.
- The query engine spends only 2-3% of time in STW GC.
- Q1 throughput increases to 26K queries per second.
In a way, running Q2 before Q1 is equivalent to applying custom GOGC settings.
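The equivalence can be made concrete with a little arithmetic. The sketch below assumes the simplified pacing rule that the collector lets the heap grow by `live * GOGC / 100` before the next cycle; the real pacer is more involved, and the helper names are hypothetical. Under that assumption, a ~150MB live heap at GOGC=100 gives the same headroom as a ~15MB live heap at GOGC=1000.

```go
package main

import "fmt"

// headroomMB is the amount of new allocation allowed before the next GC
// cycle under the simplified rule: headroom = live * GOGC / 100.
func headroomMB(liveMB float64, gogc int) float64 {
	return liveMB * float64(gogc) / 100
}

// equivalentGOGC returns the GOGC value that would give a small live heap
// the same headroom that a larger live heap gets at the given GOGC.
func equivalentGOGC(smallLiveMB, largeLiveMB float64, gogc int) float64 {
	return headroomMB(largeLiveMB, gogc) / smallLiveMB * 100
}

func main() {
	// Approximate live heap sizes taken from the description above.
	const q1LiveMB, afterQ2LiveMB = 15.0, 150.0

	fmt.Printf("Q1 alone, GOGC=100: %.0f MB of headroom\n", headroomMB(q1LiveMB, 100))
	fmt.Printf("Q1 after Q2, GOGC=100: %.0f MB of headroom\n", headroomMB(afterQ2LiveMB, 100))
	fmt.Printf("Equivalent GOGC for Q1 alone: %.0f\n", equivalentGOGC(q1LiveMB, afterQ2LiveMB, 100))
}
```

A roughly tenfold increase in headroom is in line with the observed drop from ~175 to ~20 GC events per second, since at a fixed allocation rate GC frequency is inversely proportional to headroom.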
The upgrade to Go 1.7 is inevitable. We can tune GOGC, and we can tune the number of "servicers".
That said, we really need to optimize the memory allocation.