Description
What's the issue?
Whilst investigating a semi-related 'cbbackupmgr' performance issue, we noticed that the v9 release of gocbcore is sending buffer acknowledgement packets far too frequently. From a brief look over the code, it appears that commit 2cd66ef changed the default behaviour so that a buffer acknowledgement is force-sent whenever the DCP queue is empty. After profiling 'cbbackupmgr', I noticed that we were spending a decent amount of CPU time in the 'maybeSendDcpBufferAck' function. I've attached CPU profiles showing the effect of this change before and after I reverted it locally.
What kind of effect is this having on 'cbbackupmgr'?
We are seeing a not-insignificant drop in throughput due to this issue. We ran performance tests on GCE before and after reverting this commit; each row below shows three runs.
| 6.6.0 Backup (blackhole) | 15m18.88s @ 400.82 MB/s | 15m9.01s @ 405.17 MB/s | 15m19s @ 400.45 MB/s |
| 6.6.0 Backup (blackhole with buffer ack revert) | 14m33.80s @ 421.50 MB/s | 14m54.10s @ 411.93 MB/s | 14m48.06s @ 414.73 MB/s |
I've also done some smaller-scale performance testing locally, which shows the same drop in throughput:
Before revert - Copied all data in 1m45.244825245s (Avg. 100.06MB/Sec) - 184%
After revert - Copied all data in 1m24.753172343s (Avg. 125.07MB/Sec) - 190%
I've also attached two PCAPs before/after the revert of 2cd66ef. We can clearly see that in the first PCAP we are sending ~14k buffer ack packets whereas in the second we are only sending 5.
It's worth noting that even though these performance tests were not writing to disk, all caches were dropped between tests to keep the results as consistent as possible.
Issue Links
- relates to MB-40525 cbbackupmgr performance regression - 27% slower on leto (Resolved)