Loading...

XML

Word

Printable

Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: 4.1.0
Affects Version/s: 4.0.0
Component/s: couchbase-bucket
Security Level: Public
Labels:
- performance

Triage:
Untriaged
Operating System:
Windows 64-bit
Is this a Regression?:
Unknown

Description

The problem:

On Windows, on a one-node-only KV cluster, Sherlock RC4 seems to be 7x slower than 3.1.0 for 100% set ops workload. The two versions are run on the same Windows setup Hermes.

On Linux, Sherlock is fine.

Background:

The kv_fragmentation_revAB_100K.test (seems to) attempts to update 100K documents for multiple iterations. The test's intention may be to find memory issues, but this bug is NOT about memory fragmentation. Instead, I think the problem lies with the speed at which frontend KV happens.

It is suffice to know that we are doing 100% set ops. And identical workload is applied to both 3.1.0 and Sherlock RC4.

The evidence, ops stats:

For Sherlock, see that the attached windows_sherlock_rc4_revAB_ops.png is around 4K ops for Sherlock. Notice it shows 100% set ops. A recent run of Sherlock is at http://ci.sc.couchbase.com/job/hermes/670/console and it is still running at the moment after 20 hours. It would finish between 3 ~ 6 hours normally.

For 3.1.0, here is a recent run,

http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=hermes_310-1776_2e1_load

from

http://ci.sc.couchbase.com/job/hermes/668/

and it would say cmd_set and ops both around 30K.

The evidence, iteration time:

Test performs multiple iterations. We can gauge the runtime base on the time between iterations.

For 3.1.0,

[26/Aug/2015 17:46:33] INFO - Iteration 2/6
[26/Aug/2015 18:46:35] INFO - Iteration 3/6

about an hour.

For Sherlock RC4 on LINUX,

[20/Aug/2015 00:19:47] INFO - Iteration 2/6
[20/Aug/2015 00:52:01] INFO - Iteration 3/6
[20/Aug/2015 01:23:47] INFO - Iteration 4/6
[20/Aug/2015 01:56:47] INFO - Iteration 5/6

roughly half an hour for each iteration. http://ci.sc.couchbase.com/job/ares/1993/console

For Sherlock RC4 on WINDOWS,

[27/Aug/2015 02:51:21] INFO - Iteration 1/6
... (lines omitted)
[27/Aug/2015 09:12:26] INFO - Iteration 2/6
[27/Aug/2015 16:30:36] INFO - Iteration 3/6

roughly 7 hours each.

The difference? If we compare between 3.1.0 and Sherlock on Windows only: iteration time is 7 hours vs 1 hour. And 30K ops vs. 4K ops.

How about memory fragmentation?

If you paid attention to the timestamp and the graphs and the symptom, the speed difference is there right away. I don't think there could be enough memory fragmentation induced yet. I also checked Sherlock's memory fragmentation manually tonight:

{'bytes': '6542377632', 'ep_blob_num': '101254', 'ep_blob_overhead': '248466733', 'ep_item_num': '8235', 'ep_kv_size': '6491269920', 'ep_max_size': '42949672960', 'ep_mem_high_wat': '36507222016', 'ep_mem_low_wat': '32212254720', 'ep_mem_tracker_enabled': 'true', 'ep_oom_errors': '0', 'ep_overhead': '30930130', 'ep_storedval_num': '100000', 'ep_storedval_overhead': '248466733', 'ep_storedval_size': '8000000', 'ep_tmp_oom_errors': '0', 'ep_value_size': '6484469920', 'mem_used': '6542377632', 'tcmalloc.aggressive_memory_decommit': '1', 'tcmalloc_current_thread_cache_bytes': '26394344', 'tcmalloc_max_thread_cache_bytes': '33554432', 'total_allocated_bytes': '6556037464', 'total_fragmentation_bytes': '61148840', 'total_free_mapped_bytes': '0', 'total_free_unmapped_bytes': '650362880', 'total_heap_bytes': '7267549184'}

which, using the script's logic, gives us:

ratio = float(stats['mem_used']) / float(stats['total_heap_bytes'])

which is 6542377632.0/7267549184.0 = 0.9002178679992268 so about 10% fragmentation. This is lower than the Windows 3.1.0 of 14.9% and Linux Sherlock's 15.4% .

Logs:

In the attached files, this is the troubled run RC4 on Windows: "Sherlock RC4-collectinfo-2015-08-28T052801-ns_1@127.0.0.1.zip"

And this is the 3.1.0 on Windows: "3.1.0-172.23.100.180.zip"

While naming for "Linux-Sherlock-rc4-Ares-172.23.96.11.zip" is self evident.

Some history for the perf team internal:

The revAB test has been posting some results but its console had shown errors for many versions for a while now. I fixed this by setting pip's requirement of couchbase to be at 2.0.2 (this setting is still part of the Jenkin's step on Hermes). I also made Windows restore more intelligent in Jenkins, so the Sherlock run definitely went through the attempt of Windows restore. You can see some hint printed on the console at the beginning.