Details

Type: Technical task
Resolution: Fixed
Priority: Critical
Fix Version/s: 6.5.0
Affects Version/s: 6.5.0
Component/s: couchbase-bucket
Labels: None

Sprint: KV-Engine Mad-Hatter GA

Description

platform.so has an implementation of the 64bit byteswap functions ntohll() and htonll(), for platforms which don't have that symbol natively.

Profiling of ep-engine Writer threads highlighted that a large amount of time (~5%) was being spent in platform's ntohll() / htonll() functions. This was surprising, as:

I had assumed that modern Linux (CentOS 7) provided the 64bit byteswap functions, and

Even if the OS doesn't have those functions, I assumed our implementation shouldn't be that slow.

(For context the top 10 functions in the profile are below, ntohll is the 3rd hottest):

        Overhead  Command      Shared Object            Symbol
           4.82%  mc:writer_2  libsnappy.so.1.2.0       [.] snappy::internal::CompressFragment
           4.33%  mc:writer_2  [kernel.kallsyms]        [k] _raw_spin_lock_irq
           4.30%  mc:writer_2  libplatform_so.so.0.1.0  [.] ntohll
           2.82%  mc:writer_2  libc-2.17.so             [.] __memcpy_ssse3
           2.49%  mc:writer_2  libsnappy.so.1.2.0       [.] snappy::RawUncompress
           2.36%  mc:writer_2  [kernel.kallsyms]        [k] _raw_spin_lock
           1.99%  mc:writer_2  libjemalloc.so.2         [.] je_malloc_usable_size
           1.74%  mc:writer_2  [kernel.kallsyms]        [k] __radix_tree_lookup
           1.43%  mc:writer_2  libjemalloc.so.2         [.] je_malloc

Both my assumptions are actually incorrect:

CentOS 7 (and other recent distros including Ubuntu 18.04) don't have ntohll / htonll symbols. They do have the functionally equivalent function htobe64() since glibc 2.9 (2008), but that's a different, Linux-specific symbol.

Our implementation is slow: it does an old-style manual byteswap, which is roughly 10x slower on mancouch:

        Run on (24 X 2400 MHz CPU s)
        2019-11-06 12:20:30
        -----------------------------------------------------
        Benchmark              Time           CPU Iterations
        -----------------------------------------------------
        Swap64                57 ns         57 ns   12216131
        BuiltinSwap64          5 ns          5 ns  141279127
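For illustration, the two approaches being compared might look roughly like the sketch below. This is not the actual platform code or benchmark source: the function names manual_swap64 and builtin_swap64 are hypothetical, and the byte-at-a-time loop is only one plausible form of an "old-style" manual swap. The builtin version assumes GCC or Clang, where __builtin_bswap64 typically compiles to a single byte-swap instruction on x86-64.

```cpp
#include <cstdint>

// Hypothetical stand-in for a manual byteswap: moves one byte per
// iteration from the low end of `value` to the low end of `result`,
// so the lowest input byte ends up as the highest output byte.
inline uint64_t manual_swap64(uint64_t value) {
    uint64_t result = 0;
    for (int i = 0; i < 8; ++i) {
        result = (result << 8) | (value & 0xff);
        value >>= 8;
    }
    return result;
}

// Compiler intrinsic (GCC / Clang). The optimizer can usually lower
// this to a single instruction.
inline uint64_t builtin_swap64(uint64_t value) {
    return __builtin_bswap64(value);
}
```

Both functions return the same value for any input; only the generated code differs.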

Given we already have an optimized byteswap implementation available from Folly, use that instead. Also inline the functions to reduce the call overhead.
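A minimal sketch of what inlined, header-only conversion functions could look like is shown below. To stay self-contained it uses the GCC/Clang __builtin_bswap64 intrinsic and the compiler's predefined byte-order macros rather than Folly; with Folly, the body would instead call folly::Endian::big(). The names sketch_htonll and sketch_ntohll are hypothetical, chosen to avoid clashing with any system-provided htonll / ntohll.

```cpp
#include <cstdint>

// Hypothetical inline host-to-network (big-endian) conversion for
// 64-bit values. Defined in a header so the compiler can inline it
// at each call site instead of calling into a shared library.
inline uint64_t sketch_htonll(uint64_t host) {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    // Little-endian host: reverse the byte order.
    return __builtin_bswap64(host);
#else
    // Big-endian host: network order is already the host order.
    return host;
#endif
}

// Network-to-host is the same operation, since a byteswap is its own inverse.
inline uint64_t sketch_ntohll(uint64_t net) {
    return sketch_htonll(net);
}
```

Because the definitions are visible to the caller, a hot path such as the Writer thread avoids the cross-library function call on every conversion.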


People

Assignee: Dave Rigby (Inactive)

Reporter: Dave Rigby (Inactive)

Votes: 0

Watchers: 0

Dates

Created: 06/Nov/19 6:49 AM

Updated: 13/Nov/19 6:37 AM

Resolved: 13/Nov/19 6:37 AM

Gerrit Reviews

There are no open Gerrit changes

There are 2 closed Gerrit changes

- MB-36776: Replace inefficient impl of htonll/ntohll with folly::Endian
- Merge remote-tracking branch 'couchbase/mad-hatter'

Optimize use of byteswap functions in Writer thread
