  C++ Couchbase Client
  CXXCBC-388

Performance: Potential optimisations for lookup_in


Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major
    • Affects Version/s: 1.0.0
    • Fix Version/s: None
    • Component/s: transactions
    • Labels: None

    Description

      Explanation:
      I'm currently profiling the C++ SDK on KV and transactions operations and seeing a few potential micro-optimisations that could be made. Take or leave these thoughts as you will; I'll file them separately.

      The source for the test app is here - note that only run_kv_lookup_in_workload is called, and that this workload simulates one of the reads that transactions do under the hood. The profiler is MSVC 2022.
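
      For illustration, a rough sketch of the shape of such a lookup_in workload with the public API follows. This is not the actual test app: the document ids, sub-document path and error handling are placeholders, and the real workload pipelines 100 operations per batch rather than issuing them one at a time.

          // Hedged sketch only: one lookup_in per document id, similar in shape to
          // the reads that transactions perform under the hood.
          #include <couchbase/cluster.hxx>

          #include <string>
          #include <vector>

          void run_lookup_in_batch(couchbase::collection& collection, const std::vector<std::string>& ids)
          {
              for (const auto& id : ids) {
                  // A fresh lookup_in_specs (and its internal storage) is built and torn
                  // down for every single operation.
                  auto specs = couchbase::lookup_in_specs{
                      couchbase::lookup_in_specs::get("some.path") // placeholder path
                  };
                  auto [err, result] = collection.lookup_in(id, specs).get();
                  (void)err;    // error handling elided in this sketch
                  (void)result;
              }
          }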

      "2% of TT" means 2% of the total CPU time as reported by the profiler. It's a sampling profiler using CPU clock not wallclock.

      Issues:

      Testing indicates that non-transactional KV lookup_in operations with the C++ SDK are substantially slower than non-transactional KV gets (tested on a c3.4xlarge node - 16 vCPUs, 30 GB memory - against a localhost single-node Docker cluster):

      Executed 1000 batches with 100 KV GET operations in 9915ms (9915053us, 9s), average latency: 99ms
      Executed 1000 batches with 100 KV lookup_in operations in 15334ms (15334045us, 15s), average latency: 153ms
      

      There's a great deal of time spent on creating & destroying lookup_in_request and lookup_in_specs.

      16.66% of TT is spent creating lookup_in_specs (in other test runs it's been more like 30%+), of which 12.25% of TT is spent here, i.e. ultimately pushing to a std::vector:

          // Variadic constructor: each spec argument is taken by value and then
          // appended to the internal std::vector via push_back().
          template<typename... Operation>
          explicit lookup_in_specs(Operation... args)
          {
              push_back(args...);
          }
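
      To see where those constructions come from, here is the kind of call site that constructor serves (illustrative paths, not the test app's exact code): each ::get() builds a temporary spec, the by-value parameter pack takes another copy or move of it, and push_back then stores it in the std::vector.

          // Illustrative only: two sub-document reads in a single lookup_in_specs.
          auto specs = couchbase::lookup_in_specs{
              couchbase::lookup_in_specs::get("field1"), // placeholder paths
              couchbase::lookup_in_specs::get("field2")
          };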
      

      I'm not certain, but I think a lot of subdoc::get commands are getting created and destroyed by this logic.
      • 8.54% of TT is spent in subdoc::command::~command, 6.93% of TT of which comes from destroying the std::vector inside it.
      • 5.10% of TT is spent in lookup_in_request::encode_to, over half of it in the stable_sort.
      • 1.75% of TT is spent in ~lookup_in_request, of which 1.38% of TT is destroying the std::vector.
      • 1.33% of TT is spent in ~lookup_in_specs.

      Suggestions:

      1. Given that lookup_in operations are fixed at a maximum of 16 ops, would we benefit from switching out the std::vector for an array? I did already try reserving 16 elements on the std::vector, but it didn't seem to make a difference.
      2. It looks like lookup_in_specs is repeatedly creating and destroying subdoc::command objects; perhaps some of the copies could be removed? (A rough sketch combining both ideas follows below.)
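
      To make both suggestions concrete, below is a rough sketch, using made-up names rather than the SDK's actual internal types, of a fixed-capacity spec container: it sidesteps the heap allocation of std::vector (suggestion 1) and forwards its arguments so that rvalue specs are moved into place rather than copied (suggestion 2).

          // Illustrative sketch only; `Spec` stands in for the SDK's internal subdoc command type.
          #include <array>
          #include <cstddef>
          #include <stdexcept>
          #include <type_traits>
          #include <utility>

          template<typename Spec, std::size_t Capacity = 16>
          class inline_specs {
            public:
              // Variadic constructor mirroring lookup_in_specs, but forwarding each
              // argument so temporaries are moved rather than copied.
              template<typename... Ops>
              explicit inline_specs(Ops&&... ops)
              {
                  static_assert((std::is_constructible_v<Spec, Ops&&> && ...),
                                "every argument must be convertible to the spec type");
                  (push_back(std::forward<Ops>(ops)), ...);
              }

              template<typename Op>
              void push_back(Op&& op)
              {
                  if (size_ == Capacity) {
                      throw std::length_error("subdoc requests allow at most 16 operations");
                  }
                  storage_[size_++] = std::forward<Op>(op);
              }

              [[nodiscard]] auto size() const -> std::size_t { return size_; }
              auto begin() const -> const Spec* { return storage_.data(); }
              auto end() const -> const Spec* { return storage_.data() + size_; }

            private:
              // Fixed-size storage instead of a heap-allocating std::vector; the protocol
              // caps a sub-document request at 16 operations anyway.
              std::array<Spec, Capacity> storage_{};
              std::size_t size_{ 0 };
          };

      One caveat: std::array requires Spec to be default-constructible, so a real implementation might prefer uninitialized storage (for example something like boost::container::static_vector) to avoid that requirement.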

          People

            Assignee: Sergey Avseyev (avsej)
            Reporter: Graham Pople (graham.pople)
            Votes: 0
            Watchers: 1
