Couchbase C client library libcouchbase / CCBC-939

Up to 25% drop in pillowfight throughput after upgrade to libcouchbase 2.9.0


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.9.0
    • Fix Version/s: 2.10.5
    • Component/s: None
    • Labels: None

          Activity

          Sharath Sulochana added a comment (edited)

          Matt Ingenthron -

          As we discussed, here are the new numbers for the lbc 2.10.4 version.

          Sl No  lbc version  CB Build  trace_on        trace_off
          1      2.10.0       MH Beta   3.15 M ops/sec  3.9 M ops/sec
          2      2.10.4       MH Beta   3.3 M ops/sec   4.05 M ops/sec
          3      2.9.5        Alice     3.4 M ops/sec   3.8 M ops/sec
          4      2.8.7        MH Beta   NA              ~3.95 M ops/sec (default)

          Do you want me to run any pre-2.9 version for the latest comparison?

           

          Test env details:

          [cluster]
          mem_quota = 51200
          initial_nodes = 2
          num_buckets = 1
          Doc size = 512
          items = 20000000
          workers = 50
          doc_gen = json

          vCPU - 40

          RAM - 64 GB

           

          cbc-pillowfight command sample (post 2.9.0):

          cbc-pillowfight --password password --batch-size 1000 --num-items 20000000 --num-threads 50 --min-size 512 --max-size 512 --persist-to 0 --replicate-to 0 --json --spec "couchbase://172.23.133.13/bucket-1?enable_tracing=true&ipv6=allow" --set-pct 20 --num-cycles 40000 --no-population 

          cbc-pillowfight --password password --batch-size 1000 --num-items 20000000 --num-threads 50 --min-size 512 --max-size 512 --persist-to 0 --replicate-to 0 --json --spec "couchbase://172.23.133.13/bucket-1?enable_tracing=false&ipv6=allow" --set-pct 20 --num-cycles 40000 --no-population 

           

          cbc-pillowfight command for 2.8.7:

          cbc-pillowfight --password password --batch-size 1000 --num-items 20000000 --num-threads 50 --min-size 512 --max-size 512 --persist-to 0 --replicate-to 0 --json --spec couchbase://172.23.133.13/bucket-1?ipv6=allow --set-pct 20 --num-cycles 40000 --no-population

          Sergey Avseyev added a comment

          I've made a couple of improvements here:
          http://review.couchbase.org/114548
          http://review.couchbase.org/114549
          But this is still not enough to reach the target performance.

          Right now the problem is that all values for tracing tags are dynamically allocated, even though they are short-lived. I will try to use a memory pool there, just like the one we use for the network buffers; there will still be copies, but at least we won't spend time in malloc/free all the time.

          Matt Ingenthron added a comment

          Since many of the tags are common, you could perhaps also do something like interning the strings. Thanks for the analysis, Sergey Avseyev, and thanks for the support in running the perf tests, Sharath Sulochana.

          Sergey Avseyev added a comment

          With that cache, I avoid recalculating the addresses every time, but I still copy them into the span so that I can be sure the span will outlive the sockets that own the address-info buffers.

          On the other hand, the spans are generated by the library itself and will not cross the LCB-application boundary, so I think I can assume that I don't need to copy those addresses.

          I will try that and get back with my findings.

          Sergey Avseyev added a comment

          Okay, now I see comparable performance with and without tracing (after this patch: http://review.couchbase.org/114551).

          Sharath Sulochana, would it be possible to rerun the perf tests with all three patches applied? 114551 is the tip of the patch queue.


  People

    • Assignee: Sharath Sulochana
    • Reporter: Pavel Paulau (Inactive)
    • Votes: 0
    • Watchers: 8

