Couchbase Server / MB-34631

Low throughput for new durability options

Details

    Description

      I ran a series of tests on the new durability options and noticed three things, listed below.

      These tests were run with a 4-node CB cluster and 4 client machines, each running 25 threads executing YCSB workload A, using SDK 3.0 with the new durability options.

      https://docs.google.com/spreadsheets/d/1B8v4OZneOeGxJwUj226zA3YDr0Y0gjRSVLwy0IAP9qw/edit?usp=sharing

      1: All new durability levels have exactly the same throughput.

      2: All new durability levels severely underperform the analogous old durability options.

      3: The performance is not impacted by the SDK.

      I have been working with Michael Nitschinger to eliminate the possibility that the SDK is causing the performance issues, and it seems we can rule the SDK out. From these observations, it looks like the new durability options take a different code path than the old durability options, and that this new path has a single-threaded bottleneck somewhere.
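
To illustrate why a single-threaded stage would produce observation 1 (identical throughput at every durability level), here is a rough back-of-the-envelope model. It is only a sketch of the hypothesis, not a measurement: the level names, the per-level latencies, and the serial-stage time are all made-up assumptions, and only the client count (4 machines x 25 threads) comes from the test setup above.

```python
# Illustrative model only. Every latency below is a hypothetical number,
# not a value measured in these tests.

def throughput_bound(clients, per_op_latency_s, serial_stage_s):
    """Upper bound on ops/sec for a closed-loop workload.

    clients:          concurrent client threads, each with one op in flight
    per_op_latency_s: end-to-end latency of one operation, in seconds
    serial_stage_s:   time each op spends in a stage that processes
                      only one op at a time, in seconds
    """
    concurrency_limit = clients / per_op_latency_s  # Little's law
    serial_limit = 1.0 / serial_stage_s             # one op at a time
    return min(concurrency_limit, serial_limit)


CLIENTS = 4 * 25  # 4 client machines x 25 threads each (from the test setup)

# Hypothetical end-to-end latencies per durability level (seconds).
# The level names are assumed for illustration; the ticket does not list them.
HYPOTHETICAL_LATENCY_S = {
    "majority": 0.002,
    "majority_and_persist_to_active": 0.005,
    "persist_to_majority": 0.010,
}

# Hypothetical time spent in the suspected single-threaded stage (seconds).
HYPOTHETICAL_SERIAL_STAGE_S = 0.0005


def bounds_by_level(serial_stage_s):
    """Return the throughput bound for each hypothetical durability level."""
    return {
        level: throughput_bound(CLIENTS, latency, serial_stage_s)
        for level, latency in HYPOTHETICAL_LATENCY_S.items()
    }


if __name__ == "__main__":
    for level, ops in bounds_by_level(HYPOTHETICAL_SERIAL_STAGE_S).items():
        print(f"{level}: at most {ops:.0f} ops/sec")
```

In this model, once the serial stage is the tighter limit, every level reports the same bound no matter how different the per-level latencies are. Shrinking the serial-stage time far enough makes the levels diverge again, which is what one would expect to see after removing such a bottleneck.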

          Activity

            owend Daniel Owen added a comment -

            Assigning to Korrigan Clark, as I think the next step is to investigate using different SDKs to measure the performance of the durability levels.

            wayne Wayne Siu added a comment -

            Daniel Owen, Michael Nitschinger:

            Based on the comments, it seems the team is suggesting that the next step is to continue looking at the SDK side. I'm not sure we should move to a different SDK now that we have a very reproducible case (unless the team believes the test was invalid). Michael Nitschinger, can you work with Korry to see what else we could try or collect from the existing tests? Thanks.
            owend Daniel Owen added a comment -

            FYI we found a performance issue due to logging code in ns_server - see MB-34690 and patch


            daschl Michael Nitschinger added a comment -

            That's great info Daniel Owen - thanks! Might be worth re-benchmarking then with YCSB once those changes are merged Korrigan Clark

            korrigan.clark Korrigan Clark added a comment -

            Michael Nitschinger, I reran with a new build that includes the fix, and it looks like SDK 3 is behaving as it should now.

            People

              daschl Michael Nitschinger
              korrigan.clark Korrigan Clark