Couchbase Python Client Library
PYCBC-1037

SDK3(patch set) kv and query latency ~30% higher than SDK2 with external libcouchbase installed


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version: 3.0.1
    • Sprint: SDK48: FTS Score/Incl, Docs., SDK51: FLEpt1, Txns Test Compl, SDK2: FLEpt2, TxnsTest, SDK4: 3.1GA,Ruby3.1,TxnTst,FLE

    Description

      Comparing SDK3 to SDK2 kv get/set latency, we see that across all latency percentiles, SDK3 is ~20-30% higher than SDK2. Here are the two tests for comparison:

      SDK 2.5.0 - 24 workers:

      http://perf.jenkins.couchbase.com/job/ares/17051/

      SDK 3 - 24 workers:

      http://perf.jenkins.couchbase.com/job/ares/17049/ 

      In both tests, kv ops are against default scope/collection.

      SDK3 is this patch: git+http://review.couchbase.org/couchbase-python-client@refs/changes/91/135991/3

      Here is a comparison graph:

      http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=ares_700-3090_access_c015&label=sdk2(24workers)&snapshot=ares_700-3090_access_ae94&label=sdk3(24workers)

      The only difference in these tests, other than the SDK version, is the number of cur_connections to the bucket and ESTABLISHED connections to master node 172.23.133.13. Perhaps these "extra" connections are causing the slowdown, as there are around 30% more of them than with SDK2, which corresponds to the increase in latency.

      Workaround

      Remove any externally installed libcouchbase releases from the system. When run with the built-in libcouchbase, the higher latency is not observed.

      Attachments

        1. profile003.gif (185 kB)
        2. sdk2.kv.reads.repeated.perf.data (31.04 MB)
        3. sdk2.perf.data (34.09 MB)
        4. sdk3.kv.reads.repeated.3.perf.data (36.92 MB)
        5. sdk3.perf.data (32.77 MB)
        6. sdk3withlcb293.data (425 kB)
        7. sdk3withlcb306.data (150 kB)

        Activity

          korrigan.clark Korrigan Clark (Inactive) created issue -
          ingenthr Matt Ingenthron made changes -
          Field Original Value New Value
          Assignee Ellis Breen [ ellis.breen ] Matt Ingenthron [ ingenthr ]

          ingenthr Matt Ingenthron added a comment -

          Thanks Korrigan Clark. I had a look at the params in use and was looking into how to repro/extract a test case.

          Question: is this from the master branch of perfrunner, or is it one with some modifications?
          Ellis.Breen Ellis Breen made changes -
          Status New [ 10003 ] Open [ 1 ]

          korrigan.clark Korrigan Clark (Inactive) added a comment -

          Matt Ingenthron the test is run from the perfrunner master branch, no modifications. The Python SDK 3 is installed from the gerrit commit: git+http://review.couchbase.org/couchbase-python-client@refs/changes/91/135991/3.

          You can probably see this issue with a simple script that loads some documents and then does gets on the doc keys, taking the time right before and right after the get call to calculate the latency. Nothing fancy, but you probably need a perf machine to see the ~0.5 ms latency increase.
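
          A minimal sketch of such a probe, assuming a local cluster, Administrator/password credentials, a bucket named bucket-1, and the 3.0.x-era import paths (which may differ between patch releases):

          # Minimal latency probe (sketch): upsert some docs, then time individual gets.
          import time

          from couchbase.cluster import Cluster, ClusterOptions
          from couchbase_core.cluster import PasswordAuthenticator  # location varies across 3.x releases

          cluster = Cluster('couchbase://127.0.0.1',
                            ClusterOptions(PasswordAuthenticator('Administrator', 'password')))
          collection = cluster.bucket('bucket-1').default_collection()

          keys = ['doc-%d' % i for i in range(100)]
          for k in keys:
              collection.upsert(k, {'field': 'value'})

          latencies = []
          for _ in range(10000):
              for k in keys:
                  start = time.perf_counter()
                  collection.get(k)
                  latencies.append(time.perf_counter() - start)

          latencies.sort()
          for p in (0.50, 0.90, 0.99, 0.999):
              print('p%g: %.3f ms' % (p * 100, latencies[int(p * len(latencies)) - 1] * 1000))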

          I'll be running some additional experiments and grabbing profiles to try to narrow in on where the extra time in SDK 3 is being spent.

          If there's anything specific you need from my side, please let me know.
          korrigan.clark Korrigan Clark (Inactive) made changes -
          Attachment sdk2.kv.reads.repeated.perf.data [ 109076 ]
          korrigan.clark Korrigan Clark (Inactive) made changes -
          Attachment sdk3.kv.reads.repeated.2.perf.data [ 109077 ]
          korrigan.clark Korrigan Clark (Inactive) made changes -
          Attachment sdk3.kv.reads.repeated.2.perf.data [ 109077 ]
          korrigan.clark Korrigan Clark (Inactive) made changes -
          Attachment sdk3.kv.reads.repeated.3.perf.data [ 109089 ]

          korrigan.clark Korrigan Clark (Inactive) added a comment -

          Did some profiling and found some interesting things. Two tests were run, one with PySDK 2.5.0 and one with the PySDK patch set 3.0.5.dev3+g5a04cfd, where a single worker does gets on 100 keys in an infinite loop. The servers were loaded with 20M docs and compacted before the test. I have uploaded the profiles as:

          sdk2.kv.reads.repeated.perf.data

          sdk3.kv.reads.repeated.3.perf.data

          You can view them with these commands:

          perf report -i sdk2.kv.reads.repeated.perf.data

          perf report -i sdk3.kv.reads.repeated.3.perf.data

          perf diff sdk3.kv.reads.repeated.3.perf.data sdk2.kv.reads.repeated.perf.data

          SDK3 is spending 88%+ of its time in _PyLong_FromNbInt and PyUnicode_DecodeUTF8Stateful, whereas in SDK2 most of the time (97%+) was spent in epoll_wait. It looks like Python SDK3 is burning more CPU cycles, mostly decoding UTF-8 strings. Possibly related to this file: https://github.com/couchbase/couchbase-python-client/blob/master/src/pycbc.h

          And this issue might be relevant: https://stackoverflow.com/questions/36098984/python-3-3-c-api-and-utf-8-strings 


          ingenthr Matt Ingenthron added a comment -

          Excellent sleuthing Korrigan Clark! I'll definitely have a look at this in more detail. Just so you know, at the moment that probably means later next week.
          sharath.sulochana Sharath Sulochana (Inactive) made changes -
          Summary: SDK3 kv get/set latency ~30% higher than SDK2 → SDK3(patch set) kv get/set latency ~30% higher than SDK2

          kamini.jagtiani Kamini Jagtiani added a comment -

          Hi Matt Ingenthron - any ETA on this one?

          Thanks

          ingenthr Matt Ingenthron added a comment -

          Not at the moment Kamini Jagtiani. The primary engineer for this is OOO and we have a few other higher priority issues.

          If you want to assist, you might experiment with logging. David Kelly did check, and IIRC there is no direct usage of PyUnicode_DecodeUTF8Stateful, so it was thought it might come in indirectly through logging. Lowering the logging level (DEBUG probably isn't needed) may avoid this issue.
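
          A sketch of how the log level might be lowered for a run; LCB_LOGLEVEL=0 restricts libcouchbase to fatal errors only (confirmed later in this ticket), and it typically needs to be set before the client connects (or exported in the shell before launching the workload):

          import logging
          import os

          # Keep libcouchbase quiet: 0 = fatal errors only. Set before the SDK
          # creates its connection, or export it in the shell instead.
          os.environ['LCB_LOGLEVEL'] = '0'

          # Keep Python-side logging above DEBUG as well.
          logging.basicConfig(level=logging.WARNING)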


          kamini.jagtiani Kamini Jagtiani added a comment -

          Thanks Matt Ingenthron.

          Korrigan Clark,

          Can we try the tests with a lower logging level, as suggested by Matt?

          Thanks

          korrigan.clark Korrigan Clark (Inactive) added a comment -

          Yes, I will give it a try shortly Kamini Jagtiani Matt Ingenthron

          korrigan.clark Korrigan Clark (Inactive) added a comment -

          Matt Ingenthron Kamini Jagtiani I ran the tests with LCB_LOGLEVEL set to 0, which should log only fatal errors. The results show no difference. Here is a graph comparing sdk3 (default log level) to sdk3 (log level 0): http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=ares_700-3090_access_ae94&snapshot=ares_700-3090_access_80cf

          ingenthr Matt Ingenthron added a comment -

          Curious. We'll have a look at it as soon as possible.
          ingenthr Matt Ingenthron made changes -
          Summary: SDK3(patch set) kv get/set latency ~30% higher than SDK2 → SDK3(patch set) kv and query latency ~30% higher than SDK2
          kamini.jagtiani Kamini Jagtiani added a comment - - edited

          Korrigan Clark
          Can we try using the REST API and see if you still see the performance difference?
          The reason I say this: when I was trying out transactions, I noticed that
          - When I moved from buckets to collections, some of my joins were taking over 8 minutes instead of a few seconds.
          - It goes back to seconds if I specifically set "scan_consistency" to "not_bounded". If not specifically set, it assumes "request_plus" and takes much longer.

          Raised MB-42655

          korrigan.clark Korrigan Clark (Inactive) added a comment -

          That's expected for scan consistency if there is load on the system. Perfrunner does not have REST query support right now and I currently do not have time to add it; however, it shouldn't be too hard to add. The performance difference here is only visible for very low latency ops. If the query is long running, the impact from the SDK side of things will be negligible. The SDK actually calls the REST API underneath. The reason we are seeing SDK slowness is the decoding of the returned results within the SDK before returning. With REST we would skip the decode phase, so latency will look better. I don't think it's of much value to run the REST API for this scenario, but it would be good to have support in the future. I'll discuss with Wayne Siu.
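
          For illustration, a rough sketch of the two paths being contrasted above: an SDK query (rows decoded inside the client) versus a direct call to the query service REST endpoint on port 8093. Host, credentials, bucket, and statement are placeholders, and import paths follow the 3.0.x API.

          import requests  # assumed available on the client

          from couchbase.cluster import Cluster, ClusterOptions
          from couchbase_core.cluster import PasswordAuthenticator  # location varies across 3.x releases

          statement = 'SELECT META().id FROM `bucket-1` LIMIT 1'

          # Via the SDK: rows are decoded into Python objects inside the client.
          cluster = Cluster('couchbase://127.0.0.1',
                            ClusterOptions(PasswordAuthenticator('Administrator', 'password')))
          for row in cluster.query(statement):
              print(row)

          # Via REST: the raw JSON response comes back and decoding is up to the caller.
          resp = requests.post('http://127.0.0.1:8093/query/service',
                               data={'statement': statement},
                               auth=('Administrator', 'password'))
          print(resp.json()['results'])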


          kamini.jagtiani Kamini Jagtiani added a comment -

          Korrigan Clark I can repro it stand-alone, with no load on the system.
          Take a look at MB-42655 for details.
          ingenthr Matt Ingenthron made changes -
          Sprint SDK51: …, 2 [ 1308 ]
          ingenthr Matt Ingenthron made changes -
          Rank Ranked lower
          ingenthr Matt Ingenthron made changes -
          Sprint SDK51: …, 2 [ 1308 ] SDK48: FTS Score/Incl, Docs. [ 1307 ]
          ingenthr Matt Ingenthron made changes -
          Rank Ranked lower
          ingenthr Matt Ingenthron made changes -
          Assignee Matt Ingenthron [ ingenthr ] David Kelly [ david.kelly ]
          david.kelly David Kelly made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          david.kelly David Kelly made changes -
          Attachment profile003.gif [ 118168 ]
          david.kelly David Kelly added a comment -

          I'm looking at this a bit now. My assumption is, if 88% of the time is spent in _PyLong_FromNbInt and PyUnicode_DecodeUTF8Stateful vs epoll, I should be able to see that right away using typical profiling tools like yep, which uses google_perftools under the hood. So locally, I have a script that uses yep to profile a loop of 200K get calls to 20K documents, figuring I'd see the same hotspots. But I don't - running on my Mac, using pprof to output a call graph with times (see the attached profile003.gif), I am not seeing the decode calls take much of the time.

          FWIW, I don't see how 88% of the time could be consumed by decoding strings while only suffering a small percentage change in latency, since the network latency should be the same. Perhaps I'm completely missing something here. I'll do more today, have questions, etc...
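
          A sketch of that kind of yep-based profiling run (assumes the yep package and google-perftools are installed, that the documents already exist in the bucket, and the same placeholder connection details as the earlier sketch):

          import yep

          from couchbase.cluster import Cluster, ClusterOptions
          from couchbase_core.cluster import PasswordAuthenticator  # location varies across 3.x releases

          cluster = Cluster('couchbase://127.0.0.1',
                            ClusterOptions(PasswordAuthenticator('Administrator', 'password')))
          collection = cluster.bucket('bucket-1').default_collection()

          # Assumes doc-0 .. doc-19999 have already been loaded into bucket-1.
          keys = ['doc-%d' % i for i in range(20000)]

          yep.start('sdk3.gets.prof')          # start the google-perftools CPU profiler
          for i in range(200000):
              collection.get(keys[i % len(keys)])
          yep.stop()

          # Then inspect the profile, e.g.:
          #   pprof --text $(which python3) sdk3.gets.prof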


          korrigan.clark Korrigan Clark (Inactive) added a comment -

          David Kelly I think what your info plus my info could be implying is that OSX and Linux cause the SDK to behave differently with regard to Python unicode decoding. Could you try on a Linux box and see if there is a difference?
          david.kelly David Kelly added a comment -

          Korrigan Clark I agree – I'll fire up a linux box and check it out, and get some data on that Monday. Also - could be the python version - LMK what version you were using and I'll match that. When I installed the performance tools, brew updated me to python3.9 (which I didn't realize until later), and that could be something.


          korrigan.clark Korrigan Clark (Inactive) added a comment -

          David Kelly python 3.5.6, ubuntu 18
          david.kelly David Kelly added a comment -

          Korrigan Clark - My ubuntu image came with python 3.6 (3.6.9), so I used it for this first try. The call graph looks roughly the same as above. I'm curious about 3.5.6, so I will install that on my image later and try again.
          ingenthr Matt Ingenthron made changes -
          Sprint SDK48: FTS Score/Incl, Docs. [ 1307 ] SDK48: FTS Score/Incl, Docs., SDK51: FLEpt1, Txns Test Compl [ 1307, 1308 ]
          ingenthr Matt Ingenthron made changes -
          Sprint SDK48: FTS Score/Incl, Docs., SDK51: FLEpt1, Txns Test Compl [ 1307, 1308 ] SDK48: FTS Score/Incl, Docs., SDK51: FLEpt1, Txns Test Compl, SDK2: FLEpt2, TxnsTest [ 1307, 1308, 1309 ]
          sharath.sulochana Sharath Sulochana (Inactive) made changes -
          Labels performance affects-cc-testing performance
          sharath.sulochana Sharath Sulochana (Inactive) made changes -
          Priority Major [ 3 ] Test Blocker [ 6 ]
          david.kelly David Kelly added a comment -

          Korrigan Clark Sorry – completely forgot to update this. The profiling I've done shows pretty much exactly the same as the above, even on the same python version, etc... It would be good for me to understand how you got your profile info. Then at least, I should be able to see the same thing. That seems like a good place to start.

          Can you let me know what tools you use to profile? I'd like to use the same tools, as I can't see why we'd see such different things from the profiler.


          ingenthr Matt Ingenthron added a comment -

          Kamini Jagtiani, since you've observed this too, can you provide some details on where you've seen it? Do you have a profile to share?
          david.kelly David Kelly added a comment - - edited

          Korrigan Clark - I used 3.5.10 I believe, on ubuntu18. Sadly, that virtualbox instance won't come up for me to check that (not 3.5.6, but close), but in any case, I can't really reproduce the same sort of profile - spending the vast majority of the time decoding strings. I think it is worth pairing up – I'll ping you on slack, and perhaps we can figure out some time to get together and look at this.


          raju Raju Suravarjjala added a comment -

          Korrigan Clark and David Kelly Are you able to connect with each other? What are the next steps here?

          korrigan.clark Korrigan Clark (Inactive) added a comment -

          Raju Suravarjjala

          I was working with David Kelly and found some potential issues with the SDK3 dependency on LCB. David is taking a deeper look now and I am running some experiments. We should have more info to share in the morning.
          korrigan.clark Korrigan Clark (Inactive) made changes -
          Attachment sdk2.perf.data [ 121859 ]
          Attachment sdk3.perf.data [ 121860 ]
          Attachment sdk3withlcb293.data [ 121861 ]
          Attachment sdk3withlcb306.data [ 121862 ]

          korrigan.clark Korrigan Clark (Inactive) added a comment -

          Update: it appears the performance of Python SDK 3 depends on which preinstalled version of LCB is present prior to installing Python SDK 3. We ran two tests, one where LCB 2.9.3 was installed prior to pysdk 3.0.6 and one where LCB 3.0.2 was installed prior to pysdk 3.0.6. Here are the graphs: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=ares_700-4078_access_d7de&snapshot=ares_700-4078_access_f0c2

          You will see that latency is higher for LCB 3.0.2 + pysdk3 3.0.6 (orange). That is also the only difference visible in the comparison graphs.

          This is odd because it was my understanding that the pysdk3 install should build its own isolated LCB and should not use any preinstalled LCB.
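
          One way to sanity-check which libcouchbase actually gets loaded at runtime is to inspect the process memory maps after importing the SDK (a Linux-only sketch; an empty result can also mean LCB was linked statically into the extension):

          import couchbase  # noqa: F401 - importing pulls in the C extension and its libcouchbase

          # List any libcouchbase shared objects mapped into this process.
          with open('/proc/self/maps') as maps:
              paths = sorted({line.split()[-1] for line in maps if 'libcouchbase' in line})

          for path in paths:
              print(path)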

          ingenthr Matt Ingenthron made changes -
          Summary: SDK3(patch set) kv and query latency ~30% higher than SDK2 → SDK3(patch set) kv and query latency ~30% higher than SDK2 with external LCB installed
          ingenthr Matt Ingenthron made changes -
          Description: updated to add the Workaround section (the Description above reflects the updated text).
          ingenthr Matt Ingenthron made changes -
          Summary: SDK3(patch set) kv and query latency ~30% higher than SDK2 with external LCB installed → SDK3(patch set) kv and query latency ~30% higher than SDK2 with external libcouchbase installed

          raju Raju Suravarjjala added a comment -

          Korrigan Clark Any update from our conversation yesterday?
          david.kelly David Kelly added a comment -

          This may also be due to install of libcouchbase3-libevent (or maybe even libcouchbase3-tools). I'm also going to look at runtime what is actually loaded - I sort of expect an issue with the eventing backend conflicting somehow, but we will see.


          korrigan.clark Korrigan Clark (Inactive) added a comment -

          Raju Suravarjjala

          Yes, David Kelly is looking into the underlying issue, which we believe is caused by the SDK loading incorrect libraries when libcouchbase is preinstalled before the SDK. We have a workaround which I will implement early next week. In earlier 3.0.x releases, LCB needed to be installed before the SDK, and we built some scripts to take care of this before each perf test. This has changed, and LCB no longer needs to be preinstalled in the most recent releases. The client install scripts need to be updated to remove LCB before any test using Python SDK 3. Once I have updated the scripts we should be able to get clean perf runs. I will update again when the scripts have been updated and we have new numbers, early next week.
          ingenthr Matt Ingenthron made changes -
          Sprint SDK48: FTS Score/Incl, Docs., SDK51: FLEpt1, Txns Test Compl, SDK2: FLEpt2, TxnsTest [ 1307, 1308, 1309 ] SDK48: FTS Score/Incl, Docs., SDK51: FLEpt1, Txns Test Compl, SDK2: FLEpt2, TxnsTest, SDK4: 3.1GA,Ruby3.1,TxnTst,FLE [ 1307, 1308, 1309, 1405 ]

          raju Raju Suravarjjala added a comment -

          Korrigan Clark Thanks. Let me know once you have the new numbers
          korrigan.clark Korrigan Clark (Inactive) added a comment - - edited

          Raju Suravarjjala David Kelly I ran kv and n1ql tests with the sdk 3.0.8 and no lcb preinstalled. Here are the results with green being old runs and orange being new runs:

          90th percentile query latency (ms), Q1, Key-Value Lookup, 10K queries/sec, default collection

          Old: 1.2

          New: 1.2

          http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=iris_700-4226_access_cd44&snapshot=iris_700-4226_access_0cab

          90th percentile query latency (ms), Q1, Key-Value Lookup, 10K queries/sec, s=1 c=1

          Old: 1.2

          New: 1.2

          http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=iris_700-4226_access_534e&snapshot=iris_700-4226_access_38dd

          90th percentile query latency (ms), Q1, Key-Value Lookup, 10K queries/sec, s=1 c=1000

          Old: 1.2

          New: 1.2

          http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=iris_700-4226_access_cf7e&snapshot=iris_700-4226_access_90e9

          Avg. Query Throughput (queries/sec), Q1, Key-Value Lookup, s=1 c=1

          Old: 127,774

          New: 129,814

          http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=iris_700-4226_access_a7b4&snapshot=iris_700-4226_access_a29f

          Avg. Query Throughput (queries/sec), Q1, Key-Value Lookup, default collection

          Old: 127,943

          New: 130,978

          http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=iris_700-4226_access_8313&snapshot=iris_700-4226_access_16f4

          99.9th percentile GET/SET latency (ms), 4 nodes, 1 bucket x 20M x 1KB, 10K ops/sec, default scope/collection

          Old: Get - 0.85, Set - 0.98

          New: Get - 1.0, Set - 1.12

          http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=ares_700-4226_access_f0db&snapshot=ares_700-4226_access_e4da

          99.9th percentile GET/SET latency (ms), 4 nodes, 1 bucket x 20M x 1KB, 10K ops/sec, s=1 c=1

          Old: Get - 0.96, Set - 1.18

          New: Get - 0.91, Set - 1.12

          http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=ares_700-4226_access_2b50&snapshot=ares_700-4226_access_d04e

           

          There are some things to note:

          1 - n1ql latency remains about the same as before. Default, 1 collection, and 1000 collections all have the same latency. The main difference in the n1ql test is the number of connections in the TIME_WAIT state, which is significantly lower with the new runs. N1ql also sees a mild bump in throughput. The ops and CPU utilization are also smoother and more consistent throughout the tests.

          2 - KV default collection latency seems to have increased, both get and set. KV 1 collection seems to have decreased latency, both get and set. We do not see any difference in the number of connections in the TIME_WAIT state in the kv tests.

           

          Comparing SDK2 to SDK3 default collection:

          There is still a difference (~30%) between sdk2 and sdk3 default collection for the latency tests. However, the n1ql throughput numbers for sdk2 and sdk3 are now roughly the same. This could possibly be the effect of the new libevent IO in the server.

          david.kelly David Kelly made changes -
          Status In Progress [ 3 ] Open [ 1 ]

          raju Raju Suravarjjala added a comment -

          Thanks Korrigan Clark Did you publish all the numbers into showfast? Any other bugs/issues you are tracking?

          korrigan.clark Korrigan Clark (Inactive) added a comment -

          Raju Suravarjjala yes, these results should be reflected on showfast now for the latest build, and going forward the n1ql and kv latency tests will all be clean with the 3.0.8 sdk version. I am in the process of doing some retuning for these tests, as the performance has changed slightly, so there may be some quirks for a test or two. I am in the process of analyzing the rest of the tests and will open a bug if I see any issues. The latency increase is still present, but there appears to be increased throughput as well. So, this issue may come down to tradeoffs.

          ingenthr Matt Ingenthron added a comment -

          Nice work Korrigan Clark and David Kelly.

          One thing I would note from the results you linked to: through the 99th percentile, the results are exactly the same. For gets/sets, at least as I read it, above the 99th percentile it's not always consistently one test that has the lower latency between sets and gets. It's just a guess, but multiple runs might show things flipping in these upper percentiles. Looking at the scatterplot seems to confirm that. If you were to update that chart to plot only the points beyond 1σ, we could probably see whether it's noise or an actual difference. Then again, we'd expect higher latency as throughput increases, and the fact that we don't see that seems to mean we've made things better.
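
          A small sketch (illustrative only) of isolating the samples beyond one standard deviation, assuming per-op latency lists are available for each run:

          import statistics

          def tail_samples(latencies_ms):
              """Return the samples more than one standard deviation above the mean."""
              mean = statistics.mean(latencies_ms)
              sigma = statistics.stdev(latencies_ms)
              return [x for x in latencies_ms if x > mean + sigma]

          # Compare, e.g., tail_samples(sdk2_latencies) against tail_samples(sdk3_latencies)
          # to judge whether the upper-percentile differences are noise or a real shift.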

          wayne Wayne Siu made changes -
          Labels affects-cc-testing performance performance
          ingenthr Matt Ingenthron made changes -
          Priority Test Blocker [ 6 ] Minor [ 4 ]
          ingenthr Matt Ingenthron made changes -
          Priority Minor [ 4 ] Major [ 3 ]

          korrigan.clark Korrigan Clark (Inactive) added a comment -

          Matt Ingenthron David Kelly

          From our side, it looks like this bug is resolved. If you have anything else in mind, let me know and we can keep it open; otherwise I think we should close. Will leave it up to David.
          david.kelly David Kelly made changes -
          Resolution Fixed [ 1 ]
          Status Open [ 1 ] Resolved [ 5 ]

          People

            Assignee: david.kelly David Kelly
            Reporter: korrigan.clark Korrigan Clark (Inactive)
            Votes: 0
            Watchers: 9
