Couchbase Gateway / CBG-1673

Drop in xattr query access throughput for 100k users

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.0
    • Fix Version/s: 3.0
    • Component/s: SyncGateway
    • Security Level: Public
    • Sprint: CBG Sprint 82, CBG Sprint 83, CBG Sprint 84, CBG Sprint 85, CBG Sprint 86
    • Story Points: 5

        Activity

          korrigan.clark Korrigan Clark (Inactive) created issue -

          korrigan.clark Korrigan Clark (Inactive) added a comment - running to isolate exact build now

          korrigan.clark Korrigan Clark (Inactive) added a comment -

          3.0.0-345: http://perf.jenkins.couchbase.com/job/syncgteway-hebe-new/9339/ - 2898.0

          Looks like the issue comes in with build 346.
          ben.brooks Ben Brooks added a comment - - edited

          Thanks for the info, Korrigan Clark. Build 346 was the build that upgraded gocb v1 to gocb v2, which gives us an avenue for investigation:

          http://changelog.build.couchbase.com/?product=sync_gateway&fromVersion=3.0.0&fromBuild=345&toVersion=3.0.0&toBuild=346&f_build=off&f_sync_gateway=on&f_gocb=on&f_gocbcore=on&f_sg-bucket=on
          adamf Adam Fraser made changes -
          Field Original Value New Value
          Story Points 1
          adamf Adam Fraser made changes -
          Priority Major [ 3 ] Critical [ 2 ]
          adamf Adam Fraser made changes -
          Story Points 5
          ben.brooks Ben Brooks made changes -
          Rank Ranked lower
          ben.brooks Ben Brooks made changes -
          Rank Ranked higher
          adamf Adam Fraser made changes -
          Sprint CBG Sprint 82 [ 1777 ]
          adamf Adam Fraser made changes -
          Rank Ranked lower
          adamf Adam Fraser made changes -
          Assignee The One [ the one ] Isaac Lambat [ JIRAUSER25602 ]
          isaac.lambat Isaac Lambat added a comment - - edited

          Hi Korrigan Clark,
          Do you have the Sync Gateway logs for this? We will need them to investigate.

          korrigan.clark Korrigan Clark (Inactive) added a comment - http://perf.jenkins.couchbase.com/job/syncgteway-hebe-new/9404/artifact/ (logs at the bottom)

          korrigan.clark Korrigan Clark (Inactive) added a comment - Actually, it looks like they are empty. How do I go about grabbing the Sync Gateway logs? It looks like perfrunner is not collecting them as it should.
          isaac.lambat Isaac Lambat added a comment - - edited

          You can either use the SG Collect application through the command line: ./sgcollect_info /tmp/sgcollect_info.zip or the SG Collect endpoint: POST /_sgcollect_info with the output_dir as JSON in the body.

          korrigan.clark Korrigan Clark (Inactive) added a comment - Isaac Lambat  when i collect the logs i get the same empty folder... what could be causing this?
          isaac.lambat Isaac Lambat added a comment -

          Ideally, I'd recommend running SG Collect while Sync Gateway is running, either via the CLI or through the REST endpoint.

          If it is not possible to run SG Collect while Sync Gateway is running, then grabbing the logs directly from /var/tmp/sglogs (specified in your config by logging.log_file_path) should be enough for us to track down the issue. SG Collect would be attempting to grab the logs from the default location /home/sync_gateway/logs, which could explain the empty folder.

          SG Collect is most likely to work if Sync Gateway is running, so try that method first if you can.
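As an illustration of the REST route described in this comment, here is a minimal Python sketch that builds the POST to /_sgcollect_info with output_dir in the JSON body. It is a sketch under assumptions, not what perfrunner does: the helper name build_sgcollect_request is hypothetical, the admin URL http://localhost:4985 is assumed from the adminInterface port in the test config posted in this thread, and /var/tmp is an arbitrary output directory. The request is only constructed; sending it is left commented out.

```python
import json
import urllib.request


def build_sgcollect_request(admin_url: str, output_dir: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the SG Collect endpoint."""
    # The endpoint expects the output directory as JSON in the request body.
    body = json.dumps({"output_dir": output_dir}).encode("utf-8")
    return urllib.request.Request(
        url=admin_url.rstrip("/") + "/_sgcollect_info",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed values: admin port 4985 and /var/tmp as the output directory.
request = build_sgcollect_request("http://localhost:4985", "/var/tmp")
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request)  # uncomment to send while Sync Gateway is running
```

Building the request separately from sending it makes it easy to check the URL and body before pointing it at a live Sync Gateway node.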
          isaac.lambat Isaac Lambat added a comment - - edited

          Hi Korrigan Clark,

          Would you also be able to post a link to the repository and file where the tests are set up?

          Thanks
          korrigan.clark Korrigan Clark (Inactive) added a comment - - edited

          https://github.com/couchbase/perfrunner/tree/syncgateway
          https://github.com/couchbase/perfrunner/blob/syncgateway/tests/syncgateway/syncgateway_querymix_throughput_1node.test
          https://github.com/couchbaselabs/YCSB/tree/syncgateway-weekly
          https://github.com/couchbaselabs/YCSB/blob/syncgateway-weekly/syncgateway/src/main/java/com/yahoo/ycsb/db/syncgateway/SyncGatewayClient.java

          That's the base repo, but I have a large patch that it runs with: http://review.couchbase.org/c/perfrunner/+/156652
          korrigan.clark Korrigan Clark (Inactive) added a comment - - edited Isaac Lambat  s3 link for logs:  https://perf-artifacts.s3.us-west-2.amazonaws.com/sglogs.zip
          isaac.lambat Isaac Lambat made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          isaac.lambat Isaac Lambat added a comment - - edited

          Hi Korrigan Clark,

          A fix (CBG-1697) has been merged for build 407, so could you please rerun the perf tests against this build to see whether the issue is fixed?

          Thanks
          korrigan.clark Korrigan Clark (Inactive) added a comment - 407 job running:  http://perf.jenkins.couchbase.com/job/syncgteway-hebe-new/9407/
          isaac.lambat Isaac Lambat made changes -
          Comment [ [~korrigan.clark] That perf test has unfortunately failed. Would you be able to get the logs so I can check if it's a Sync Gateway issue? ]
          isaac.lambat Isaac Lambat added a comment -

          Korrigan Clark My apologies, seems like build 407 failed to build some of the executables (including the x86_64 RPM) which meant the perf test failed to download SGW. Could you please rerun the perf test with build 408 which was successfully built?

          adamf Adam Fraser made changes -
          Assignee Isaac Lambat [ JIRAUSER25602 ] Korrigan Clark [ korrigan.clark ]
          adamf Adam Fraser made changes -
          Sprint CBG Sprint 82 [ 1777 ] CBG Sprint 82, CBG Sprint 83 [ 1777, 1801 ]
          korrigan.clark Korrigan Clark (Inactive) added a comment - 408 job and logs: http://perf.jenkins.couchbase.com/job/syncgteway-hebe-new/9408/ https://perf-artifacts.s3.us-west-2.amazonaws.com/sglogs408.zip
          korrigan.clark Korrigan Clark (Inactive) made changes -
          Assignee Korrigan Clark [ korrigan.clark ] Isaac Lambat [ JIRAUSER25602 ]

          isaac.lambat Isaac Lambat added a comment -

          Unfortunately, the fix for the suspected issue did not solve the problem. We think that this problem is related to CBG-1705, so I will let you know when the perf test can be rerun and with which build.
          isaac.lambat Isaac Lambat made changes -
          Link This issue is caused by CBG-1705 [ CBG-1705 ]
          isaac.lambat Isaac Lambat made changes -
          Required Mobile Fields Mandatory:
           - CBL / SG Version:
             - SG Config:
           - Steps to Reproduce:
           - Actual Result:
           - Expected Result:
           - Logs :
                SGW LOGS: sgcollect info
                CBL LOGS:
                Logcat LOGS: for Android tickets
           - Github link for the code:
           - Jenkins job failure link:
           - Pytest Command
           - What is the last build this test passed:
          daniel.petersen Daniel Petersen made changes -
          Component/s SyncGateway [ 14613 ]
          isaac.lambat Isaac Lambat made changes -
          Rank Ranked higher
          adamf Adam Fraser made changes -
          Sprint CBG Sprint 82, CBG Sprint 83 [ 1777, 1801 ] CBG Sprint 82, CBG Sprint 83, CBG Sprint 84 [ 1777, 1801, 1822 ]
          isaac.lambat Isaac Lambat added a comment -

          Hi Korrigan Clark,

          The issue has been addressed in CBG-1705 which was merged in build 436. Please could you retest with either this build or the latest.

          Thanks

          korrigan.clark Korrigan Clark (Inactive) added a comment - - edited queued with 436:  http://perf.jenkins.couchbase.com/job/syncgteway-hebe-new/9536/ -  488.0
          isaac.lambat Isaac Lambat made changes -
          Rank Ranked higher
          isaac.lambat Isaac Lambat added a comment -

          Hi Korrigan Clark,

          Looks like the issue is still present, so please could you post the Sync Gateway logs for this test so I can review the errors and warnings.

          Could you also run ulimit -n on the machine and post the output of that.

          Lastly, could you rerun the test with the latest build but with the database level config option max_concurrent_query_ops set to 250 and post the Sync Gateway logs for that.

          isaac.lambat Isaac Lambat added a comment -

          Hey, any update on this?

          isaac.lambat Isaac Lambat made changes -
          Assignee Isaac Lambat [ JIRAUSER25602 ] Korrigan Clark [ korrigan.clark ]
          korrigan.clark Korrigan Clark (Inactive) added a comment - - edited

          Isaac Lambat, could you provide the config file? I cannot find a parameter called max_concurrent_query_ops in the docs: https://docs.couchbase.com/sync-gateway/3.0/configuration-properties.html

          This is the current config for the test:

          {
            "disable_persistent_config": true,
            "server_tls_skip_verify": true,
            "use_tls_server": false,
            "admin_interface_authentication": false,
            "metrics_interface_authentication": false,
            "adminInterface": "0.0.0.0:4985",
            "logging": {
              "log_file_path": "/var/tmp/sglogs"
            },
            "databases": {
              "db": {
                "server": "couchbase://172.23.100.190",
                "bucket": "bucket-1",
                "username": "bucket-1",
                "password": "password",
                "enable_shared_bucket_access": true,
                "users": {
                  "GUEST": { "disabled": true, "admin_channels": ["*"] }
                },
                "cache": {
                  "channel_cache_max_length": 1,
                  "channel_cache_min_length": 1
                },
                "sync": ` function (doc) { channel(doc.channels); access(doc.accessTo, doc.access); } `
              }
            }
          }

          korrigan.clark Korrigan Clark (Inactive) added a comment -

          On the sgw node:

          [root@172-23-100-204 ~]# ulimit -n
          32768
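For reference, the same open-file limit can be read programmatically. The following is a minimal sketch using Python's standard resource module (Unix only); the helper name open_file_limits is hypothetical. Note the assumption baked in: it reports the limits of the Python process that runs it, which may differ from those of an already running Sync Gateway process (for example, when the service is started with its own limits).

```python
import resource
from typing import Tuple


def open_file_limits() -> Tuple[int, int]:
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = open_file_limits()
# `ulimit -n` in a shell reports the soft limit.
print(f"soft={soft} hard={hard}")
```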
          isaac.lambat Isaac Lambat added a comment -

          Hi, the "max_concurrent_query_ops" option goes in the "db" section. The full config is at the bottom of this comment (I added the option to the one you provided). This config option is fairly new and not documented yet, so we will get that documented.

          Also, would it be possible to get the logs of the last perf test done on build 436, so we can have a look at the warnings and errors?

          Thanks

          The modified config would be:

          {
            "disable_persistent_config": true,
            "server_tls_skip_verify": true,
            "use_tls_server": false,
            "admin_interface_authentication": false,
            "metrics_interface_authentication": false,
            "adminInterface": "0.0.0.0:4985",
            "logging": {
              "log_file_path": "/var/tmp/sglogs"
            },
            "databases": {
              "db": {
                "max_concurrent_query_ops": 250,
                "server": "couchbase://172.23.100.190",
                "bucket": "bucket-1",
                "username": "bucket-1",
                "password": "password",
                "enable_shared_bucket_access": true,
                "users": {
                  "GUEST": { "disabled": true, "admin_channels": ["*"] }
                },
                "cache": {
                  "channel_cache_max_length": 1,
                  "channel_cache_min_length": 1
                },
                "sync": ` function (doc) { channel(doc.channels); access(doc.accessTo, doc.access); } `
              }
            }
          }
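To make the placement concrete, here is a minimal Python sketch that adds max_concurrent_query_ops at the database level of a config held as a dictionary. It is an illustration under assumptions: the helper name with_max_concurrent_query_ops is hypothetical, and the config is trimmed to a few keys from the one posted in this thread. The sketch works on a dictionary rather than parsing the file, because the posted config wraps the sync function in backticks, which a strict JSON parser such as json.loads would reject.

```python
import copy


def with_max_concurrent_query_ops(config: dict, db_name: str, limit: int) -> dict:
    """Return a copy of the config with the query limit set on one database."""
    updated = copy.deepcopy(config)
    # The option sits inside the named database section, not at the top level.
    updated["databases"][db_name]["max_concurrent_query_ops"] = limit
    return updated


# Trimmed version of the test config; most keys are omitted for brevity.
base_config = {
    "adminInterface": "0.0.0.0:4985",
    "databases": {
        "db": {
            "server": "couchbase://172.23.100.190",
            "bucket": "bucket-1",
            "enable_shared_bucket_access": True,
        }
    },
}

tuned = with_max_concurrent_query_ops(base_config, "db", 250)
print(tuned["databases"]["db"]["max_concurrent_query_ops"])  # prints 250
```

Working on a copy keeps the baseline config intact, so the same test can be run with and without the limit.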
          korrigan.clark Korrigan Clark (Inactive) added a comment - - edited

          queued 436 log grab - http://perf.jenkins.couchbase.com/job/syncgteway-hebe-new/9657/
          queued 447 with max query con set to 250 - http://perf.jenkins.couchbase.com/job/syncgteway-hebe-new/9658/

          korrigan.clark Korrigan Clark (Inactive) added a comment -

          Isaac Lambat, not sure why the logs are still empty, since I put in a fix and the logs for other tests get grabbed. Will figure this out today. The run with max_concurrent_query_ops set to 250, however, showed the same degraded performance: 491.0
          adamf Adam Fraser made changes -
          Sprint CBG Sprint 82, CBG Sprint 83, CBG Sprint 84 [ 1777, 1801, 1822 ] CBG Sprint 82, CBG Sprint 83, CBG Sprint 84, CBG Sprint 85 [ 1777, 1801, 1822, 1845 ]
          korrigan.clark Korrigan Clark (Inactive) added a comment - fresh run (and one queued) with log collection working now:

          http://perf.jenkins.couchbase.com/job/syncgteway-hebe-new/9768/
          http://perf.jenkins.couchbase.com/job/syncgteway-hebe-new/9769/
          ben.brooks Ben Brooks made changes -
          Assignee Korrigan Clark [ korrigan.clark ] Isaac Lambat [ JIRAUSER25602 ]
          ben.brooks Ben Brooks added a comment -

          Reran 9769 with a newer build to pick up the change:

          http://perf.jenkins.couchbase.com/job/syncgteway-hebe-new/9770/ 

          isaac.lambat Isaac Lambat made changes -
          Link This issue is triggering CBG-1765 [ CBG-1765 ]
          ben.brooks Ben Brooks added a comment -

          Reran again with the gocbv2 connstr fixes from CBG-1765:

          http://perf.jenkins.couchbase.com/job/syncgteway-hebe-new/9861/

          isaac.lambat Isaac Lambat added a comment -

          Hi Korrigan Clark,

          The degraded performance problem seems to have been fixed by CBG-1765; however, this has only been tested with a query limit of 250.

          Could you please run a perf test with the default query limit on the latest build?

          Thanks
          ben.brooks Ben Brooks added a comment - Kicked off http://perf.jenkins.couchbase.com/job/syncgteway-hebe-new/9862/
          isaac.lambat Isaac Lambat made changes -
          Link This issue is caused by CBG-1765 [ CBG-1765 ]
          isaac.lambat Isaac Lambat added a comment -

          Success!! With a query limit of 1000 (the default) and with CBG-1765 merged in, the error count is at 0 throughout the entire perf test.

          Assigning back to you Korrigan Clark to verify

          isaac.lambat Isaac Lambat made changes -
          Assignee Isaac Lambat [ JIRAUSER25602 ] Korrigan Clark [ korrigan.clark ]
          ben.brooks Ben Brooks added a comment -

          Isaac Lambat Is the throughput back up to previous values, or have we just checked the errors?

          isaac.lambat Isaac Lambat added a comment -

          Yes, the throughput exceeds the previous value. It is now 3029 req/sec, compared to the old throughput of 2890 req/sec and the degraded throughput of 566 req/sec.
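To put those three figures in relative terms, here is a small Python sketch of the arithmetic. The helper name percent_change is hypothetical; the inputs are the req/sec values quoted in the comment, with 2890 req/sec taken as the baseline.

```python
def percent_change(new: float, baseline: float) -> float:
    """Relative change of `new` against `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


baseline = 2890.0  # old throughput, req/sec
degraded = 566.0   # throughput during the regression, req/sec
fixed = 3029.0     # throughput after the fix, req/sec

print(f"degraded vs baseline: {percent_change(degraded, baseline):+.1f}%")  # -80.4%
print(f"fixed vs baseline: {percent_change(fixed, baseline):+.1f}%")        # +4.8%
```

On these numbers the regression cost roughly 80% of the throughput, and the fixed build lands about 5% above the old baseline.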
          adamf Adam Fraser made changes -
          Sprint CBG Sprint 82, CBG Sprint 83, CBG Sprint 84, CBG Sprint 85 [ 1777, 1801, 1822, 1845 ] CBG Sprint 82, CBG Sprint 83, CBG Sprint 84, CBG Sprint 85, CBG Sprint 86 [ 1777, 1801, 1822, 1845, 1868 ]
          korrigan.clark Korrigan Clark (Inactive) made changes -
          Resolution Fixed [ 1 ]
          Status In Progress [ 3 ] Resolved [ 5 ]
          korrigan.clark Korrigan Clark (Inactive) made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

          People

            Assignee: korrigan.clark Korrigan Clark (Inactive)
            Reporter: korrigan.clark Korrigan Clark (Inactive)