Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-37553

query throughput lower due to change in ns_server affecting indexer

    XMLWordPrintable

Details

    • Untriaged
    • Centos 64-bit
    • Yes

    Description

      Avg. Query Throughput (queries/sec), Q3 Array Indexing, Covering, Range Scan, Plasma, request_plus
      6.5.0-4960 - 740

      7.0.0:

      1165: http://perf.jenkins.couchbase.com/job/iris/17747/ - 479
      http://perf.jenkins.couchbase.com/job/iris/17836/ - 479.0

      1140: http://perf.jenkins.couchbase.com/job/iris/17840/ - 480.0
      1120: http://perf.jenkins.couchbase.com/job/iris/17839/ - 480.0
      1110: http://perf.jenkins.couchbase.com/job/iris/17842/ - 481.0
      1107: http://perf.jenkins.couchbase.com/job/iris/17844/ - 481.0
      1107-2: http://perf.jenkins.couchbase.com/job/iris/17846/ - 481.0 . <----

      1106-2: http://perf.jenkins.couchbase.com/job/iris/17847/ - 742.0 . <----
      1106: http://perf.jenkins.couchbase.com/job/iris/17845/ - 740.0
      1105: http://perf.jenkins.couchbase.com/job/iris/17843/ - 745.0
      1100: http://perf.jenkins.couchbase.com/job/iris/17838/ - 734.0

       

      We are seeing roughly 25% reduction in query throughput in build 7.0.0-1107 compared to 7.0.0-1106. You can see the comparison graphs here: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=iris_700-1107_access_cab4&snapshot=iris_700-1106_access_249f

      Logs:

      1107 - 45 is indexer node, 55 is n1ql node

      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-iris-17846/172.23.100.45.zip

      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-iris-17846/172.23.100.55.zip

      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-iris-17846/172.23.100.70.zip

      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-iris-17846/172.23.100.71.zip

      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-iris-17846/172.23.100.72.zip

      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-iris-17846/172.23.100.73.zip

      1106 - 45 is indexer node, 55 is n1ql node

      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-iris-17847/172.23.100.45.zip

      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-iris-17847/172.23.100.55.zip

      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-iris-17847/172.23.100.70.zip

      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-iris-17847/172.23.100.71.zip

      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-iris-17847/172.23.100.72.zip

      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-iris-17847/172.23.100.73.zip

       

      Changelog:

      http://172.23.123.43:8000/getchangelog?product=couchbase-server&fromb=7.0.0-1106&tob=7.0.0-1107

      There are only 2 relevant changes, one comes from asterixdb and the other from ns_server. The ns_server change is only adding compatibility mode markers and relevant functions. It seems this compatibility mode value is not used anywhere by n1ql but is used by indexer. If you look in the logs ns_server.indexer.log for 1107 you will see some errors that are not in the 1106 logs. Other than that I do see some connection closed type messages that could be related.

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          This issue seems to also be affecting the following:

          90th percentile query latency (ms), Q26, Hash Join (10K -> 1M), not_bounded

          1107: http://perf.jenkins.couchbase.com/job/arke/3140/ - 9784
          1106: http://perf.jenkins.couchbase.com/job/arke/3141/ - 7815

          Avg. Query Throughput (queries/sec), Q3 Array Indexing, Covering, Range Scan, Plasma, not_bounded

          Avg. Query Throughput (queries/sec), Q26, Hash Join (1K -> 100K), not_bounded

          korrigan.clark Korrigan Clark (Inactive) added a comment - This issue seems to also be affecting the following: 90th percentile query latency (ms), Q26, Hash Join (10K -> 1M), not_bounded 1107: http://perf.jenkins.couchbase.com/job/arke/3140/ - 9784 1106: http://perf.jenkins.couchbase.com/job/arke/3141/ - 7815 Avg. Query Throughput (queries/sec), Q3 Array Indexing, Covering, Range Scan, Plasma, not_bounded Avg. Query Throughput (queries/sec), Q26, Hash Join (1K -> 100K), not_bounded

          Korrigan Clark The indexer has hit into a known issue MB-37340 (session_consistency_strict mode). This bug was fixed in 6.5.0 branch but not in 7.0 yet. We just pushed that fix to CC today, will update with build number that has this fix.

          However, one thing that shows up as a stark difference between indexer logs from 1106 and 1107 is the latency for getting dcp seqnos from KV seems to have increased between both builds. Indexer tracks this as timing stat and its values from both builds are:

          1106:
          "bucket-1:timings/dcp_getseqs":"547920 267214769972 916653808804509274"
          Average duration of request = 267214769972 / 547920 = 0.48ms

          1107:
          "bucket-1:timings/dcp_getseqs":"40698 743199310787 7041177665539212293"
          Average duration of request = 743199310787 / 40698 = 18ms

          We can re-run this test with build 7.0.0-1194 which has fix for MB-37340 in CC and then recheck the dcp_getseqs latency stat.

          prathibha Prathibha Bisarahalli (Inactive) added a comment - - edited Korrigan Clark The indexer has hit into a known issue MB-37340 (session_consistency_strict mode). This bug was fixed in 6.5.0 branch but not in 7.0 yet. We just pushed that fix to CC today, will update with build number that has this fix. However, one thing that shows up as a stark difference between indexer logs from 1106 and 1107 is the latency for getting dcp seqnos from KV seems to have increased between both builds. Indexer tracks this as timing stat and its values from both builds are: 1106: "bucket-1:timings/dcp_getseqs":"547920 267214769972 916653808804509274" Average duration of request = 267214769972 / 547920 = 0.48ms 1107: "bucket-1:timings/dcp_getseqs":"40698 743199310787 7041177665539212293" Average duration of request = 743199310787 / 40698 = 18ms We can re-run this test with build 7.0.0-1194 which has fix for MB-37340 in CC and then recheck the dcp_getseqs latency stat.

          20:45:14 2020-01-26T20:45:14 [INFO] Adding a benchmark: {
          20:45:14     "build": "7.0.0-1210",
          20:45:14     "buildURL": "http://perf.jenkins.couchbase.com/job/iris/17997/",
          20:45:14     "dateTime": "2020-01-26 20:45",
          20:45:14     "id": "f1bc66c4aea84230a595b224e2afee86",
          20:45:14     "metric": "n1ql_thr_array_indexing_cov_Q3_20M_between_gsi_plasma_false_avg_query_requests_iris",
          20:45:14     "snapshots": [
          20:45:14         "iris_700-1210_access_6f5c"
          20:45:14     ],
          20:45:14     "value": 1140.0
          20:45:14 }
          

          Sitaram.Vemulapalli Sitaram Vemulapalli added a comment - 20 : 45 : 14 2020 - 01 -26T20: 45 : 14 [INFO] Adding a benchmark: { 20 : 45 : 14 "build" : "7.0.0-1210" , 20 : 45 : 14 "buildURL" : "http://perf.jenkins.couchbase.com/job/iris/17997/" , 20 : 45 : 14 "dateTime" : "2020-01-26 20:45" , 20 : 45 : 14 "id" : "f1bc66c4aea84230a595b224e2afee86" , 20 : 45 : 14 "metric" : "n1ql_thr_array_indexing_cov_Q3_20M_between_gsi_plasma_false_avg_query_requests_iris" , 20 : 45 : 14 "snapshots" : [ 20 : 45 : 14 "iris_700-1210_access_6f5c" 20 : 45 : 14 ], 20 : 45 : 14 "value" : 1140.0 20 : 45 : 14 }

          In latest build:

          "bucket-1:timings/dcp_getseqs":"300380 587690867648"

          looks better now.

          korrigan.clark Korrigan Clark (Inactive) added a comment - In latest build: "bucket-1:timings/dcp_getseqs":"300380 587690867648" looks better now.

          People

            korrigan.clark Korrigan Clark (Inactive)
            korrigan.clark Korrigan Clark (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty