  1. Couchbase Server
  2. MB-16957

Reduce scan latency for stale=false after bucket flush

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 4.0.0, 4.1.0
    • Fix Version/s: 5.0.0
    • Component/s: secondary-index
    • Security Level: Public

    Description

      After a bucket flush, the first stale=false scan takes 2 minutes (the timeout) before gsiClient retries the operation. This should be improved for developer experience and usability.
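One way to observe this latency from the outside is to time a request_plus N1QL query issued right after the flush, which is how the comments in this issue exercise the scan. The sketch below is a minimal illustration, not the attached repro script. It assumes the query service REST endpoint at /query/service on port 8093; the host and the statement are hypothetical placeholders.

```python
import json
import time
import urllib.parse
import urllib.request


def build_rp_request(host, statement, port=8093):
    """Build (but do not send) a N1QL query request with request_plus consistency."""
    body = urllib.parse.urlencode({
        "statement": statement,
        "scan_consistency": "request_plus",
    }).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/query/service", data=body, method="POST")


def timed_query(request, opener=urllib.request.urlopen):
    """Send the request; return (wall-clock seconds, the response's metrics dict)."""
    start = time.monotonic()
    with opener(request) as resp:
        payload = json.load(resp)
    return time.monotonic() - start, payload.get("metrics", {})


# Hypothetical usage against a local node, to be run right after a bucket flush:
# req = build_rp_request("127.0.0.1", "SELECT * FROM default LIMIT 1")
# seconds, metrics = timed_query(req)
# print(seconds, metrics.get("elapsedTime"))
```

Running the same query twice in a row makes the difference between the first and the subsequent scan visible.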

      Attachments

        1. build.5008.output.txt
          62 kB
        2. indexer.log
          1.52 MB
        3. mrprint2.py
          2 kB
        4. n1ql-query-index-hangs.zip
          4 kB
        5. output.txt
          65 kB

          Activity

            With 4.1.0 build 5008, I can reproduce this issue with the attached script (n1ql-query-index-hangs.zip), using Ubuntu 14.04 and a 3-node cluster with all services enabled on all nodes.

            On one of the nodes I observe the following:

            2016-05-16T11:35:08.82+00:00 [Info] SCAN##3 REQUEST defnId:16030325685263847945, index:default/iQA, type:scan, span:range (["Hello","World!"],["Hello","World!"] incl:both), limit:9223372036854775807, consistency:session_consistency, requestId:68cba256-87c4-469a-a1a3-a422097c7335
            2016-05-16T11:35:08.82+00:00 [Info] SCAN##3 RESPONSE status:(error = Index scan timed out), requestId: 68cba256-87c4-469a-a1a3-a422097c7335

            I am also attaching the indexer.log.

            prataprc Pratap Chakravarthy (Inactive) added a comment (edited)
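To find such timeouts in a large indexer.log, the REQUEST and RESPONSE lines can be paired by requestId. The following is a minimal sketch that assumes every scan line has the same shape as the two lines quoted in the comment above; other log lines are skipped, and the function and field names are illustrative only.

```python
import re

# Line shape assumed from the two quoted log lines only.
HEADER = re.compile(r"^(\S+) \[Info\] (SCAN##\d+) (REQUEST|RESPONSE) (.*)$")
REQUEST_ID = re.compile(r"requestId:\s*([0-9a-f-]+)")
INDEX = re.compile(r"index:([^,]+)")
STATUS = re.compile(r"status:\((.*?)\)")

SAMPLE = [
    '2016-05-16T11:35:08.82+00:00 [Info] SCAN##3 REQUEST defnId:16030325685263847945, '
    'index:default/iQA, type:scan, span:range (["Hello","World!"],["Hello","World!"] incl:both), '
    'limit:9223372036854775807, consistency:session_consistency, '
    'requestId:68cba256-87c4-469a-a1a3-a422097c7335',
    '2016-05-16T11:35:08.82+00:00 [Info] SCAN##3 RESPONSE status:(error = Index scan timed out), '
    'requestId: 68cba256-87c4-469a-a1a3-a422097c7335',
]


def failed_scans(lines):
    """Return one record per RESPONSE line whose status reports an error."""
    index_by_request = {}
    failures = []
    for line in lines:
        header = HEADER.match(line.strip())
        if not header:
            continue  # not a scan REQUEST/RESPONSE line
        kind, rest = header.group(3), header.group(4)
        rid = REQUEST_ID.search(rest)
        if not rid:
            continue
        request_id = rid.group(1)
        if kind == "REQUEST":
            index = INDEX.search(rest)
            index_by_request[request_id] = index.group(1) if index else None
            continue
        status = STATUS.search(rest)
        if status and status.group(1).startswith("error"):
            failures.append({
                "request_id": request_id,
                "index": index_by_request.get(request_id),
                "error": status.group(1).split("=", 1)[1].strip(),
            })
    return failures


for failure in failed_scans(SAMPLE):
    print(failure)
```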

            I modified the script to run a second RP (request plus) query after the first one.

            9. RP-query-1 result:

            { "elapsedTime": "2m0.104445894s", "executionTime": "2m0.104399094s", "resultCount": 1, "resultSize": 31 }

            10. RP-query-2 result:

            { "elapsedTime": "6.586935ms", "executionTime": "6.541229ms", "resultCount": 1, "resultSize": 31 }

            The first RP query times out, but the second one succeeds.

            prataprc Pratap Chakravarthy (Inactive) added a comment
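The elapsedTime values above are Go-style duration strings, so comparing the two queries numerically needs a small parser. This is a minimal sketch that handles only the units listed in the table below; the function name is illustrative and is not part of any Couchbase tool.

```python
import re

# Seconds per unit for the Go-style duration suffixes handled here.
UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0,
                "ms": 1e-3, "us": 1e-6, "µs": 1e-6, "ns": 1e-9}
# Longer suffixes come first so that "ms" is not read as "m".
TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")


def parse_go_duration(text):
    """Convert a string such as '2m0.104445894s' to seconds."""
    position = 0
    total = 0.0
    for token in TOKEN.finditer(text):
        if token.start() != position:
            raise ValueError(f"unexpected characters in duration: {text!r}")
        total += float(token.group(1)) * UNIT_SECONDS[token.group(2)]
        position = token.end()
    if position == 0 or position != len(text):
        raise ValueError(f"not a duration: {text!r}")
    return total


first = parse_go_duration("2m0.104445894s")   # RP-query-1
second = parse_go_duration("6.586935ms")      # RP-query-2
print(f"first query took {first / second:.0f}x as long as the second")
```

With the values reported in the comment, the first query takes roughly 18,000 times as long as the second, which matches the 2-minute timeout described in the issue.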

            That indeed looks like the issue John, Keshav, and I were seeing back then!

            simonbasle Simon Baslé (Inactive) added a comment

            The issue is not reproducible in the Spock build. I tried running the repro script on a single-node setup as well as on a 3-node cluster with all services enabled.

            The scan vector computation moved to the server side as part of commit https://github.com/couchbase/indexing/commit/bf687244dd312bf022e5a6937ab99716506adb64, and hence the timeout issue after bucket flush is mitigated.

            prathibha Prathibha Bisarahalli (Inactive) added a comment

            Bulk closing invalid, won't-fix, and duplicate bugs. Please feel free to reopen.

            raju Raju Suravarjjala added a comment

            People

              Assignee: prathibha Prathibha Bisarahalli (Inactive)
              Reporter: jliang John Liang
              Votes: 0
              Watchers: 8
