Uploaded image for project: 'Couchbase .NET client library'
  1. Couchbase .NET client library
  2. NCBC-2451

QueryPreparedStatementFailure in situational tests with MH

    XMLWordPrintable

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 3.0.0-beta.4
    • 3.0.1
    • library
    • 1

    Description

      Seeing an error from the SDK under this Query scenario, with MH only

      Operations: Continuous n1ql queries

      Change: Add a n1ql node in and remove another and rebalance

      Cluster:

      1. kv
      2. n1ql <- not in the cluster at the start, then rebalanced in
      3. index
      4. n1ql <- rebalanced out

      Error: All queries will timeout after the change. In the logs it looks like the error is because of QueryPreparedStatementFailure.

      Graph results: http://sdk-testresults.couchbase.com.s3.amazonaws.com/SDK-SDK/CB-6.5.0-4960/Rb1SwapQuery-N1QL/04-01-20/039991/77aeb3151664b7556af70fd27258c32d-N1QL.html

       

      Logs: SdkdConsole.log.zip . and full run logs http://sdkbuilds.sc.couchbase.com/view/.NET/job/sdk-net-situational-release/job/SDK3.0-Release/job/net-centos-madhatter/7/console (not very informative in this case)

      Note: There is also a failure with the service restart query also due to the QueryPreparedStatementFailure, as seen on the graph http://sdk-testresults.couchbase.com.s3.amazonaws.com/SDK-SDK/CB-6.5.0-4960/SvcRestartQuery-N1QL/04-01-20/042644/8ba514235eb2fc2b9342b79c90b95b1a-N1QL.html

      Im attaching further logs for this test too, although I think you'll find they show the same issue SdkdConsole-SvcRst.log.zip

      Attachments

        1. SdkdConsole.log.zip
          437 kB
        2. SdkdConsole.log 2.zip
          176 kB
        3. SdkdConsole3.log.zip
          185 kB
        4. SdkdConsole-SvcRst.log.zip
          379 kB

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            jmorris Jeff Morris added a comment -

            I am pretty sure these are the same issues, one ticket is to fix the issue the other is because QE found the result of the issue.

            jmorris Jeff Morris added a comment - I am pretty sure these are the same issues, one ticket is to fix the issue the other is because QE found the result of the issue.
            will.broadbelt Will Broadbelt added a comment - Jeff Morris  - Still seeing some issues with this, post implementation. Graph output:  http://sdk-testresults.couchbase.com.s3.amazonaws.com/SDK-SDK/CB-6.5.1-6292/Rb1SwapQuery-N1QL/05-05-20/063279/3edfff336362c235df79cb8da8371fea-N1QL.html Sdk logs:  SdkdConsole.log 2.zip
            will.broadbelt Will Broadbelt added a comment - - edited

            Jeff Morris -

            New logs with improved logging: SdkdConsole3.log.zip

            will.broadbelt Will Broadbelt added a comment - - edited Jeff Morris  - New logs with improved logging:  SdkdConsole3.log.zip
            jmorris Jeff Morris added a comment -

            This is being returned back to the caller with a NoRetry state instead of running through the "fast prepare logic":

            Line 8048: 2020-05-07 12:12:51.3845||Couchbase.Query.QueryClient|DEBUG|The request 12376c35-e1ed-44e4-9270-8c4c15549434 failed because: No such prepared statement: 082b1291-5786-513e-a88c-576735e92ebc [4040]

            I suspect this is a regression caused by refactoring the error-handling some time back.

            jmorris Jeff Morris added a comment - This is being returned back to the caller with a NoRetry state instead of running through the "fast prepare logic": Line 8048: 2020-05-07 12:12:51.3845||Couchbase.Query.QueryClient|DEBUG|The request 12376c35-e1ed-44e4-9270-8c4c15549434 failed because: No such prepared statement: 082b1291-5786-513e-a88c-576735e92ebc [4040] I suspect this is a regression caused by refactoring the error-handling some time back.
            jmorris Jeff Morris added a comment -

            The client is using the stale query plan of the node that was rebalanced out (10.143.194.104 on my cluster):

            {"prepared":"[10.143.194.104:8091]e9e06c57-f950-5498-9020-f5a573cde32f","timeout":"75000ms","client_context_id":"b55e4615-6e35-4ef5-b917-f1ea88d076e2"}
            

            The new query uri of the rebalanced in node (10.143.194.102 on my cluster):

            http://10.143.194.102:8093/query
            

            And the result is the that the cluster cannot find the stale statement and fails with a 4040:

            4040 - No such prepared statement: e9e06c57-f950-5498-9020-f5a573cde32f
            

            Which is the retried until it times out or exceeds the retry quota.

            jmorris Jeff Morris added a comment - The client is using the stale query plan of the node that was rebalanced out (10.143.194.104 on my cluster): {"prepared":"[10.143.194.104:8091]e9e06c57-f950-5498-9020-f5a573cde32f","timeout":"75000ms","client_context_id":"b55e4615-6e35-4ef5-b917-f1ea88d076e2"} The new query uri of the rebalanced in node (10.143.194.102 on my cluster): http://10.143.194.102:8093/query And the result is the that the cluster cannot find the stale statement and fails with a 4040: 4040 - No such prepared statement: e9e06c57-f950-5498-9020-f5a573cde32f Which is the retried until it times out or exceeds the retry quota.
            jmorris Jeff Morris added a comment - - edited

            Fixed: detect the 4040, purge the cache of the old plan, retry, re-create the plan and cache. success:

            Line 4475: 2020-05-08T18:17:13.8547897-07:00  [DBG] Request "4d8663b1-b87d-45ee-b8b5-01e11682c69d" has failed because Fatal. (901000b3)
            	Line 4476: 2020-05-08T18:17:13.8555159-07:00  [DBG] Query plan is stale for "4d8663b1-b87d-45ee-b8b5-01e11682c69d". Purging plan "[10.143.194.104:8091]e9e06c57-f950-5498-9020-f5a573cde32f". (cacc2905)
            	Line 4477: 2020-05-08T18:17:13.8555282-07:00  [DBG] Request "4d8663b1-b87d-45ee-b8b5-01e11682c69d" is being retried. (706a8c5b)
            	Line 4481: 2020-05-08T18:17:13.8595152-07:00  [DBG] Using enhanced prepared statement behavior for request "4d8663b1-b87d-45ee-b8b5-01e11682c69d" (9c6686e7)
            	Line 4482: 2020-05-08T18:17:13.9904492-07:00  [DBG] Request 4d8663b1-b87d-45ee-b8b5-01e11682c69d has succeeded. (00e52722)
            

            jmorris Jeff Morris added a comment - - edited Fixed: detect the 4040, purge the cache of the old plan, retry, re-create the plan and cache. success: Line 4475: 2020-05-08T18:17:13.8547897-07:00 [DBG] Request "4d8663b1-b87d-45ee-b8b5-01e11682c69d" has failed because Fatal. (901000b3) Line 4476: 2020-05-08T18:17:13.8555159-07:00 [DBG] Query plan is stale for "4d8663b1-b87d-45ee-b8b5-01e11682c69d". Purging plan "[10.143.194.104:8091]e9e06c57-f950-5498-9020-f5a573cde32f". (cacc2905) Line 4477: 2020-05-08T18:17:13.8555282-07:00 [DBG] Request "4d8663b1-b87d-45ee-b8b5-01e11682c69d" is being retried. (706a8c5b) Line 4481: 2020-05-08T18:17:13.8595152-07:00 [DBG] Using enhanced prepared statement behavior for request "4d8663b1-b87d-45ee-b8b5-01e11682c69d" (9c6686e7) Line 4482: 2020-05-08T18:17:13.9904492-07:00 [DBG] Request 4d8663b1-b87d-45ee-b8b5-01e11682c69d has succeeded. (00e52722)
            jmorris Jeff Morris added a comment -

            Will Broadbelt - try testing with this gerrit review: http://review.couchbase.org/c/couchbase-net-client/+/127479

            It should have fixed the issue

            jmorris Jeff Morris added a comment - Will Broadbelt - try testing with this gerrit review: http://review.couchbase.org/c/couchbase-net-client/+/127479 It should have fixed the issue

            Jeff Morris - Yep, works a treat! 

            will.broadbelt Will Broadbelt added a comment - Jeff Morris  - Yep, works a treat! 

            People

              will.broadbelt Will Broadbelt
              will.broadbelt Will Broadbelt
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty