Couchbase Server / MB-52303

Query transaction error changed from 7.1 to 7.1.1

Details

    • Bug
    • Resolution: Not a Bug
    • Major
    • 7.1.1
    • 7.1.1
    • query
    • None
    • Untriaged
    • 1
    • Yes

    Description

      (I've raised as a bug, but it's unclear at this time whether this is a bug or an intentional improvement.  However it is a behaviour change that's affecting our tests so I felt I should raise it.)

      We have a test that simulates a write-write conflict error, where T2 is permanently blocked from writing a document that is locked by T1, leading to T2 timing out.

      On 7.1 this led to this response:

      {
         "errors":[
            {
               "code":1080,
               "message":"Timeout 1.9999705s exceeded",
               "retry":true
            },
            {
               "additional":{
                  "cause":{
                     "cause":{
                        "bucket":"default",
                        "cause":"deadline expired before WWC was resolved on default._default._default._txn:atr-581-#2c7",
                        "collection":"_default",
                        "document_key":"17e138ca-b214-41b0-a5cb-a6d99ceed7d0",
                        "msg":"write write conflict",
                        "scope":"_default"
                     },
                     "raise":"failed",
                     "retry":true,
                     "rollback":false
                  }
               },
               "code":17020,
               "message":"Transaction staging error",
               "retry":false
            }
         ]
      } 

      But in 7.1.1-3070 we get this response (note that the first error, code 1080, is missing):

      {
         "errors":[
            {
               "cause":{
                  "cause":{
                     "bucket":"default",
                     "cause":"deadline expired before WWC was resolved on default._default._default._txn:atr-239-#fa9",
                     "collection":"_default",
                     "document_key":"710007f4-0f69-4ab0-a12d-b501da93bfa4",
                     "msg":"write write conflict",
                     "scope":"_default"
                  },
                  "raise":"failed",
                  "retry":true,
                  "rollback":false
               },
               "code":17020,
               "msg":"Transaction staging error"
            }
         ]
      } 
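For tests that need to pass on both versions, one option is to assert on the write-write conflict cause rather than on the exact list of errors. The following is a minimal sketch, assuming only the two response shapes shown above; `find_wwc_cause` and `summarize` are hypothetical helper names, not part of any SDK.

```python
import json

WWC_MSG = "write write conflict"


def find_wwc_cause(error):
    """Return the innermost write-write conflict cause dict, or None."""
    # In the 7.1 response the cause chain sits under "additional";
    # in the 7.1.1 response it sits directly on the error object.
    node = error.get("additional", error).get("cause")
    while isinstance(node, dict):
        if node.get("msg") == WWC_MSG:
            return node
        node = node.get("cause")
    return None


def summarize(response_text):
    """Return (sorted error codes, WWC cause or None) for a response body."""
    errors = json.loads(response_text).get("errors", [])
    codes = sorted(e["code"] for e in errors)
    wwc = next((c for c in map(find_wwc_cause, errors) if c), None)
    return codes, wwc
```

A test could then check `17020 in codes` and `wwc is not None`, which holds whether or not the 1080 entry is present.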

      At this stage I'm just trying to find out whether this change is intentional.

      Update: on further investigation, the 7.1.1 behaviour seems to be the correct one.  Under the hood, gocbcore runs a 1-second polling loop waiting for T2 to complete, and the transaction has been given 2 seconds in total to complete.  So it's unclear why 7.1 returns the 1080 error saying the 2-second timeout has expired, because the wait inside gocbcore should have taken only 1 second.  Perhaps we can just close this out, as it's a clear improvement, though it would be good to understand what was going wrong on 7.1, e.g. exactly where the fix happened (could be gocbcore or query).
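To illustrate the timing argument, here is a rough sketch of a bounded polling wait nested inside an overall transaction deadline. It is not gocbcore's actual code: the function name, the return strings, and the poll interval are assumptions made for illustration, and only the 1-second and 2-second figures come from the description above.

```python
class FakeClock:
    """Deterministic clock so the sketch runs instantly."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


def wait_for_conflict(clock, txn_deadline, poll_budget=1.0, interval=0.25,
                      resolved=lambda: False):
    """Poll until the conflict resolves or one of the two deadlines passes."""
    poll_deadline = clock.now() + poll_budget
    while True:
        if resolved():
            return "resolved"
        t = clock.now()
        if t >= txn_deadline:
            return "txn_timeout"            # would correspond to the 1080 error
        if t >= poll_deadline:
            return "wwc_deadline_expired"   # would correspond to the 17020 cause
        # Never sleep past whichever deadline comes first.
        clock.sleep(min(interval, poll_deadline - t, txn_deadline - t))


clock = FakeClock()
outcome = wait_for_conflict(clock, txn_deadline=2.0)
print(outcome, clock.now())
```

Under these assumptions the inner wait gives up at the 1-second mark, well before the 2-second transaction deadline, so only the write-write conflict error would be expected.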

          People

            kamini.jagtiani Kamini Jagtiani
            graham.pople Graham Pople