Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-34348

RequestTimeoutException while setting ATR to Pending

    XMLWordPrintable

Details

    • Untriaged
    • Unknown

    Description

      Getting RequestTimeoutException exception while setting ATR to pending status while running load test with 120 concurrent threads/requests . Below is the error snippet for one of the transaction .

      Please take a look at the log attached for more details as there few more exceptions related to  TemporaryFailureExceptionDocumentAlreadyInTransaction

      (I was not able to comprehend if these these are inter related MB-34347) 

      CB Build: 6.5.0-3274

      SDK : 3.0.0-alpha.3

      Couchbase-transactions : 1.0.0-alpha.4

       

      Transaction error log :

      Transaction logger:32/Thread-11/87853366-55a6-47e6-9491-a75418b3dabe starting attempt 0/dec1be32-3c3d-4115-9866-7dc74274ccd5
      Transaction logger:32/Thread-11/87853366-55a6-47e6-9491-a75418b3dabe/dec1be32-3c3d-4115-9866-7dc74274ccd5 getting doc usertable:user707278906139368243
      Transaction logger:32/Thread-11/87853366-55a6-47e6-9491-a75418b3dabe/dec1be32-3c3d-4115-9866-7dc74274ccd5 completed get of TransactionJsonDocument{id=usertable:user707278906139368243,cas=1558659019246469120,status=NORMAL,bucket=bucket-1,coll=_default,links=TransactionLinks{atr=none,atrBkt=none,atrColl=none,ver=none}} in 32ms
      Transaction logger:32/Thread-11/87853366-55a6-47e6-9491-a75418b3dabe/dec1be32-3c3d-4115-9866-7dc74274ccd5 replace doc TransactionJsonDocument{id=usertable:user707278906139368243,cas=1558659019246469120,status=NORMAL,bucket=bucket-1,coll=_default,links=TransactionLinks{atr=none,atrBkt=none,atrColl=none,ver=none}}
      Transaction logger:32/Thread-11/87853366-55a6-47e6-9491-a75418b3dabe/dec1be32-3c3d-4115-9866-7dc74274ccd5 First mutated doc in txn is 'usertable:user707278906139368243' on vbucket 628, so using atr atr-628-#21b
      Transaction logger:32/Thread-11/87853366-55a6-47e6-9491-a75418b3dabe/dec1be32-3c3d-4115-9866-7dc74274ccd5 about to set ATR bucket-1/_default/atr-628-#21b to Pending
      Transaction logger:32/Thread-11/87853366-55a6-47e6-9491-a75418b3dabe/dec1be32-3c3d-4115-9866-7dc74274ccd5 error com.couchbase.client.core.error.RequestTimeoutException: SubdocMutateRequest {"retried":0,"reason":"TIMEOUT","requestId":170,"timeoutMs":2500,"service":

      {"bucket":"bucket-1","type":"kv","key":"atr-628-#21b","cid":0}

      ,"cancelled":true,"coreId":15,"completed":true} while setting ATR bucket-1/_default/atr-628-#21b to Pending
      [DEBUG] (cb-timer-12-1) Stopping retries since predicate returned false, retry context: iteration=1 exception=com.couchbase.client.core.error.RequestTimeoutException: SubdocGetRequest {"retried":13,"reason":"TIMEOUT","requestId":429,"timeoutMs":2500,"service":

      {"bucket":"bucket-1","type":"kv","key":"atr-681-#178","cid":0}

      ,"cancelled":true,"coreId":13,"completed":true} backoff={11ms/250ms}
      Transaction logger:32/Thread-11/87853366-55a6-47e6-9491-a75418b3dabe/dec1be32-3c3d-4115-9866-7dc74274ccd5 caught exception 'com.couchbase.client.core.error.RequestTimeoutException: SubdocMutateRequest {"retried":0,"reason":"TIMEOUT","requestId":170,"timeoutMs":2500,"service":

      {"bucket":"bucket-1","type":"kv","key":"atr-628-#21b","cid":0}

      ,"cancelled":true,"coreId":15,"completed":true}' in asyncInternal, rethrowing to rollback
      Transaction logger:32/Thread-11/87853366-55a6-47e6-9491-a75418b3dabe/dec1be32-3c3d-4115-9866-7dc74274ccd5 com.couchbase.client.core.msg.BaseRequest.cancel(BaseRequest.java:131)
      com.couchbase.client.core.Timer.lambda$register$1(Timer.java:96)
      com.couchbase.client.core.deps.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:682)
      com.couchbase.client.core.deps.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:757)
      com.couchbase.client.core.deps.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:485)
      com.couchbase.client.core.deps.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
      java.lang.Thread.run(Thread.java:748)
      Transaction logger:32/Thread-11/87853366-55a6-47e6-9491-a75418b3dabe/dec1be32-3c3d-4115-9866-7dc74274ccd5 exception RequestTimeoutException needs to wrapped as AttemptWrappedExceptionNeedsRetry
      Transaction logger:32/Thread-11/87853366-55a6-47e6-9491-a75418b3dabe/<> hit error TransactionWrappedExceptionNeedsRetry(com.couchbase.client.core.error.RequestTimeoutException: SubdocMutateRequest {"retried":0,"reason":"TIMEOUT","requestId":170,"timeoutMs":2500,"service":

      {"bucket":"bucket-1","type":"kv","key":"atr-628-#21b","cid":0}

      ,"cancelled":true,"coreId":15,"completed":true})
      [DEBUG] (elastic-42) Scheduling retry attempt, retry context: iteration=2 exception=TransactionWrappedExceptionNeedsRetry(com.couchbase.client.core.error.RequestTimeoutException: SubdocGetRequest {"retried":13,"reason":"TIMEOUT","requestId":433,"timeoutMs":2500,"service":

      {"bucket":"bucket-1","type":"kv","key":"usertable:user3426306227801922038","cid":0}

      ,"cancelled":true,"coreId":7,"completed":true}) backoff={18ms/250ms}

       

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          https://issues.couchbase.com/browse/TXNJ-46 - We see DocumentAlreadyInTransaction Error, when the first transaction times out and we create another one transaction for the same set of keys, then the second transaction is failing with "DocumentAlreadyInTransaction" Error. Graham is analysing this issue. Not sure if you are also hitting the same error.

           

          anitha.kuberan Anitha Kuberan added a comment - https://issues.couchbase.com/browse/TXNJ-46  - We see  DocumentAlreadyInTransaction Error, when the first transaction times out and we create another one transaction for the same set of keys, then the second transaction is failing with " DocumentAlreadyInTransaction" Error. Graham is analysing this issue. Not sure if you are also hitting the same error.  
          graham.pople Graham Pople added a comment -

          Will take a look at this today, currently investigating another issue.

          graham.pople Graham Pople added a comment - Will take a look at this today, currently investigating another issue.
          graham.pople Graham Pople added a comment -

          From the transactions library point of view, this is the same as TXNJ-55 - the only thing different is it's a different transient error from the server.  The logs show we:

          1. Start a txn
          2. Try to create ATR entry at PENDING.
          3. This hits a transient server error.  In TXNJ-55 it was a TempFail, here it's a RequestTimeout.
          4. (As a sidenote, I saw another log earlier showing a MAJORITY write at 2.2 secs.  So it looks like MAJORITY writes can be, at present and possibly just in some situations, pretty close to the 2.5 second default timeout.  And it's likely that that's causing the RequestTimeout here.  These slow writes need to be looked into further, but since this issue was raised with me, I'll keep this ticket focussed on the txn library's handling of these errors.)
          5. The txn library tries to rollback the attempt and try again.
          6. This rollback starts with an attempt to set the ATR entry to ABORTED.
          7. This also fails with a RequestTimeout.
          8. As in TXNJ-55, this causes the transaction library to fail fast.  This isn't a bug per-se, but the library could be somewhat more robust to this situation.

          I'll close as a dup.

          graham.pople Graham Pople added a comment - From the transactions library point of view, this is the same as TXNJ-55 - the only thing different is it's a different transient error from the server.  The logs show we: Start a txn Try to create ATR entry at PENDING. This hits a transient server error.  In TXNJ-55 it was a TempFail, here it's a RequestTimeout. (As a sidenote, I saw another log earlier showing a MAJORITY write at 2.2 secs.  So it looks like MAJORITY writes can be, at present and possibly just in some situations, pretty close to the 2.5 second default timeout.  And it's likely that that's causing the RequestTimeout here.  These slow writes need to be looked into further, but since this issue was raised with me, I'll keep this ticket focussed on the txn library's handling of these errors.) The txn library tries to rollback the attempt and try again. This rollback starts with an attempt to set the ATR entry to ABORTED. This also fails with a RequestTimeout. As in TXNJ-55, this causes the transaction library to fail fast.  This isn't a bug per-se, but the library could be somewhat more robust to this situation. I'll close as a dup.

          People

            graham.pople Graham Pople
            sharath.sulochana Sharath Sulochana (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty