  1. Distributed Transactions Java
  2. TXNJ-61

YCSB: DocumentAlreadyInTransaction exceptions while running concurrent transactions with Durability Level None

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not a Bug
    • 1.0.0-alpha.4
    • None
    • None

    Description

      Getting "com.couchbase.transactions.error.attempts.DocumentAlreadyInTransaction" exceptions while running concurrent transactions with durability level NONE.

      Detailed log attached - worker_172.23.97.251.log_dur_level_none.zip

      Error snippet:

      Transaction logger:24/Thread-3/03f38a13-e728-4989-8dee-aa2115a16f7c/345d0f15-7aa2-4b88-9c80-ef41aa57737f caught exception 'com.couchbase.transactions.error.attempts.DocumentAlreadyInTransaction: Document usertable:user4441073806199749893 is already in a transaction' in asyncInternal, rethrowing to rollback

      Transaction logger:24/Thread-3/03f38a13-e728-4989-8dee-aa2115a16f7c/345d0f15-7aa2-4b88-9c80-ef41aa57737f com.couchbase.transactions.AttemptContextReactive.checkAndHandleBlockingTxn(AttemptContextReactive.java:849)
      com.couchbase.transactions.AttemptContextReactive.lambda$null$17(AttemptContextReactive.java:401)
      reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:44)
      reactor.core.publisher.MonoPeek.subscribe(MonoPeek.java:71)
      reactor.core.publisher.MonoDoFinally.subscribe(MonoDoFinally.java:47)
      reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52)
      reactor.core.publisher.Mono.block(Mono.java:1473)
      com.couchbase.transactions.AttemptContext.replace(AttemptContext.java:104)
      com.yahoo.ycsb.db.couchbase3.Couchbase3Client.lambda$transactionContext$0(Couchbase3Client.java:194)
      com.couchbase.transactions.TransactionsReactive.lambda$null$17(TransactionsReactive.java:356)
      reactor.core.publisher.MonoRunnable.subscribe(MonoRunnable.java:40)
      reactor.core.publisher.MonoOnErrorResume.subscribe(MonoOnErrorResume.java:44)
      reactor.core.publisher.MonoPeek.subscribe(MonoPeek.java:71)
      reactor.core.publisher.MonoPeek.subscribe(MonoPeek.java:71)
      reactor.core.publisher.Mono.subscribe(Mono.java:3589)
      reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.drain(MonoIgnoreThen.java:172)
      reactor.core.publisher.MonoIgnoreThen.subscribe(MonoIgnoreThen.java:56)
      reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52)
      reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.drain(MonoIgnoreThen.java:153)
      reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.ignoreDone(MonoIgnoreThen.java:190)
      reactor.core.publisher.MonoIgnoreThen$ThenIgnoreInner.onComplete(MonoIgnoreThen.java:240)
      reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1478)
      reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:144)
      reactor.core.publisher.FluxPeek$PeekSubscriber.onNext(FluxPeek.java:192)
      reactor.core.publisher.FluxSubscribeOnValue$ScheduledScalar.run(FluxSubscribeOnValue.java:178)
      reactor.core.scheduler.ElasticScheduler$DirectScheduleTask.run(ElasticScheduler.java:292)
      reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:50)
      reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:27)
      java.util.concurrent.FutureTask.run(FutureTask.java:266)
      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
      java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      java.lang.Thread.run(Thread.java:748)

      Attachments

        1. ycsb_alpha5_snapshot_error.log
          957 kB
          Sharath Sulochana

          Activity

            ingenthr Matt Ingenthron added a comment -

            Chiming in here: during the design discussions (cc Shivani Gupta, Dave Finlay, John Liang, Ravi Mayuram) we agreed that the target workload is one without a lot of contention on the same document within transactions. Even in a traditional RDBMS (or, for that matter, in OO software design), breaking up this kind of contention is normal if you want to scale.

            I do know that YCSB uses a zipfian distribution and that there are some tunables there. I also know from other testing that Michael Nitschinger and I did back in the early days of YCSB at Couchbase that it's problematic [1]. Michael had a good writeup on zipfian based on profiling.

            In any case, since the design target is one in which we don't have a lot of overlapping documents, if the OOTB YCSB is showing, as I seem to see above, 31,000 ops on a single document in the set, we need to relax zipfian or just use an even distribution. That's the design target for the feature.

            [1] Slide 18 in this GDrive doc
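            To illustrate why a zipfian request distribution concentrates load on a few documents while a uniform one does not, here is a minimal, self-contained sketch. It does not use YCSB's own generators; the class name `KeySkewDemo`, the key count, the request count, and the skew exponent 0.99 are all assumptions chosen for illustration only.

```java
import java.util.Random;

// Illustrative only: compares how often the single hottest key is hit
// under a uniform versus a zipfian-style request distribution.
public class KeySkewDemo {

    /** Fraction of all requests that landed on the most frequently hit key. */
    public static double hottestKeyShare(int[] hits) {
        long total = 0;
        int max = 0;
        for (int h : hits) {
            total += h;
            max = Math.max(max, h);
        }
        return total == 0 ? 0.0 : (double) max / total;
    }

    /** Every key is equally likely to be requested. */
    public static int[] sampleUniform(int keys, int requests, long seed) {
        Random rnd = new Random(seed);
        int[] hits = new int[keys];
        for (int i = 0; i < requests; i++) {
            hits[rnd.nextInt(keys)]++;
        }
        return hits;
    }

    /** Key k (0-based) is requested with probability proportional to 1/(k+1)^theta. */
    public static int[] sampleZipfian(int keys, int requests, double theta, long seed) {
        // Build the cumulative weights once, then sample by binary search.
        double[] cdf = new double[keys];
        double sum = 0;
        for (int k = 0; k < keys; k++) {
            sum += 1.0 / Math.pow(k + 1, theta);
            cdf[k] = sum;
        }
        Random rnd = new Random(seed);
        int[] hits = new int[keys];
        for (int i = 0; i < requests; i++) {
            double u = rnd.nextDouble() * sum;
            int lo = 0, hi = keys - 1;
            while (lo < hi) {               // first index with cdf[index] >= u
                int mid = (lo + hi) >>> 1;
                if (cdf[mid] < u) lo = mid + 1; else hi = mid;
            }
            hits[lo]++;
        }
        return hits;
    }

    public static void main(String[] args) {
        int keys = 1000, requests = 100_000;   // assumed sizes, not from the ticket
        System.out.printf("uniform hottest-key share: %.4f%n",
                hottestKeyShare(sampleUniform(keys, requests, 42L)));
        System.out.printf("zipfian hottest-key share: %.4f%n",
                hottestKeyShare(sampleZipfian(keys, requests, 0.99, 42L)));
    }
}
```

            With a skewed distribution the hottest key absorbs a large share of requests, so concurrent transactions are far more likely to touch the same document and hit write-write conflicts. In YCSB itself the distribution is selected with the `requestdistribution` workload property.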

            sharath.sulochana Sharath Sulochana (Inactive) added a comment -

            Shivani Gupta, Matt Ingenthron - thanks for the clarification.

            sharath.sulochana Sharath Sulochana (Inactive) added a comment -

            Graham Pople

            Recent runs with the uniform requestdistribution look clean, with no errors on the performance cluster. We can close this ticket unless you want to keep it open as a reference for TXNJ-90 or any other investigation.

            Throughput reached ~5375 transactions/sec at ~90% CPU utilization with the durability level set to none. I will run some additional tests before sharing the perf numbers.

            Test: workload_Instance 6, workers 40, transaction distribution - 4 docs, 75% Read, 25% Update, Durabilitylevel None, TestDur 1200 secs
            Failures: 0
            Throughput: 5375 transactions/sec
            CB CPU utilization: ~90%

            http://perf.jenkins.couchbase.com/job/hebe-txn/16/

            graham.pople Graham Pople added a comment -

            Sharath Sulochana that's excellent news, thanks for sharing.  So we've gone from ~500 transactions per sec to >5000?  Fantastic.  I'll look forward to the full figures including with durability enabled.

            One thing I'm very curious about is, what is the current amount of contention in your tests - e.g. how many transactions are retrying due to write-write conflicts, and how many retries are they doing.  You could check this by looking at how many attempts there are in the 'attempts' field in the returned 'TransactionResult'.  Perhaps % of contention could be expressed as (number of retry attempts / number of total attempts).

            I'll close this ticket out since we've addressed the issue here.  Is there a more generic 'transactions performance testing' ticket we can use for further discussion?
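            The contention measure suggested above, (number of retry attempts / number of total attempts), can be sketched as follows. This is a minimal illustration, not library code: it takes the per-transaction attempt counts as plain integers, assuming they have already been read from the 'attempts' field of each returned 'TransactionResult'. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: computes the share of attempts that were retries.
public class ContentionStats {

    /**
     * @param attemptsPerTransaction number of attempts each transaction needed
     *                               (1 means it succeeded first time, no retry)
     * @return retry attempts divided by total attempts, or 0.0 for empty input
     */
    public static double contentionRatio(List<Integer> attemptsPerTransaction) {
        long total = 0;
        long retries = 0;
        for (int attempts : attemptsPerTransaction) {
            if (attempts < 1) {
                throw new IllegalArgumentException("each transaction has at least one attempt");
            }
            total += attempts;
            retries += attempts - 1;   // every attempt after the first is a retry
        }
        return total == 0 ? 0.0 : (double) retries / total;
    }

    public static void main(String[] args) {
        // Made-up sample: four transactions that needed 1, 1, 3 and 2 attempts.
        List<Integer> sample = Arrays.asList(1, 1, 3, 2);
        System.out.printf("contention ratio: %.3f%n", contentionRatio(sample));
    }
}
```

            A ratio near zero would suggest almost no write-write conflicts; a higher ratio means a larger share of the work is spent on retried attempts.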

            sharath.sulochana Sharath Sulochana (Inactive) added a comment -

            Closing this issue.

            People

              graham.pople Graham Pople
              sharath.sulochana Sharath Sulochana (Inactive)