CompletionException intermittently seen from transactions
Description
Environment
Gerrit Reviews
Release Notes Description
Activity

Graham Pople August 12, 2022 at 9:43 AM
From static inspection it's definitely coming from cleanup.
noSerializationContentInClientRecords has created a new cluster connection, but AFAICT there's no way for it to have disconnected it while transactions is running. It has just called transactions.close().
Can replicate almost immediately with this simple code:
Hooks.onErrorDropped(err -> {
throw new RuntimeException(err);
});
while (true) {
Transactions transactions = Transactions.create(cluster, TransactionConfigBuilder.create()
.cleanupWindow(Duration.ofSeconds(1)));
Thread.sleep(ThreadLocalRandom.current().nextInt(1000));
transactions.close();
}
The equivalent code in the integrated client is fine, however.
Actually, it happens in non-integrated even without the transactions.close(), if I leave it running. I think it may therefore be an issue with extremely tight cleanup windows, causing ATR fetches to overlap, and one erroring with "atr not found" causing issues with the in-flight overlapping ones. Possibly introduced in .
This would imply that it can happen in integrated client, which shares this part of the code.. And sure enough, it can, if I set the cleanup window down to 200ms.
Fixed with MonoBridge.

Graham Pople August 2, 2022 at 3:59 PM
Full trace:
23:12:51.727 [grpc-default-executor-5] INFO com.couchbase.PerformerService - ================ void com.couchbase.transactions.SerializersTest.testDeserializedInitialContent():testDeserializedInitialContent() : testEnd ================
23:12:51.728 [grpc-default-executor-5] INFO com.couchbase.PerformerService - ================ void com.couchbase.transactions.SerializersTest.noSerializationContentInClientRecords():noSerializationContentInClientRecords() : testStart ================
23:12:51.729 [grpc-default-executor-5] INFO com.couchbase.PerformerService - Using version : v1_2_0
23:12:51.729 [grpc-default-executor-5] INFO com.couchbase.PerformerService - Performer implements protocol 2.1 with caps [EXT_DEFERRED_COMMIT, EXT_TRANSACTION_ID, EXT_TIME_OPT_UNSTAGING, EXT_SERIALIZATION, EXT_TRANSACTION_ID, EXT_DEFERRED_COMMIT, EXT_TIME_OPT_UNSTAGING, EXT_BINARY_METADATA, EXT_CUSTOM_METADATA_COLLECTION, EXT_QUERY, EXT_STORE_DURABILITY, BF_CBD_3838, BF_CBD_3787, BF_CBD_3705, BF_CBD_3794, EXT_REMOVE_COMPLETED, EXT_ALL_KV_COMBINATIONS, EXT_UNKNOWN_ATR_STATES, BF_CBD_3791, EXT_SINGLE_QUERY, EXT_THREAD_SAFE, EXT_SERIALIZATION, EXT_MOBILE_INTEROP]
23:12:51.789 [grpc-default-executor-5] INFO com.couchbase.PerformerService - Established connection to cluster at IP: 172.23.111.131 with user Administrator
23:12:51.789 [grpc-default-executor-5] INFO com.couchbase.PerformerService - For this user, we assigned clusterConnectionId:bda2428d-ac40-4bea-a22b-238accb9744a
23:12:51.798 [grpc-default-executor-5] INFO com.couchbase.PerformerService - Creating new Transactions factory 623bd20c-cc48-4873-85b8-5920bb180db4 using cluster connection id bda2428d-ac40-4bea-a22b-238accb9744a (user Administrator), now have 3
23:12:51.832 [cb-events] INFO com.couchbase.core - [com.couchbase.core][CoreCreatedEvent] {"clientVersion":null,"clientGitHash":null,"coreVersion":null,"coreGitHash":null,"userAgent":"couchbase-java/0.0.0 (Linux 4.15.0-91-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_192-b12)","maxNumRequestsInRetry":32768,"ioEnvironment":{"nativeIoEnabled":true,"eventLoopThreadCount":3,"eventLoopGroups":["EpollEventLoopGroup"]},"ioConfig":{"captureTraffic":[],"mutationTokensEnabled":true,"networkResolution":"auto","dnsSrvEnabled":true,"tcpKeepAlivesEnabled":true,"tcpKeepAliveTimeMs":60000,"configPollIntervalMs":2500,"kvCircuitBreakerConfig":"disabled","queryCircuitBreakerConfig":"disabled","viewCircuitBreakerConfig":"disabled","searchCircuitBreakerConfig":"disabled","analyticsCircuitBreakerConfig":"disabled","managerCircuitBreakerConfig":"disabled","eventingCircuitBreakerConfig":"disabled","backupCircuitBreakerConfig":"disabled","numKvConnections":1,"maxHttpConnections":12,"idleHttpConnectionTimeoutMs":4500,"configIdleRedialTimeoutMs":300000,"memcachedHashingStrategy":"StandardMemcachedHashingStrategy"},"compressionConfig":{"enabled":true,"minRatio":0.83,"minSize":32},"securityConfig":{"tlsEnabled":false,"nativeTlsEnabled":true,"hostnameVerificationEnabled":true,"trustCertificates":null,"trustManagerFactory":null,"ciphers":[]},"timeoutConfig":{"kvMs":2500,"kvDurableMs":10000,"managementMs":75000,"queryMs":75000,"viewMs":75000,"searchMs":75000,"analyticsMs":75000,"connectMs":10000,"disconnectMs":10000,"eventingMs":75000,"backupMs":75000},"loggerConfig":{"customLogger":null,"fallbackToConsole":false,"consoleLogLevel":{"name":"INFO","resourceBundleName":"sun.util.logging.resources.logging","localizedName":"INFO"},"disableSlf4j":false,"loggerName":"CouchbaseLogger","diagnosticContextEnabled":false},"orphanReporterConfig":{"emitIntervalMs":10000,"sampleSize":10,"queueLength":1024,"enabled":true},"thresholdLoggingTracerConfig":{"enabled":true,"emitIntervalMs":10000,"sampleSize":10,"queueLength":1024,"kvThresholdMs":500,"queryThresholdMs":1000,"searchThresholdMs":1000,"analyticsThresholdMs":1000,"viewThresholdMs":1000},"loggingMeterConfig":{"enabled":true,"emitIntervalMs":600000},"retryStrategy":"BestEffortRetryStrategy","requestTracer":"ThresholdLoggingTracer","meter":"LoggingMeter","numRequestCallbacks":0,"scheduler":"ParallelScheduler","schedulerThreadCount":6,"transactionsConfig":{"durabilityLevel":"MAJORITY","timeoutMs":15000,"cleanupConfig":{"runLostAttemptsCleanupThread":true,"runRegularAttemptsCleanupThread":true,"cleanupWindowMs":60000,"cleanupSet":""},"numAtrs":1024,"metadataCollection":"none","scanConsistency":"none"}} {"coreId":"0x9458e5e20000004e","seedNodes":[{"address":"172.23.111.131"}]}
23:12:51.832 [cb-events] INFO c.c.transactions.cleanup.regular - [com.couchbase.transactions.cleanup.regular][LogEvent] Starting background cleanup thread to find transactions from this client
23:12:51.832 [cb-events] INFO com.couchbase.transactions - [com.couchbase.transactions][TransactionsStartedEvent] Transactions successfully initialised, regular cleanup enabled=true, lost cleanup enabled=true
23:12:51.832 [cb-events] INFO com.couchbase.node - [com.couchbase.node][NodeConnectedEvent] Node connected {"coreId":"0x9458e5e20000004e","managerPort":"8091","remote":"172.23.111.131"}
23:12:51.832 [cb-events] INFO com.couchbase.node - [com.couchbase.node][NodeConnectedEvent] Node connected {"coreId":"0x9458e5e20000004e","managerPort":"8091","remote":"172.23.111.128"}
23:12:51.832 [cb-events] INFO com.couchbase.node - [com.couchbase.node][NodeConnectedEvent] Node connected {"coreId":"0x9458e5e20000004e","managerPort":"8091","remote":"172.23.111.129"}
23:12:51.832 [cb-events] INFO com.couchbase.node - [com.couchbase.node][NodeConnectedEvent] Node connected {"coreId":"0x9458e5e20000004e","managerPort":"8091","remote":"172.23.111.130"}
23:12:51.832 [cb-events] WARN com.couchbase.transactions - [com.couchbase.transactions][TransactionLogEvent] Only one Transactions object should be created per application, but have found 3
23:12:51.832 [cb-events] INFO com.couchbase.transactions.cleanup - [com.couchbase.transactions.cleanup][TransactionLogEvent] Cleanup settings; regular cleanup thread enabled=false, lost cleanup thread enabled=true
23:12:51.832 [cb-events] INFO com.couchbase.transactions - [com.couchbase.transactions][TransactionsStarted] Transactions successfully started, regular cleanup enabled=false, lost cleanup enabled=true
23:12:52.039 [cb-events] DEBUG c.c.transactions.cleanup.lost - [com.couchbase.transactions.cleanup.lost][TransactionLogEvent] Client 226fa found 4 buckets
23:12:52.039 [cb-events] INFO c.c.transactions.cleanup.lost - [com.couchbase.transactions.cleanup.lost][TransactionLogEvent] Client 226fa will start cleaning lost transactions on bucket ThreadSafetyATRTestbucket
23:12:52.039 [cb-events] INFO c.c.transactions.cleanup.lost - [com.couchbase.transactions.cleanup.lost][TransactionLogEvent] Client 226fa will start cleaning lost transactions on bucket default
23:12:52.039 [cb-events] INFO c.c.transactions.cleanup.lost - [com.couchbase.transactions.cleanup.lost][TransactionLogEvent] Client 226fa will start cleaning lost transactions on bucket ThreadSafetyMultipleCollectionsTestbucket
23:12:52.039 [cb-events] INFO c.c.transactions.cleanup.lost - [com.couchbase.transactions.cleanup.lost][TransactionLogEvent] Client 226fa will start cleaning lost transactions on bucket secondTestBucket
23:12:52.039 [cb-events] INFO com.couchbase.core - [com.couchbase.core][BucketOpenedEvent][50ms] Opened bucket "ThreadSafetyATRTestbucket" {"coreId":"0x9458e5e20000004e"}
23:12:52.039 [cb-events] INFO com.couchbase.core - [com.couchbase.core][BucketOpenedEvent][80ms] Opened bucket "default" {"coreId":"0x9458e5e20000004e"}
23:12:52.039 [cb-events] INFO com.couchbase.core - [com.couchbase.core][BucketOpenedEvent][80ms] Opened bucket "secondTestBucket" {"coreId":"0x9458e5e20000004e"}
23:12:52.039 [cb-events] INFO com.couchbase.core - [com.couchbase.core][BucketOpenedEvent][83ms] Opened bucket "ThreadSafetyMultipleCollectionsTestbucket" {"coreId":"0x9458e5e20000004e"}
23:12:52.240 [cb-events] DEBUG c.c.transactions.clientrecord - [com.couchbase.transactions.clientrecord][TransactionLogEvent] secondTestBucket/_default/226fa0b8-2672-4917-96e3-5d4a54d42ef6 found 1 existing clients including this (1 active, 0 expired), included this=false, index of this=0, override={enabled=false,expires=0,now=1659395572000000000,active=false}
23:12:52.267 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - java.util.concurrent.CompletionException: com.couchbase.client.core.error.DocumentNotFoundException: Document with the given id not found {"clientContext":{"txn.op":"ATR::getAtrWithUpToDateCas"},"completed":true,"coreId":"0x9458e5e20000004e","idempotent":true,"lastChannelId":"9458E5E20000004E/00000000A93230C7","lastDispatchedFrom":"172.18.0.4:49800","lastDispatchedTo":"172.23.111.128:11210","requestId":133233,"requestType":"SubdocGetRequest","retried":0,"service":{"bucket":"secondTestBucket","collection":"_default","documentId":"<ud>_txn:atr-13-#129e</ud>","errorCode":{"description":"Not Found","name":"KEY_ENOENT"},"opaque":"0x1fe72","scope":"_default","type":"kv","vbucket":13},"status":"NOT_FOUND","timeoutMs":2500,"timings":{"dispatchMicros":3858,"totalDispatchMicros":3858,"totalServerMicros":0,"serverMicros":0}}
23:12:52.267 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
23:12:52.267 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
23:12:52.267 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:604)
23:12:52.267 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
23:12:52.267 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
23:12:52.267 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
23:12:52.267 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.msg.BaseRequest.succeed(BaseRequest.java:161)
23:12:52.267 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.io.netty.kv.KeyValueMessageHandler.decodeAndComplete(KeyValueMessageHandler.java:387)
23:12:52.267 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.io.netty.kv.KeyValueMessageHandler.decode(KeyValueMessageHandler.java:366)
23:12:52.267 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.io.netty.kv.KeyValueMessageHandler.channelRead(KeyValueMessageHandler.java:282)
23:12:52.267 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
23:12:52.267 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
23:12:52.267 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
23:12:52.267 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.io.netty.kv.MemcacheProtocolVerificationHandler.channelRead(MemcacheProtocolVerificationHandler.java:85)
23:12:52.267 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
23:12:52.267 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
23:12:52.267 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
23:12:52.267 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:327)
23:12:52.267 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:314)
23:12:52.267 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:435)
23:12:52.267 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:279)
23:12:52.267 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
23:12:52.267 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
23:12:52.267 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
23:12:52.268 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.handler.flush.FlushConsolidationHandler.channelRead(FlushConsolidationHandler.java:152)
23:12:52.268 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
23:12:52.268 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
23:12:52.268 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
23:12:52.268 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
23:12:52.268 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
23:12:52.268 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
23:12:52.268 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
23:12:52.268 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800)
23:12:52.268 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:487)
23:12:52.268 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:385)
23:12:52.268 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
23:12:52.268 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
23:12:52.268 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
23:12:52.268 [cb-io-kv-313-2] WARN com.couchbase.PerformerService - java.lang.Thread.run(Thread.java:748)
23:12:52.287 [grpc-default-executor-5] INFO com.couchbase.PerformerService - ================ void com.couchbase.transactions.SerializersTest.noSerializationContentInClientRecords():noSerializationContentInClientRecords() : testEnd ================
23:12:52.289 [grpc-default-executor-5] INFO com.couchbase.PerformerService - ================ void com.couchbase.transactions.SerializersTest.committedReplacedSerialized():committedReplacedSerialized() : testStart ================
23:12:52.291 [grpc-default-executor-5] INFO com.couchbase.PerformerService - Creating new transaction with cluster connection 816f048b-b0dd-40da-89ab-94b5269d0ea7 (user Administrator) and transactions factory 3d4607d5-a098-4523-94c7-0407237b65c6
I think these happen when we fire off a request to get a document, but before the response is returned, we've stopped listening. My initial guess is this is a shutdown issue.
Details
Details
Assignee

Reporter

Story Points
Sprint
Fix versions
Priority
Instabug
PagerDuty
PagerDuty Incident
PagerDuty

Sentry
Linked Issues
Sentry
Zendesk Support
Linked Tickets
Zendesk Support

Performer specific validation failure: Hook dropped (raised async so could have been in an earlier test): java.util.concurrent.CompletionException: com.couchbase.client.core.error.DocumentNotFoundException: Document with the given id not found {"clientContext":{"txn.op":"ATR::getAtrWithUpToDateCas"},"completed":true,"coreId":"0x9458e5e20000004e","idempotent":true,"lastChannelId":"9458E5E20000004E/00000000A93230C7","lastDispatchedFrom":"172.18.0.4:49800","lastDispatchedTo":"172.23.111.128:11210","requestId":133233,"requestType":"SubdocGetRequest","retried":0,"service":{"bucket":"secondTestBucket","collection":"_default","documentId":"<ud>_txn:atr-13-#129e</ud>","errorCode":{"description":"Not Found","name":"KEY_ENOENT"},"opaque":"0x1fe72","scope":"_default","type":"kv","vbucket":13},"status":"NOT_FOUND","timeoutMs":2500,"timings":{"dispatchMicros":3858,"totalDispatchMicros":3858,"totalServerMicros":0,"serverMicros":0}} cause: com.couchbase.client.core.error.DocumentNotFoundException: Document with the given id not found {"clientContext":{"txn.op":"ATR::getAtrWithUpToDateCas"},"completed":true,"coreId":"0x9458e5e20000004e","idempotent":true,"lastChannelId":"9458E5E20000004E/00000000A93230C7","lastDispatchedFrom":"172.18.0.4:49800","lastDispatchedTo":"172.23.111.128:11210","requestId":133233,"requestType":"SubdocGetRequest","retried":0,"service":{"bucket":"secondTestBucket","collection":"_default","documentId":"<ud>_txn:atr-13-#129e</ud>","errorCode":{"description":"Not Found","name":"KEY_ENOENT"},"opaque":"0x1fe72","scope":"_default","type":"kv","vbucket":13},"status":"NOT_FOUND","timeoutMs":2500,"timings":{"dispatchMicros":3858,"totalDispatchMicros":3858,"totalServerMicros":0,"totalMicros":3967,"serverMicros":0}}
http://sdk.jenkins.couchbase.com/job/jvm/job/transactions/job/TransactionsFITPerformer_SinglePerformer/lastCompletedBuild/testReport/com.couchbase.transactions/SerializersTest/Setup_test_Infra_and_Execute_Tests__7_2_stable_Vs_Java_1_2_4__committedReplacedSerialized/
Not sure what's going on yet.