CompletionException intermittently seen from transactions

Description

Performer specific validation failure: Hook dropped (raised async so could have been in an earlier test): java.util.concurrent.CompletionException: com.couchbase.client.core.error.DocumentNotFoundException: Document with the given id not found {"clientContext":{"txn.op":"ATR::getAtrWithUpToDateCas"},"completed":true,"coreId":"0x9458e5e20000004e","idempotent":true,"lastChannelId":"9458E5E20000004E/00000000A93230C7","lastDispatchedFrom":"172.18.0.4:49800","lastDispatchedTo":"172.23.111.128:11210","requestId":133233,"requestType":"SubdocGetRequest","retried":0,"service":{"bucket":"secondTestBucket","collection":"_default","documentId":"<ud>_txn:atr-13-#129e</ud>","errorCode":{"description":"Not Found","name":"KEY_ENOENT"},"opaque":"0x1fe72","scope":"_default","type":"kv","vbucket":13},"status":"NOT_FOUND","timeoutMs":2500,"timings":{"dispatchMicros":3858,"totalDispatchMicros":3858,"totalServerMicros":0,"serverMicros":0}} cause: com.couchbase.client.core.error.DocumentNotFoundException: Document with the given id not found {"clientContext":{"txn.op":"ATR::getAtrWithUpToDateCas"},"completed":true,"coreId":"0x9458e5e20000004e","idempotent":true,"lastChannelId":"9458E5E20000004E/00000000A93230C7","lastDispatchedFrom":"172.18.0.4:49800","lastDispatchedTo":"172.23.111.128:11210","requestId":133233,"requestType":"SubdocGetRequest","retried":0,"service":{"bucket":"secondTestBucket","collection":"_default","documentId":"<ud>_txn:atr-13-#129e</ud>","errorCode":{"description":"Not Found","name":"KEY_ENOENT"},"opaque":"0x1fe72","scope":"_default","type":"kv","vbucket":13},"status":"NOT_FOUND","timeoutMs":2500,"timings":{"dispatchMicros":3858,"totalDispatchMicros":3858,"totalServerMicros":0,"totalMicros":3967,"serverMicros":0}}

http://sdk.jenkins.couchbase.com/job/jvm/job/transactions/job/TransactionsFITPerformer_SinglePerformer/lastCompletedBuild/testReport/com.couchbase.transactions/SerializersTest/Setup_test_Infra_and_Execute_Tests__7_2_stable_Vs_Java_1_2_4__committedReplacedSerialized/

Not sure what's going on yet.

Environment

None

Gerrit Reviews

None

Release Notes Description

None

Activity

Show:

Graham Pople August 12, 2022 at 9:43 AM

From static inspection it's definitely coming from cleanup.

noSerializationContentInClientRecords has created a new cluster connection, but AFAICT there's no way for it to have disconnected it while transactions is running.  It has just called transactions.close().

Can replicate almost immediately with this simple code:

Hooks.onErrorDropped(err -> {     throw new RuntimeException(err); }); while (true) {     Transactions transactions = Transactions.create(cluster, TransactionConfigBuilder.create()             .cleanupWindow(Duration.ofSeconds(1)));     Thread.sleep(ThreadLocalRandom.current().nextInt(1000));     transactions.close(); }

 

The equivalent code in the integrated client is fine, however.

Actually, it happens in non-integrated even without the transactions.close(), if I leave it running.  I think it may therefore be an issue with extremely tight cleanup windows, causing ATR fetches to overlap, and one erroring with "atr not found" causing issues with the in-flight overlapping ones.  Possibly introduced in .

This would imply that it can happen in integrated client, which shares this part of the code.. And sure enough, it can, if I set the cleanup window down to 200ms.

Fixed with MonoBridge.

Graham Pople August 2, 2022 at 3:59 PM

Full trace:

23:12:51.727 [grpc-default-executor-5] INFO  com.couchbase.PerformerService - ================ void com.couchbase.transactions.SerializersTest.testDeserializedInitialContent():testDeserializedInitialContent() : testEnd ================  23:12:51.728 [grpc-default-executor-5] INFO  com.couchbase.PerformerService - ================ void com.couchbase.transactions.SerializersTest.noSerializationContentInClientRecords():noSerializationContentInClientRecords() : testStart ================  23:12:51.729 [grpc-default-executor-5] INFO  com.couchbase.PerformerService - Using version : v1_2_0 23:12:51.729 [grpc-default-executor-5] INFO  com.couchbase.PerformerService - Performer implements protocol 2.1 with caps [EXT_DEFERRED_COMMIT, EXT_TRANSACTION_ID, EXT_TIME_OPT_UNSTAGING, EXT_SERIALIZATION, EXT_TRANSACTION_ID, EXT_DEFERRED_COMMIT, EXT_TIME_OPT_UNSTAGING, EXT_BINARY_METADATA, EXT_CUSTOM_METADATA_COLLECTION, EXT_QUERY, EXT_STORE_DURABILITY, BF_CBD_3838, BF_CBD_3787, BF_CBD_3705, BF_CBD_3794, EXT_REMOVE_COMPLETED, EXT_ALL_KV_COMBINATIONS, EXT_UNKNOWN_ATR_STATES, BF_CBD_3791, EXT_SINGLE_QUERY, EXT_THREAD_SAFE, EXT_SERIALIZATION, EXT_MOBILE_INTEROP] 23:12:51.789 [grpc-default-executor-5] INFO  com.couchbase.PerformerService - Established connection to cluster at IP: 172.23.111.131 with user Administrator  23:12:51.789 [grpc-default-executor-5] INFO  com.couchbase.PerformerService - For this user, we assigned clusterConnectionId:bda2428d-ac40-4bea-a22b-238accb9744a 23:12:51.798 [grpc-default-executor-5] INFO  com.couchbase.PerformerService - Creating new Transactions factory 623bd20c-cc48-4873-85b8-5920bb180db4 using cluster connection id bda2428d-ac40-4bea-a22b-238accb9744a (user Administrator), now have 3 23:12:51.832 [cb-events] INFO  com.couchbase.core - [com.couchbase.core][CoreCreatedEvent] {"clientVersion":null,"clientGitHash":null,"coreVersion":null,"coreGitHash":null,"userAgent":"couchbase-java/0.0.0 (Linux 4.15.0-91-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_192-b12)","maxNumRequestsInRetry":32768,"ioEnvironment":{"nativeIoEnabled":true,"eventLoopThreadCount":3,"eventLoopGroups":["EpollEventLoopGroup"]},"ioConfig":{"captureTraffic":[],"mutationTokensEnabled":true,"networkResolution":"auto","dnsSrvEnabled":true,"tcpKeepAlivesEnabled":true,"tcpKeepAliveTimeMs":60000,"configPollIntervalMs":2500,"kvCircuitBreakerConfig":"disabled","queryCircuitBreakerConfig":"disabled","viewCircuitBreakerConfig":"disabled","searchCircuitBreakerConfig":"disabled","analyticsCircuitBreakerConfig":"disabled","managerCircuitBreakerConfig":"disabled","eventingCircuitBreakerConfig":"disabled","backupCircuitBreakerConfig":"disabled","numKvConnections":1,"maxHttpConnections":12,"idleHttpConnectionTimeoutMs":4500,"configIdleRedialTimeoutMs":300000,"memcachedHashingStrategy":"StandardMemcachedHashingStrategy"},"compressionConfig":{"enabled":true,"minRatio":0.83,"minSize":32},"securityConfig":{"tlsEnabled":false,"nativeTlsEnabled":true,"hostnameVerificationEnabled":true,"trustCertificates":null,"trustManagerFactory":null,"ciphers":[]},"timeoutConfig":{"kvMs":2500,"kvDurableMs":10000,"managementMs":75000,"queryMs":75000,"viewMs":75000,"searchMs":75000,"analyticsMs":75000,"connectMs":10000,"disconnectMs":10000,"eventingMs":75000,"backupMs":75000},"loggerConfig":{"customLogger":null,"fallbackToConsole":false,"consoleLogLevel":{"name":"INFO","resourceBundleName":"sun.util.logging.resources.logging","localizedName":"INFO"},"disableSlf4j":false,"loggerName":"CouchbaseLogger","diagnosticContextEnabled":false},"orphanReporterConfig":{"emitIntervalMs":10000,"sampleSize":10,"queueLength":1024,"enabled":true},"thresholdLoggingTracerConfig":{"enabled":true,"emitIntervalMs":10000,"sampleSize":10,"queueLength":1024,"kvThresholdMs":500,"queryThresholdMs":1000,"searchThresholdMs":1000,"analyticsThresholdMs":1000,"viewThresholdMs":1000},"loggingMeterConfig":{"enabled":true,"emitIntervalMs":600000},"retryStrategy":"BestEffortRetryStrategy","requestTracer":"ThresholdLoggingTracer","meter":"LoggingMeter","numRequestCallbacks":0,"scheduler":"ParallelScheduler","schedulerThreadCount":6,"transactionsConfig":{"durabilityLevel":"MAJORITY","timeoutMs":15000,"cleanupConfig":{"runLostAttemptsCleanupThread":true,"runRegularAttemptsCleanupThread":true,"cleanupWindowMs":60000,"cleanupSet":""},"numAtrs":1024,"metadataCollection":"none","scanConsistency":"none"}} {"coreId":"0x9458e5e20000004e","seedNodes":[{"address":"172.23.111.131"}]} 23:12:51.832 [cb-events] INFO  c.c.transactions.cleanup.regular - [com.couchbase.transactions.cleanup.regular][LogEvent] Starting background cleanup thread to find transactions from this client 23:12:51.832 [cb-events] INFO  com.couchbase.transactions - [com.couchbase.transactions][TransactionsStartedEvent] Transactions successfully initialised, regular cleanup enabled=true, lost cleanup enabled=true 23:12:51.832 [cb-events] INFO  com.couchbase.node - [com.couchbase.node][NodeConnectedEvent] Node connected {"coreId":"0x9458e5e20000004e","managerPort":"8091","remote":"172.23.111.131"} 23:12:51.832 [cb-events] INFO  com.couchbase.node - [com.couchbase.node][NodeConnectedEvent] Node connected {"coreId":"0x9458e5e20000004e","managerPort":"8091","remote":"172.23.111.128"} 23:12:51.832 [cb-events] INFO  com.couchbase.node - [com.couchbase.node][NodeConnectedEvent] Node connected {"coreId":"0x9458e5e20000004e","managerPort":"8091","remote":"172.23.111.129"} 23:12:51.832 [cb-events] INFO  com.couchbase.node - [com.couchbase.node][NodeConnectedEvent] Node connected {"coreId":"0x9458e5e20000004e","managerPort":"8091","remote":"172.23.111.130"} 23:12:51.832 [cb-events] WARN  com.couchbase.transactions - [com.couchbase.transactions][TransactionLogEvent] Only one Transactions object should be created per application, but have found 3 23:12:51.832 [cb-events] INFO  com.couchbase.transactions.cleanup - [com.couchbase.transactions.cleanup][TransactionLogEvent] Cleanup settings; regular cleanup thread enabled=false, lost cleanup thread enabled=true 23:12:51.832 [cb-events] INFO  com.couchbase.transactions - [com.couchbase.transactions][TransactionsStarted] Transactions successfully started, regular cleanup enabled=false, lost cleanup enabled=true 23:12:52.039 [cb-events] DEBUG c.c.transactions.cleanup.lost - [com.couchbase.transactions.cleanup.lost][TransactionLogEvent] Client 226fa found 4 buckets 23:12:52.039 [cb-events] INFO  c.c.transactions.cleanup.lost - [com.couchbase.transactions.cleanup.lost][TransactionLogEvent] Client 226fa will start cleaning lost transactions on bucket ThreadSafetyATRTestbucket 23:12:52.039 [cb-events] INFO  c.c.transactions.cleanup.lost - [com.couchbase.transactions.cleanup.lost][TransactionLogEvent] Client 226fa will start cleaning lost transactions on bucket default 23:12:52.039 [cb-events] INFO  c.c.transactions.cleanup.lost - [com.couchbase.transactions.cleanup.lost][TransactionLogEvent] Client 226fa will start cleaning lost transactions on bucket ThreadSafetyMultipleCollectionsTestbucket 23:12:52.039 [cb-events] INFO  c.c.transactions.cleanup.lost - [com.couchbase.transactions.cleanup.lost][TransactionLogEvent] Client 226fa will start cleaning lost transactions on bucket secondTestBucket 23:12:52.039 [cb-events] INFO  com.couchbase.core - [com.couchbase.core][BucketOpenedEvent][50ms] Opened bucket "ThreadSafetyATRTestbucket" {"coreId":"0x9458e5e20000004e"} 23:12:52.039 [cb-events] INFO  com.couchbase.core - [com.couchbase.core][BucketOpenedEvent][80ms] Opened bucket "default" {"coreId":"0x9458e5e20000004e"} 23:12:52.039 [cb-events] INFO  com.couchbase.core - [com.couchbase.core][BucketOpenedEvent][80ms] Opened bucket "secondTestBucket" {"coreId":"0x9458e5e20000004e"} 23:12:52.039 [cb-events] INFO  com.couchbase.core - [com.couchbase.core][BucketOpenedEvent][83ms] Opened bucket "ThreadSafetyMultipleCollectionsTestbucket" {"coreId":"0x9458e5e20000004e"} 23:12:52.240 [cb-events] DEBUG c.c.transactions.clientrecord - [com.couchbase.transactions.clientrecord][TransactionLogEvent] secondTestBucket/_default/226fa0b8-2672-4917-96e3-5d4a54d42ef6 found 1 existing clients including this (1 active, 0 expired), included this=false, index of this=0, override={enabled=false,expires=0,now=1659395572000000000,active=false} 23:12:52.267 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - java.util.concurrent.CompletionException: com.couchbase.client.core.error.DocumentNotFoundException: Document with the given id not found {"clientContext":{"txn.op":"ATR::getAtrWithUpToDateCas"},"completed":true,"coreId":"0x9458e5e20000004e","idempotent":true,"lastChannelId":"9458E5E20000004E/00000000A93230C7","lastDispatchedFrom":"172.18.0.4:49800","lastDispatchedTo":"172.23.111.128:11210","requestId":133233,"requestType":"SubdocGetRequest","retried":0,"service":{"bucket":"secondTestBucket","collection":"_default","documentId":"<ud>_txn:atr-13-#129e</ud>","errorCode":{"description":"Not Found","name":"KEY_ENOENT"},"opaque":"0x1fe72","scope":"_default","type":"kv","vbucket":13},"status":"NOT_FOUND","timeoutMs":2500,"timings":{"dispatchMicros":3858,"totalDispatchMicros":3858,"totalServerMicros":0,"serverMicros":0}} 23:12:52.267 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273) 23:12:52.267 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280) 23:12:52.267 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:604) 23:12:52.267 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577) 23:12:52.267 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) 23:12:52.267 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) 23:12:52.267 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.msg.BaseRequest.succeed(BaseRequest.java:161) 23:12:52.267 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.io.netty.kv.KeyValueMessageHandler.decodeAndComplete(KeyValueMessageHandler.java:387) 23:12:52.267 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.io.netty.kv.KeyValueMessageHandler.decode(KeyValueMessageHandler.java:366) 23:12:52.267 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.io.netty.kv.KeyValueMessageHandler.channelRead(KeyValueMessageHandler.java:282) 23:12:52.267 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) 23:12:52.267 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) 23:12:52.267 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) 23:12:52.267 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.io.netty.kv.MemcacheProtocolVerificationHandler.channelRead(MemcacheProtocolVerificationHandler.java:85) 23:12:52.267 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) 23:12:52.267 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) 23:12:52.267 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) 23:12:52.267 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:327) 23:12:52.267 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:314) 23:12:52.267 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:435) 23:12:52.267 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:279) 23:12:52.267 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) 23:12:52.267 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) 23:12:52.267 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) 23:12:52.268 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.handler.flush.FlushConsolidationHandler.channelRead(FlushConsolidationHandler.java:152) 23:12:52.268 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) 23:12:52.268 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) 23:12:52.268 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) 23:12:52.268 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) 23:12:52.268 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) 23:12:52.268 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) 23:12:52.268 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) 23:12:52.268 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800) 23:12:52.268 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:487) 23:12:52.268 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:385) 23:12:52.268 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) 23:12:52.268 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) 23:12:52.268 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - com.couchbase.client.core.deps.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) 23:12:52.268 [cb-io-kv-313-2] WARN  com.couchbase.PerformerService - java.lang.Thread.run(Thread.java:748) 23:12:52.287 [grpc-default-executor-5] INFO  com.couchbase.PerformerService - ================ void com.couchbase.transactions.SerializersTest.noSerializationContentInClientRecords():noSerializationContentInClientRecords() : testEnd ================  23:12:52.289 [grpc-default-executor-5] INFO  com.couchbase.PerformerService - ================ void com.couchbase.transactions.SerializersTest.committedReplacedSerialized():committedReplacedSerialized() : testStart ================  23:12:52.291 [grpc-default-executor-5] INFO  com.couchbase.PerformerService - Creating new transaction with cluster connection 816f048b-b0dd-40da-89ab-94b5269d0ea7 (user Administrator) and transactions factory 3d4607d5-a098-4523-94c7-0407237b65c6

I think these happen when we fire off a request to get a document, but before the response is returned, we've stopped listening.  My initial guess is this is a shutdown issue.

Fixed
Pinned fields
Click on the next to a field label to start pinning.

Details

Assignee

Reporter

Story Points

Sprint

Priority

Instabug

Open Instabug

PagerDuty

Sentry

Zendesk Support

Created August 2, 2022 at 1:34 PM
Updated August 12, 2022 at 4:37 PM
Resolved August 12, 2022 at 4:37 PM
Instabug