  1. Couchbase Kafka Connector
  2. KAFKAC-211

Source: Rollback causes connector to terminate


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0.0-dp.1
    • Fix Version/s: 4.0.6
    • Labels:
      None
    • Story Points:
      1

      Description

      Somehow a rollback exception is ending up in the fatal error queue, causing the connector to terminate:

       

      org.apache.kafka.connect.errors.ConnectException: com.couchbase.client.dcp.error.RollbackException
          at com.couchbase.connect.kafka.CouchbaseSourceTask.checkErrorQueue(CouchbaseSourceTask.java:160)
          at com.couchbase.connect.kafka.CouchbaseSourceTask.poll(CouchbaseSourceTask.java:125)
          at org.apache.kafka.connect.runtime.WorkerSourceTask.poll(WorkerSourceTask.java:270)
          at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:237)
          at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:184)
          at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)
      Caused by: com.couchbase.client.dcp.error.RollbackException
          at com.couchbase.client.dcp.conductor.DcpChannel$6$1.operationComplete(DcpChannel.java:559)
          at com.couchbase.client.dcp.deps.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
          at com.couchbase.client.dcp.deps.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:570)
          at com.couchbase.client.dcp.deps.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:549)
          at com.couchbase.client.dcp.deps.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
          at com.couchbase.client.dcp.deps.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
          at com.couchbase.client.dcp.deps.io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:604)
          at com.couchbase.client.dcp.deps.io.netty.util.concurrent.DefaultPromise.setSuccess(DefaultPromise.java:96)
          at com.couchbase.client.dcp.transport.netty.DcpMessageHandler.channelRead(DcpMessageHandler.java:338)
          at com.couchbase.client.dcp.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
          at com.couchbase.client.dcp.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
          at com.couchbase.client.dcp.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
          at com.couchbase.client.dcp.deps.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
          at com.couchbase.client.dcp.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
          at com.couchbase.client.dcp.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
          at com.couchbase.client.dcp.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
          at com.couchbase.client.dcp.deps.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
          at com.couchbase.client.dcp.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
          at com.couchbase.client.dcp.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
          at com.couchbase.client.dcp.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
          at com.couchbase.client.dcp.transport.netty.BucketConfigHandler.channelRead0(BucketConfigHandler.java:103)
          at com.couchbase.client.dcp.transport.netty.BucketConfigHandler.channelRead0(BucketConfigHandler.java:39)
          at com.couchbase.client.dcp.deps.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
          at com.couchbase.client.dcp.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
          at com.couchbase.client.dcp.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
          at com.couchbase.client.dcp.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
          at com.couchbase.client.dcp.deps.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:321)
          at com.couchbase.client.dcp.deps.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:308)
          at com.couchbase.client.dcp.deps.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:422)
          at com.couchbase.client.dcp.deps.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
          at com.couchbase.client.dcp.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
          at com.couchbase.client.dcp.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
          at com.couchbase.client.dcp.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
          at com.couchbase.client.dcp.deps.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
          at com.couchbase.client.dcp.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
          at com.couchbase.client.dcp.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
          at com.couchbase.client.dcp.deps.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
          at com.couchbase.client.dcp.deps.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
          at com.couchbase.client.dcp.deps.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
          at com.couchbase.client.dcp.deps.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
          at com.couchbase.client.dcp.deps.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
          at com.couchbase.client.dcp.deps.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
          at com.couchbase.client.dcp.deps.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
          at com.couchbase.client.dcp.deps.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
          at com.couchbase.client.dcp.deps.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
       


            Activity

            talavis tal avissar added a comment - edited

            What is the resolution?

            david.nault David Nault added a comment -

            tal avissar The resolution is up top in the "Details" section.

            Resolution: Cannot Reproduce

            If you are seeing this error, please file a new issue with steps to reproduce.

            talavis tal avissar added a comment -

            This happens easily with 4.0.2 and Couchbase Server 6.6 with one node.

            See more details here:
            https://github.com/couchbase/kv_engine/blob/master/docs/dcp/documentation/rollback.md
            https://blog.couchbase.com/couchbase-dcp-rollback-qa-tests/

            and also here:
            https://stackoverflow.com/questions/59652556/automatically-reconnect-failed-tasks-in-kafka-connect

            david.nault David Nault added a comment -

            Reopening. Using this same Jira issue since there's a forum link pointing here.

            talavis tal avissar added a comment - edited

            Try testing it with this property:

            "couchbase.stream.from": "SAVED_OFFSET_OR_NOW"

            talavis tal avissar added a comment -

            Try testing it with this property:

            "couchbase.stream.from": "SAVED_OFFSET_OR_NOW"

            When you start the connector in this mode, you see this:

            [2021-03-25 14:23:05,459] WARN Received rollback for vbucket 943 to seqno 0 (com.couchbase.client.dcp.Client:321)
            [2021-03-25 14:23:05,459] INFO Starting to Stream for 1 partitions (com.couchbase.client.dcp.Client:587)
            [2021-03-25 14:23:05,459] INFO Stopping stream for 1 partitions (com.couchbase.client.dcp.Client:658)
            [2021-03-25 14:23:05,460] WARN Rollback during Partition Move for partition 993 (com.couchbase.client.dcp.conductor.Conductor:372)
            [2021-03-25 14:23:05,460] WARN Received rollback for vbucket 993 to seqno 0 (com.couchbase.client.dcp.Client:321)
            [2021-03-25 14:23:05,461] INFO Starting to Stream for 1 partitions (com.couchbase.client.dcp.Client:587)
            [2021-03-25 14:23:05,461] INFO Stopping stream for 1 partitions (com.couchbase.client.dcp.Client:658)
            [2021-03-25 14:23:05,461] WARN Rollback during Partition Move for partition 992 (com.couchbase.client.dcp.conductor.Conductor:372)
            [2021-03-25 14:23:05,461] WARN Received rollback for vbucket 992 to seqno 0 (com.couchbase.client.dcp.Client:321)

             

            These "Received rollback" messages can, and usually do, lead to the RollbackException.
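
            To see which vbuckets were rolled back and to which sequence number, the warnings in the log excerpt above can be scanned with a short script. This is only a sketch: the function name is hypothetical, and the pattern is taken from the wording of the log lines quoted in this comment, so it may not match other client versions.

```python
import re

# Pattern copied from the "Received rollback" warnings quoted above.
ROLLBACK_RE = re.compile(r"Received rollback for vbucket (\d+) to seqno (\d+)")


def rollback_targets(lines):
    """Map each vbucket id to the seqno it was rolled back to (hypothetical helper)."""
    targets = {}
    for line in lines:
        match = ROLLBACK_RE.search(line)
        if match:
            targets[int(match.group(1))] = int(match.group(2))
    return targets
```

            Passing the lines of a connector log file to rollback_targets gives a quick summary of the affected partitions.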

             

            The workaround we found is to monitor the tasks continuously and restart any task that reaches the FAILED status.

            But this workaround is really ugly. The fix should be part of the connector itself.
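
            For illustration, a monitor-and-restart loop of the kind described above could be sketched against the Kafka Connect REST API (the status and task restart endpoints). The worker address, connector name, and polling interval are assumptions, and this is not the script actually used by the commenter.

```python
import json
import time
import urllib.request

CONNECT_URL = "http://localhost:8083"  # assumed Kafka Connect worker address
CONNECTOR = "couchbase-source"         # hypothetical connector name


def failed_task_ids(status):
    """Return the ids of tasks reported as FAILED in a status response."""
    return [t["id"] for t in status.get("tasks", []) if t.get("state") == "FAILED"]


def get_status(base_url, connector):
    """Fetch GET /connectors/{name}/status from the Connect REST API."""
    with urllib.request.urlopen(f"{base_url}/connectors/{connector}/status") as resp:
        return json.load(resp)


def restart_task(base_url, connector, task_id):
    """Request a restart via POST /connectors/{name}/tasks/{id}/restart."""
    req = urllib.request.Request(
        f"{base_url}/connectors/{connector}/tasks/{task_id}/restart",
        data=b"",
        method="POST",
    )
    urllib.request.urlopen(req).close()


def watch(base_url, connector, interval_seconds=30.0):
    """Poll forever and restart every task found in the FAILED state."""
    while True:
        for task_id in failed_task_ids(get_status(base_url, connector)):
            restart_task(base_url, connector, task_id)
        time.sleep(interval_seconds)
```

            Calling watch(CONNECT_URL, CONNECTOR) starts the loop. It restarts tasks blindly, so a task that fails for a different reason would be restarted as well.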

            david.nault David Nault added a comment -

            Quick update: we are now able to reproduce the problem, and are investigating the cause.

            david.nault David Nault added a comment - edited

            Looks like a bug in the DCP client (JDCP-193) that is only triggered when persistence polling is disabled.
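
            A quick way to check whether a given connector configuration falls into this case is sketched below. It assumes that persistence polling is controlled by the couchbase.persistence.polling.interval property, that a value of zero disables it, and that polling is enabled when the property is absent. The helper name and the example connector name are hypothetical.

```python
import re

# Assumed property name; a zero interval is taken to mean "polling disabled".
POLLING_KEY = "couchbase.persistence.polling.interval"


def persistence_polling_disabled(config):
    """Return True if the config explicitly sets a zero polling interval."""
    if POLLING_KEY not in config:
        return False  # assumption: polling is enabled by default
    raw = str(config[POLLING_KEY]).strip()
    match = re.fullmatch(r"(\d+)\s*[a-zA-Z]*", raw)
    return bool(match) and int(match.group(1)) == 0


# Example configuration combining the settings mentioned in this issue.
example_config = {
    "name": "couchbase-source-example",  # hypothetical connector name
    "connector.class": "com.couchbase.connect.kafka.CouchbaseSourceConnector",
    "couchbase.stream.from": "SAVED_OFFSET_OR_NOW",
    POLLING_KEY: "0",
}
```

            Here persistence_polling_disabled(example_config) is true, so a connector configured this way would be exposed to the bug described in this comment.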


              People

              Assignee:
              david.nault David Nault
              Reporter:
              david.nault David Nault
              Votes:
              0
              Watchers:
              4
