Couchbase Server / MB-30873

Analytics Service exiting during rebalance with error "Unexpectedly error during replication"


Details

    • Bug
    • Resolution: Duplicate
    • Major
    • 6.0.0
    • 5.5.0
    • analytics
    • None
    • Untriaged
    • Unknown

    Description

      Within the context of Kubernetes, I started with a 3-node cluster in which each node runs all of the Couchbase services.

      1. Fail over a node.
      2. Delta-recover the failed-over node.
      3. Just as the rebalance starts, kill the node being recovered.
      4. Add a new node to the cluster and rebalance.

      At this point the rebalance fails repeatedly, with the following error in the logs:

      2018-08-10T22:15:24.880+00:00 INFO CBAS.cbas our node is within topology refresh b50b01a59d27870f2a7c51754ee89f4f/3; ensuring driver is running...
      2018-08-10T22:16:18.050Z WARN CBAS.management.ReplicationChannel [Replication Worker] Unexpectedly error during replication.
      org.apache.asterix.common.exceptions.ReplicationException: java.io.EOFException
              at org.apache.asterix.replication.messaging.ReplicateLogsTask.perform(ReplicateLogsTask.java:69) ~[asterix-replication.jar:5.5.0-2958]
              at org.apache.asterix.replication.management.ReplicationChannel$ReplicationWorker.handle(ReplicationChannel.java:142) ~[asterix-replication.jar:5.5.0-2958]
              at org.apache.asterix.replication.management.ReplicationChannel$ReplicationWorker.run(ReplicationChannel.java:113) [asterix-replication.jar:5.5.0-2958]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_162]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_162]
              at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
      Caused by: java.io.EOFException
              at org.apache.asterix.replication.management.NetworkingUtil.readBytes(NetworkingUtil.java:50) ~[asterix-replication.jar:5.5.0-2958]
              at org.apache.asterix.replication.messaging.ReplicationProtocol.readRequest(ReplicationProtocol.java:70) ~[asterix-replication.jar:5.5.0-2958]
              at org.apache.asterix.replication.messaging.ReplicateLogsTask.perform(ReplicateLogsTask.java:61) ~[asterix-replication.jar:5.5.0-2958]
              ... 5 more
      2018-08-10T22:16:20.691+00:00 INFO CBAS.cbas Notified path /cbas/bootstrap/ensureCc/f38ddcce859d74f5cfd65eea7b7343ba changed to 2018-08-10T22:16:20.643Z
      2018-08-10T22:16:24.303Z WARN CBAS.management.ReplicationChannel [Replication Worker] Unexpectedly error during replication.
      org.apache.asterix.common.exceptions.ReplicationException: java.io.EOFException

      The issue could be some kind of race: I have run this scenario several times, and this is the first time I've encountered this behavior.
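      In the trace, the EOFException comes from NetworkingUtil.readBytes, called from ReplicationProtocol.readRequest. That suggests the replication peer's connection closed before a full request had been read. This would fit the recovering node being killed mid-rebalance, though that is an assumption the report does not confirm. Below is a minimal Java sketch of that failure mode. It assumes a "read fully" loop over a NIO channel. ReadFullyDemo and readFully are hypothetical names, and this is not the AsterixDB implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

public class ReadFullyDemo {

    // Hypothetical helper (not the AsterixDB source): keep reading until the
    // buffer is full, as a request reader must do for a fixed-size frame.
    public static void readFully(ReadableByteChannel channel, ByteBuffer buffer) throws IOException {
        while (buffer.hasRemaining()) {
            int n = channel.read(buffer);
            if (n < 0) {
                // read() returns -1 once the peer has closed the stream, for
                // example because the process on the other end was killed.
                throw new EOFException(buffer.remaining() + " bytes still expected");
            }
        }
        buffer.flip(); // prepare the filled buffer for parsing
    }

    public static void main(String[] args) throws IOException {
        // Simulate a peer that sends only 4 bytes of a 16-byte request, then disappears.
        byte[] truncated = {1, 2, 3, 4};
        ReadableByteChannel channel = Channels.newChannel(new ByteArrayInputStream(truncated));
        ByteBuffer buffer = ByteBuffer.allocate(16);
        try {
            readFully(channel, buffer);
            System.out.println("read complete");
        } catch (EOFException e) {
            System.out.println("EOFException: " + e.getMessage());
        }
    }
}
```

      Running main prints "EOFException: 12 bytes still expected". A replication worker that keeps accepting connections from a peer that dies mid-request would log this kind of error repeatedly.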


            People

              Michael Blow
              Tommie McAfee (Inactive)
              Votes: 0
              Watchers: 2
