Description
In a Kubernetes environment, I started with a 3-node cluster where each node runs all the Couchbase services.
- Fail over a node
- Delta-recover the node
- Just as rebalance starts, kill the node being recovered
- Add a new node to the cluster and rebalance
At this point the rebalance fails repeatedly, with the following error in the logs:
2018-08-10T22:15:24.880+00:00 INFO CBAS.cbas our node is within topology refresh b50b01a59d27870f2a7c51754ee89f4f/3; ensuring driver is running...
2018-08-10T22:16:18.050Z WARN CBAS.management.ReplicationChannel [Replication Worker] Unexpectedly error during replication.
org.apache.asterix.common.exceptions.ReplicationException: java.io.EOFException
    at org.apache.asterix.replication.messaging.ReplicateLogsTask.perform(ReplicateLogsTask.java:69) ~[asterix-replication.jar:5.5.0-2958]
    at org.apache.asterix.replication.management.ReplicationChannel$ReplicationWorker.handle(ReplicationChannel.java:142) ~[asterix-replication.jar:5.5.0-2958]
    at org.apache.asterix.replication.management.ReplicationChannel$ReplicationWorker.run(ReplicationChannel.java:113) [asterix-replication.jar:5.5.0-2958]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_162]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_162]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
Caused by: java.io.EOFException
    at org.apache.asterix.replication.management.NetworkingUtil.readBytes(NetworkingUtil.java:50) ~[asterix-replication.jar:5.5.0-2958]
    at org.apache.asterix.replication.messaging.ReplicationProtocol.readRequest(ReplicationProtocol.java:70) ~[asterix-replication.jar:5.5.0-2958]
    at org.apache.asterix.replication.messaging.ReplicateLogsTask.perform(ReplicateLogsTask.java:61) ~[asterix-replication.jar:5.5.0-2958]
    ... 5 more
2018-08-10T22:16:20.691+00:00 INFO CBAS.cbas Notified path /cbas/bootstrap/ensureCc/f38ddcce859d74f5cfd65eea7b7343ba changed to 2018-08-10T22:16:20.643Z
2018-08-10T22:16:24.303Z WARN CBAS.management.ReplicationChannel [Replication Worker] Unexpectedly error during replication.
org.apache.asterix.common.exceptions.ReplicationException: java.io.EOFException
This could be a race condition: I have run this scenario several times, and this is the first time I have encountered this behavior.
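Since the failure is intermittent, it may help to compare analytics logs across runs. Below is a minimal Python sketch, not part of any Couchbase tooling, that lists the timestamps of ReplicationChannel warnings and counts the exception classes that follow them. The function name `summarize_warnings` and both regular expressions are my own assumptions, based only on the log layout in the excerpt above.

```python
import re
from collections import Counter

# Assumed entry layout, taken from the excerpt above:
# "<ISO timestamp> <LEVEL> <component> <message>"
ENTRY = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\S+)\s+(?P<level>[A-Z]+)\s+(?P<component>\S+)\s+(?P<msg>.*)$"
)
# Exception header lines, e.g. "a.b.FooException: ..." or "Caused by: a.b.BarException"
EXCEPTION = re.compile(r"^(?:Caused by: )?(?P<cls>[\w.$]+(?:Exception|Error))\b")


def summarize_warnings(log_text):
    """Return (timestamps of ReplicationChannel WARN entries,
    Counter of exception classes reported under those entries)."""
    timestamps = []
    exceptions = Counter()
    in_warning = False
    for raw in log_text.splitlines():
        line = raw.strip()
        entry = ENTRY.match(line)
        if entry:
            # A new log entry starts; track it only if it is a replication warning.
            in_warning = (
                entry["level"] == "WARN"
                and "ReplicationChannel" in entry["component"]
            )
            if in_warning:
                timestamps.append(entry["ts"])
            continue
        if in_warning:
            # Stack frames ("at ...") do not match; only exception headers do.
            exc = EXCEPTION.match(line)
            if exc:
                exceptions[exc["cls"]] += 1
    return timestamps, exceptions
```

Calling `summarize_warnings(open(path).read())` on the analytics log from each run (the path depends on the deployment) gives a quick way to see whether the EOFException pattern also shows up in runs where the rebalance succeeded.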
Issue Links
- is duplicated by MB-30830 K8S: Service rebalance failed when rebalancing in the analytics node in the cluster (Closed)