Description
While testing failure scenarios for the Java SDK, I came across a rebalance error in the web console. I was rebalancing a 4-node cluster to a healthy state with all 4 nodes up, only one of which runs query+index in addition to data.
After that, I retried the rebalance a few times and saw the following error each time (goxdcr exiting because the cbauth database is stale):
Service 'goxdcr' exited with status 1. Restarting. Messages: MetadataService 2015-08-31T12:28:18.301Z [ERROR] metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connection refused, num_of_retry=2
I also tried to collect the logs from node4 (.104), and it restarted on me each time! Note that node1 (.101, the data+query+index node) also appears to be restarting.
I was finally able to collect and upload logs (covering my entire test session today) from node3; the logs are attached.