Details
-
Bug
-
Resolution: Fixed
-
Major
-
7.1.1
-
Untriaged
-
1
-
Yes
Description
We saw recovery tests failed on build 7.1.1-3166. These runs failover a node, add the node back, and then start delta recovery.
2022-06-21T18:29:26 [INFO] Sleeping 1200 seconds before triggering failover
2022-06-21T18:49:26 [INFO] Failing over node: 172.23.99.206
2022-06-21T18:49:26 [INFO] Getting OTP node name from 172.23.99.206
2022-06-21T18:49:26 [INFO] Adding node back: 172.23.99.206
2022-06-21T18:49:26 [INFO] Getting OTP node name from 172.23.99.206
2022-06-21T18:49:26 [INFO] Enabling delta recovery: 172.23.99.206
After the runs start delta recovery, the runs hit the issue.
2022-06-21T19:13:07 [WARNING] {"deltaRecoveryNotPossible":1}
2022-06-21T19:13:07 [WARNING] Retrying http://172.23.99.203:8091/controller/rebalance
2022-06-21T19:13:17 [WARNING] {"deltaRecoveryNotPossible":1}
2022-06-21T19:13:17 [WARNING] Retrying http://172.23.99.203:8091/controller/rebalance
2022-06-21T19:13:27 [ERROR] Request http://172.23.99.203:8091/controller/rebalance failed after 20 attempts
Delta recovery after hard failover (min), 3 -> 4, 1 bucket x 100M x 2KB, 10K ops/sec
http://perf.jenkins.couchbase.com/job/hestia/7778/
Delta recovery after graceful failover (min), 3 -> 4, 1 bucket x 100M x 2KB, 10K ops/sec
http://perf.jenkins.couchbase.com/job/hestia/7777/