Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Fixed
-
5.5.0
-
5.5.0-1632
-
Untriaged
-
Centos 64-bit
-
No
Description
Steps to Repro:
./testrunner -i /tmp/testexec.26743.ini get-cbcollect-info=True -t eventing.eventing_rebalance.EventingRebalance.test_kv_failover_and_recovery_rebalance_with_eventing_node,nodes_init=6,services_init=kv-kv-kv-eventing-eventing-index:n1ql,dataset=default,groups=simple,reset_services=True,doc-per-day=10,skip_cleanup=True,failover_type=hard,recovery_type=full
Logs attached.
Tried simulating this issue with the patch against the tests that testrunner has; see run log [1]. Observation from one of the failures:
02:02:45.192-07:00 - 02:08:02.883-07:00 : During this period, GoCB returned "no access" error
02:08:05.126-07:00 - 02:08:06.383-07:00 : During this period, GoCB returned "operation has timed out"
02:08:07.633-07:00 - 02:08:27.308-07:00 : During this period, GoCB returned "temporary failure occurred, try again later"
02:08:28.308-07:00 - 02:08:51.810-07:00 : And finally it again started throwing "operation has timed out"
Seems like GoCB gets into a bad state in some cases when a KV node is recovered after failover. If I observe it again, I will file a JIRA against the SDK.
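To put numbers on how long each error phase lasted, here is a minimal sketch that computes the duration of each window from the timestamps listed above. The names `WINDOWS`, `window_seconds`, and `summarize` are hypothetical helpers for illustration only; they are not part of testrunner or GoCB, and the timestamps are copied by hand from this comment, not parsed from the logs.

```python
from datetime import datetime

# Error windows copied from the observation above (all timestamps are -07:00).
# Each entry is (start, end, error string returned by GoCB).
WINDOWS = [
    ("02:02:45.192", "02:08:02.883", "no access"),
    ("02:08:05.126", "02:08:06.383", "operation has timed out"),
    ("02:08:07.633", "02:08:27.308", "temporary failure occurred, try again later"),
    ("02:08:28.308", "02:08:51.810", "operation has timed out"),
]

TIME_FORMAT = "%H:%M:%S.%f"


def window_seconds(start, end):
    """Return the length of one window in seconds (same-day timestamps assumed)."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return delta.total_seconds()


def summarize(windows):
    """Return a list of (error, duration_in_seconds) pairs, in timeline order."""
    return [(err, round(window_seconds(start, end), 3)) for start, end, err in windows]


if __name__ == "__main__":
    for err, secs in summarize(WINDOWS):
        print(f"{secs:8.3f}s  {err}")
```

Running the script prints one line per window, which makes it easy to see that the "no access" phase accounts for most of the affected period.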
[1] http://qa.sc.couchbase.com/job/temp_rebalance_even/226/consoleFull