Details
Bug
Resolution: Not a Bug
Blocker
None
core-10.2.2
None
0
Description
QE has reported a Sync Gateway test failure related to node failover that appears to be caused by the DCPAgent failing to reopen streams after the failover. The sequence of events in the test is:
1. Start a 3 node Couchbase Server Cluster
2. Start Sync Gateway, which successfully starts a DCP feed for all vbuckets
3. Stop one of the server nodes (service couchbase-server stop)
4. SG's StreamObserver implementation receives End with err=EOF for some vbuckets
5. SG attempts to openStream for those vbuckets.
6. The openStream requests time out (repeatedly, for 4+ minutes, until the test gives up)
7. We don't receive any mutations over DCP after the node failover, even for vbuckets that did not report an EOF.
The logs contain many CCCPPOLL errors, even after the failover has succeeded, such as:
gocb: CCCPPOLL: Failed to retrieve CCCP config. unambiguous timeout
gocb: CCCPPOLL: Failed to retrieve config from any node.
I don't know whether these are related to the issue.
I've asked QE to provide trace logs for review; those should be available shortly.