Couchbase Go SDK / GOCBC-1406

DCPAgent fails to reconnect to stream after node failover

Details

    • Bug
    • Resolution: Not a Bug
    • Blocker
    • None
    • core-10.2.2
    • core-library
    • None
    • 0

    Description

      QE has reported a Sync Gateway test failure related to node failover. It appears to be caused by the DCPAgent failing to reopen streams after the failover. The sequence of events in the test is:

      1. Start a 3-node Couchbase Server cluster.
      2. Start Sync Gateway, which successfully starts a DCP feed for all vbuckets.
      3. Stop one of the server nodes (service couchbase-server stop).
      4. SG's StreamObserver implementation receives End with err=EOF for some vbuckets.
      5. SG attempts to openStream for those vbuckets.
      6. The openStream requests time out (repeatedly, for 4+ minutes, until the test gives up).
      7. We don't receive any mutations over DCP after the node failover, even for vbuckets that did not report an EOF.

      The logs contain many CCCPPOLL errors, even after failover has succeeded, such as:
      gocb: CCCPPOLL: Failed to retrieve CCCP config. unambiguous timeout
      gocb: CCCPPOLL: Failed to retrieve config from any node.

      I don't know if these are related to the issue.

      I've asked QE to provide trace logs for review - those should be available shortly.

      Attachments

        1. gocbc1406.zip (4 kB)
        2. logs_for_590.zip (1.08 MB)
        3. logs_without_rebalance.zip (689 kB)
        4. sg_trace.log.zip (1.03 MB)

            People

              adamf Adam Fraser
              adamf Adam Fraser
