  1. Couchbase Gateway
  2. CBG-463

Potential feedback loop when replicating large attachments

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: 2.7.0
    • Affects Version/s: 2.6.0
    • Component/s: SyncGateway
    • Security Level: Public
    • None
    • Sprint: CBG Sprint 28, CBG Sprint 29
    • 5

    Description

      During test fest, QE and CBL teams hit a situation where Sync Gateway became non-responsive while attempting to replicate ~350 documents, where each document had large attachments (2-3 MB).

      The Sync Gateway logs showed many 30s timeouts between Sync Gateway and Couchbase Server while trying to push attachments. The SG logs suggested that retry handling was taking place, so that after a timeout SG would re-attempt the request up to 11 times.

      This test was running against a single Couchbase Server node on AWS. This suggests that the requests were timing out because of the large amount of data in gocb's single pipeline to the server. The concern is that the retry handling is exacerbating the situation by retrying the attachment on timeout - increasing the amount of data being pushed through the pipeline, and making future timeouts more likely.

      Generally speaking this would be mitigated with a larger server cluster, but we should still avoid the cascading failures due to retry handling.
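
To make the amplification concern concrete, here is a rough back-of-the-envelope sketch in Go. It assumes the worst case, in which every attempt times out only after its payload has already been queued, and it assumes a single 2 MB attachment per document (the low end of the range above). The function name `worstCaseBytes` is hypothetical and is not part of Sync Gateway or gocb.

```go
package main

import "fmt"

// worstCaseBytes estimates the total bytes pushed into the pipeline when
// every attempt (the initial one plus each retry) times out after its
// payload has been queued. This is an illustrative upper bound, not a
// measurement of what Sync Gateway actually sent.
func worstCaseBytes(docs, attachmentBytes, retries int64) int64 {
	attempts := 1 + retries
	return docs * attachmentBytes * attempts
}

func main() {
	const mib = int64(1 << 20)

	// ~350 documents, 2 MB attachment each (assumed: one attachment per doc).
	noRetry := worstCaseBytes(350, 2*mib, 0)
	withRetry := worstCaseBytes(350, 2*mib, 11)

	fmt.Printf("no retries: %d MB\n", noRetry/mib)   // 700 MB
	fmt.Printf("11 retries: %d MB\n", withRetry/mib) // 8400 MB
}
```

Under these assumptions, up to 11 retries turn roughly 700 MB of attachment data into as much as 8400 MB competing for the same pipeline, which is consistent with the suspicion that retries make further timeouts more likely.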

      Need to review a few things to identify how best to avoid this scenario:

      • backoff settings when pushing/pulling attachments during blip replication
      • whether retry handling on timeout should be disabled for large attachments
      • whether timeout should automatically be extended for large attachments
      • whether attachments should have their own dedicated gocb connection, to avoid bringing down the rest of SG in this scenario
      • whether SG should be increasing the number of gocb pipelines per CBS node (I believe gocb added support for this, but not sure whether there's uptake required). This can be configured using the kv_pool_size option in the gocb connection string.

          People

            adamf Adam Fraser