  1. Couchbase Server
  2. MB-12238

Infinite timeout on outgoing xmem requests might lead to xdcr getting stuck on network/NAT issues

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: 3.0.2
    • Affects Version/s: 2.5.1, 2.5.0, 3.0
    • Component/s: ns_server, XDCR
    • Security Level: Public
    • Untriaged
    • Unknown

    Description

      As part of looking at CBSE-1399, there's a suspicion that the NAT between two EC2 regions has lost some state for our connections. When this happens, there's a chance that one or both ends will not be able to detect the situation: TCP may only notice the problem when it has to retransmit, and if the connection is idle or the TCP window is already full, there's no traffic at all.

      I think we can and should:

      a) add tcp keepalive option

      b) make xmem-level timeout less than infinite
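
      To illustrate what (a) and (b) mean at the socket level, here is a minimal Python sketch. It is not the ns_server code. The function name and all default values (30 s I/O timeout, 60 s idle, 10 s probe interval, 5 probes) are assumptions picked for the example, and the fine-grained keepalive knobs are only applied where the platform exposes them.

```python
import socket


def configure_outgoing_socket(sock, timeout_s=30.0, idle_s=60, interval_s=10, probes=5):
    """Enable TCP keepalive and a finite I/O timeout on a socket.

    Hypothetical helper for illustration; all defaults are assumed values.
    """
    # (a) Ask the kernel to probe an idle connection, so that a peer that
    # silently disappeared (e.g. lost NAT state) is eventually detected.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Per-connection keepalive tuning is platform specific; set each option
    # only if this Python build exposes it.
    tuning = (
        ("TCP_KEEPIDLE", idle_s),       # seconds of idleness before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # failed probes before the connection is dropped
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)

    # (b) Make blocking send/recv calls fail with a timeout error instead of
    # waiting forever for a reply that will never arrive.
    sock.settimeout(timeout_s)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    configure_outgoing_socket(s)
    print("keepalive enabled:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    print("timeout (s):", s.gettimeout())
    s.close()
```

      The two settings cover different cases: keepalive catches a dead path while the connection is idle, and the finite timeout bounds how long a request that is already in flight can wait.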

            People

              alkondratenko Aleksey Kondratenko (Inactive)