Couchbase Documentation / DOC-553 Create Azure Cloud documentation / DOC-554

Document work-around for Azure idle connection teardown

Details

    Description

      From a customer:

      Yesterday I had a breakthrough with the connectivity issues I was experiencing when using the Couchbase .NET client 2.0 from within an Azure Worker Role connected to a Couchbase cluster running on Azure VMs.
      I’m pretty sure I have solved the issues now, and I thought this information could help you assist other people who experience connection problems when running Couchbase on Azure VMs.

      This is a reconstruction of the connection problem I was having:

      • The Couchbase client (in our case wrapped in a PaaS Worker Role) requests the cluster map from the Couchbase cluster on the first connect, and maintains an open connection with the server for streaming updates. This is how the client architecture of Couchbase works and why it’s fast and scalable. Note that our Couchbase server is running on an Azure VM (IaaS); this is an essential part of the connection problems we encountered.
      • The Worker Role uses the Public IP (VIP) of the Azure VM to connect to Couchbase, so all calls from the Worker Role to the Azure VM have to go through Azure’s datacenter load balancer.
      • The load balancer of the Azure datacenter silently tears down any connection that has been idle for 4 minutes!
        Read more on http://blogs.msdn.com/b/cie/archive/2014/02/14/connection-timeout-for-windows-azure-cloud-service-roles-web-worker.aspx
      • The Couchbase .NET client 2.0 uses TCP keep-alive to keep the connection to the Couchbase cluster open, but the default KeepAliveTime on a Worker Role is 120 minutes. So by the time the Couchbase client inside the Worker Role tries to send a keep-alive message to the Couchbase cluster, the load balancer of Azure’s datacenter has already silently closed the connection.
      • When I tried to do upserts to Couchbase from within my Worker Role after the connection had been idle for more than 4 minutes, the first couple of calls returned responses with status “ClientFailure”. After a while, the upserts went smoothly again (my guess: after the internal connection was reestablished).

      The fix for this problem is to set the TcpKeepAlive within my Worker Role to less than 4 minutes (= 240 seconds). This can be done by adding this line inside the OnStart() method of the Worker Role:
      ServicePointManager.SetTcpKeepAlive(true, 200000, 1000); // keep-alive time 200 seconds (200000 ms), keep-alive interval 1 second (1000 ms)
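
      The fix above can be wrapped in a small helper that also guards against a keep-alive time that would be too long for the load balancer. This is only a sketch: the class name AzureKeepAlive, the method name Configure, and the default values are hypothetical names chosen for illustration, and the 240000 ms limit is simply the 4-minute idle timeout described above. The only framework call used is ServicePointManager.SetTcpKeepAlive, whose two numeric arguments are in milliseconds.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of the Couchbase SDK or the Azure SDK.
public static class AzureKeepAlive
{
    // Idle timeout of the Azure load balancer described above: 4 minutes.
    private const int LoadBalancerIdleTimeoutMs = 4 * 60 * 1000; // 240000 ms

    // keepAliveTimeMs:     idle time before the first keep-alive probe is sent.
    // keepAliveIntervalMs: delay between probes when a probe goes unanswered.
    public static void Configure(int keepAliveTimeMs = 200000, int keepAliveIntervalMs = 1000)
    {
        // A keep-alive time at or above the idle timeout would let the
        // load balancer drop the connection before the first probe is sent.
        if (keepAliveTimeMs <= 0 || keepAliveTimeMs >= LoadBalancerIdleTimeoutMs)
        {
            throw new ArgumentOutOfRangeException(
                "keepAliveTimeMs",
                "Keep-alive time must be positive and below the 4-minute idle timeout.");
        }

        ServicePointManager.SetTcpKeepAlive(true, keepAliveTimeMs, keepAliveIntervalMs);
    }
}
```

      Calling AzureKeepAlive.Configure() from the Worker Role's OnStart() method, as in the one-line fix above, applies the same 200-second setting.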

      An even better solution would be to have control over the KeepAliveTime from within the Couchbase client SDK.
      Browsing through the latest GitHub sources, I can see that Jeffry built this into the .NET SDK a couple of weeks ago. Great!

      The Couchbase documentation does not currently cover these connection problems between Azure PaaS and Azure IaaS/VMs.
      I think this would be valuable information to add.

      Thanks,

            People

              Unassigned Unassigned
              jmorris Jeff Morris