Description
We have been having issues with stale connections to our Couchbase cluster, which result in the following errors:
• Unable to locate node
• Receive timeout
• Connection reset
• Reading zero (0) bytes
Upon investigation, we found that the .NET client library uses a stack internally to hold its free (idle) pooled sockets:
MemcachedNode.cs:
private InterlockedStack<PooledSocket> freeItems;
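If not all connections are in use, and given pool sizes of 10-20 connections, it is quite possible that some connections will never be used when the load is low or moderate. A stack is last-in, first-out, so it always hands out the most recently returned socket, and the sockets further down the stack sit idle. Those idle sockets then become stale because TCP keep-alive is not enabled on them (it is off by default). It can be enabled per socket with: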
socket.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.KeepAlive, true);
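To illustrate why a stack leaves sockets idle, here is a minimal sketch (not the client's actual pooling code). It uses ConcurrentStack<T> and ConcurrentQueue<T> as stand-ins for InterlockedStack<PooledSocket> and the proposed queue, integer ids in place of PooledSocket, and one request at a time to model low load. The class and method names are illustrative only.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo: hands out idle "sockets" (integer ids) one request at a
// time and records which ids are ever used.
public static class PoolOrderDemo
{
    // LIFO: the most recently returned socket is always handed out next.
    public static int DistinctUsedWithStack(int poolSize, int requests)
    {
        // The constructor pushes 0..poolSize-1 in order, so the last id is on top.
        var free = new ConcurrentStack<int>(Enumerable.Range(0, poolSize));
        var used = new HashSet<int>();
        for (int i = 0; i < requests; i++)
        {
            if (free.TryPop(out int socket))
            {
                used.Add(socket);
                free.Push(socket); // returned to the top, so it is picked again
            }
        }
        return used.Count;
    }

    // FIFO: a returned socket goes to the back, so every socket takes a turn.
    public static int DistinctUsedWithQueue(int poolSize, int requests)
    {
        var free = new ConcurrentQueue<int>(Enumerable.Range(0, poolSize));
        var used = new HashSet<int>();
        for (int i = 0; i < requests; i++)
        {
            if (free.TryDequeue(out int socket))
            {
                used.Add(socket);
                free.Enqueue(socket); // returned to the back of the line
            }
        }
        return used.Count;
    }
}
```

With a pool of 10 and 100 sequential requests, the stack version only ever touches one socket, while the queue version cycles through all 10. Under concurrent load a stack will touch more sockets, but the ones at the bottom can still go unused for long periods.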
I would like to propose that keep-alive be made available as a configuration option (this is related to NCBC-125) and that the stack be replaced with a queue, so that idle sockets are rotated through rather than left at the bottom of the stack.
This would likely remove the need for the idle connection timeout workarounds recommended in the documentation: http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-bestpractice-deployment.html
Of course, one needs to ensure that all network systems (switches, routers, etc.) are correctly configured too.
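As a rough sketch of the proposed configuration option, assuming a hypothetical SocketPoolOptions class with an EnableKeepAlive flag (neither exists in the client today), the socket-creation code could apply keep-alive only when it is configured:

```csharp
using System;
using System.Net.Sockets;

// Hypothetical option holder; the client's real configuration section may differ.
public sealed class SocketPoolOptions
{
    // Off by default, matching the current socket behaviour.
    public bool EnableKeepAlive { get; set; }
}

// Hypothetical factory standing in for wherever the client creates its sockets.
public static class SocketFactory
{
    // Creates a TCP socket and enables SO_KEEPALIVE only if configured to.
    public static Socket CreateSocket(SocketPoolOptions options)
    {
        if (options == null) throw new ArgumentNullException(nameof(options));

        var socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        if (options.EnableKeepAlive)
        {
            socket.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.KeepAlive, true);
        }
        return socket;
    }
}
```

Note that the operating system's default keep-alive idle time is typically two hours on both Windows and Linux, which may be longer than a firewall's idle timeout. On .NET Core 3.0 and later, SocketOptionName.TcpKeepAliveTime and TcpKeepAliveInterval (at SocketOptionLevel.Tcp) can shorten it per socket, so a configuration option might reasonably expose those values too.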