We discovered this issue on a cluster where, after the Elasticsearch connector connected, ns_server "control" connections to memcached started dropping. The Elasticsearch connector has nothing to do with the DCP connections ns_server manages. However, ns_server monitors the DCP replications it manages, and the "dcp" stats command started failing because it returns stats for every DCP connection, not just ns_server's replications. Separately, memcached should not allow DCP clients to connect with names long enough to make a subsequent stats call fail. Lastly, clients should know about the name length limit and ensure that their connection names stay within it.
I would like to fix this issue in the following ways.
- Memcached should define the maximum length of a DCP stat name (MAX_DCP_STAT_NAME_LENGTH) and reject DCP connections whose names are longer than 255 - MAX_DCP_STAT_NAME_LENGTH. Currently the longest stat name appears to be stream_1024_cur_snapshot_prepare, which is 32 characters long. (This ticket now tracks this improvement.)
- Memcached clients, including ns_server, should change how they construct their DCP connection names so that the names never exceed the maximum allowed length. (Tracked by
- Lastly, it would be useful to extend the "dcp" stats call to accept an optional connection name prefix, so that ns_server can query stats only for the connections it has set up. (Tracked by
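On the client side, one way to keep names within the limit is shown below. This is a hypothetical approach, not ns_server's actual naming scheme. Names that already fit are left unchanged. Longer names are cut down to a prefix, and a hash of the full name is appended, so the result is deterministic and distinct long names are unlikely to collide. The 223-character limit is assumed from the arithmetic in this ticket. `makeDcpConnectionName` and the choice of 64-bit FNV-1a as the hash are illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Assumed limit the client mirrors from the server (255 - 32 in this ticket's terms).
constexpr std::size_t kMaxDcpConnectionNameLength = 223;
// ':' followed by 16 hex digits of a 64-bit hash.
constexpr std::size_t kHashSuffixLength = 17;
static_assert(kMaxDcpConnectionNameLength > kHashSuffixLength, "limit too small");

// 64-bit FNV-1a: simple and deterministic, so the same long name always
// shortens to the same result across reconnects.
std::uint64_t fnv1a64(const std::string& s) {
    std::uint64_t h = 14695981039346656037ULL;
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

// Hypothetical helper: returns `name` unchanged if it fits; otherwise keeps a
// prefix and appends ":<hash>" so the result is exactly at the limit.
std::string makeDcpConnectionName(const std::string& name) {
    if (name.size() <= kMaxDcpConnectionNameLength) {
        return name;
    }
    char suffix[kHashSuffixLength + 1];  // +1 for the terminating NUL
    std::snprintf(suffix, sizeof(suffix), ":%016llx",
                  static_cast<unsigned long long>(fnv1a64(name)));
    return name.substr(0, kMaxDcpConnectionNameLength - kHashSuffixLength) + suffix;
}
```

Keeping a readable prefix means operators can still tell which component owns a connection. The hash is what preserves uniqueness when two long names share that prefix.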
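The optional prefix for the "dcp" stats call could filter connections before any per-connection stats are emitted, as sketched below. The data layout (`ConnectionStats`), the `<connection name>:<stat name>` key format and `dcpStatKeys` are hypothetical stand-ins, not memcached's stats API. An empty prefix keeps the current behaviour of reporting every connection.

```cpp
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical view of DCP stats: connection name -> (stat name -> value).
using ConnectionStats = std::map<std::string, std::map<std::string, std::string>>;

// Returns "<connection name>:<stat name>" keys, but only for connections whose
// name starts with `prefix`. An empty prefix matches every connection.
std::vector<std::string> dcpStatKeys(const ConnectionStats& conns,
                                     std::string_view prefix) {
    std::vector<std::string> keys;
    for (const auto& [connName, stats] : conns) {
        // substr clamps to the name's length, so short names simply fail to match.
        if (std::string_view(connName).substr(0, prefix.size()) != prefix) {
            continue;
        }
        for (const auto& stat : stats) {
            keys.push_back(connName + ":" + stat.first);
        }
    }
    return keys;
}
```

With a filter like this, ns_server would only ever format stat keys for connections it named itself. A foreign client's oversized name could then no longer break ns_server's monitoring, even before the server-side length check is in place.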
I'll file tickets on these issues and link here.