Description
I've noticed that XDCR appears to collect the `"dcp"` stat group from KV, and that the output appears to be retained in memory.
This stat group contains detailed stats for every DCP stream in KV (multiple streams per producer), so it can generate a lot of data and stall the connection. We've previously seen 60s timeouts during cbcollect and hundreds of MiB of output (see linked MBs).
The stat group supports a filter, passed in as the value of the request, which is used to ignore streams not belonging to a specific user, something like `{"user": "@goxdcr"}`. ns_server has started doing that recently: https://review.couchbase.org/c/ns_server/+/200841/7/src/dcp_replicator.erl#298
Additionally, as of 7.6.0, XDCR can specify `"stream_format"` to skip the per-vBucket stream stats when they are not required, as they make up the bulk of the output.
https://github.com/couchbase/kv_engine/blob/master/docs/BinaryProtocol.md#statdcp
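To make the suggestion concrete, here is a minimal Go sketch of building the value for the `"dcp"` STAT request. It only follows the "something like" shape given above: the flat `{"user": ...}` layout, the `"skip"` value for `"stream_format"`, and the names `dcpStatValue` and `buildDCPStatValue` are assumptions for illustration, so the exact schema should be checked against the linked BinaryProtocol.md before use.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dcpStatValue is a hypothetical model of the JSON value sent with the
// "dcp" STAT request. Field names follow the ticket text; the real schema
// is defined in kv_engine's BinaryProtocol.md and may differ.
type dcpStatValue struct {
	User         string `json:"user,omitempty"`
	StreamFormat string `json:"stream_format,omitempty"`
}

// buildDCPStatValue returns the JSON-encoded request value.
// Empty arguments are omitted from the output.
func buildDCPStatValue(user, streamFormat string) ([]byte, error) {
	return json.Marshal(dcpStatValue{User: user, StreamFormat: streamFormat})
}

func main() {
	// "skip" is an assumed stream_format value meaning "omit per-stream stats".
	value, err := buildDCPStatValue("@goxdcr", "skip")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(value))
}
```

The resulting bytes would be sent as the value of the STAT request whose key is `dcp`; how that request is issued depends on the memcached client XDCR already uses and is not shown here.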
My worry is that if XDCR requests this stat group and the output is very large, XDCR will receive a very large payload on that connection, potentially delaying other operations in a non-obvious way (this is only a theory).
Issue Links
- relates to MB-58868 cbstats timeouts at cbcollect (Reopened)