[BP 6.6.4] - XDCR - File descriptor leak in XDCR

Description

In a recent case from the field, XDCR was seen holding 70,000 sockets that had no process on the other side of the connection. When lsof is run, these sockets show up as follows:

COMMAND     PID USER    FD  TYPE DEVICE SIZE/OFF      NODE NAME
...
goxdcr.bi 24440 cbadmin  5u sock    0,7      0t0 303953540 protocol: TCP
goxdcr.bi 24440 cbadmin  6u sock    0,7      0t0 211821560 protocol: TCP
goxdcr.bi 24440 cbadmin 10u sock    0,7      0t0 216966092 protocol: TCP
...

In this case the user had set the file descriptor limit to 70k, so at this point XDCR was unable to create new connections. This issue was previously tracked in MB-44182 (https://couchbasecloud.atlassian.net/browse/MB-44182) and believed to be fixed, but it appears that the core issue was not completely fixed.
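To check whether a node shows the same symptom, the orphaned sockets in an lsof listing like the one above can be counted. The following is a minimal sketch, not part of the original report: the function name `count_orphaned_sockets` and the `command_prefix` parameter are hypothetical, and it assumes the default lsof column order (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME) with the listing already captured as text.

```python
def count_orphaned_sockets(lsof_text, command_prefix="goxdcr"):
    """Count rows whose TYPE is 'sock' and whose NAME is 'protocol: TCP'.

    Assumes the default lsof column layout:
    COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
    """
    count = 0
    for line in lsof_text.splitlines():
        fields = line.split()
        # Skip the header, elisions, and rows for other commands.
        if len(fields) < 9 or not fields[0].startswith(command_prefix):
            continue
        name = " ".join(fields[8:])
        if fields[4] == "sock" and name == "protocol: TCP":
            count += 1
    return count


# Sample rows taken from the listing in this report.
sample = """\
COMMAND     PID USER    FD  TYPE DEVICE SIZE/OFF      NODE NAME
goxdcr.bi 24440 cbadmin  5u sock    0,7      0t0 303953540 protocol: TCP
goxdcr.bi 24440 cbadmin  6u sock    0,7      0t0 211821560 protocol: TCP
goxdcr.bi 24440 cbadmin 10u sock    0,7      0t0 216966092 protocol: TCP
"""
print(count_orphaned_sockets(sample))  # prints 3
```

A count that keeps growing toward the configured file descriptor limit would be consistent with the leak described here.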

Activity

Pavithra Mahamani November 8, 2021 at 3:57 PM

Monitored open sockets for goxdcr during component test on 6.6.4-9928 and did not see the count exceed 40.
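The comment does not say how the socket count was monitored. One possible approach on Linux, shown here only as an assumption-laden sketch, is to count the entries in `/proc/<pid>/fd` whose symlink target starts with `socket:`. The helper name `count_socket_fds` is hypothetical.

```python
import os


def count_socket_fds(fd_dir):
    """Count entries in fd_dir that are symlinks to 'socket:[inode]'.

    On Linux, each entry in /proc/<pid>/fd is a symlink whose target
    names the open file; sockets appear as 'socket:[<inode>]'.
    """
    total = 0
    for entry in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, entry))
        except OSError:
            # Not a symlink, or the descriptor was closed in the meantime.
            continue
        if target.startswith("socket:"):
            total += 1
    return total


# Example: inspect the current process (only where /proc is available).
if os.path.isdir("/proc/self/fd"):
    print(count_socket_fds("/proc/self/fd"))
```

For goxdcr, the directory would be `/proc/<pid>/fd` with the goxdcr process ID, read by a user with permission to inspect that process. Sampling the value periodically during a test run gives a trend that can be compared with the figure reported above.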

CB robot August 31, 2021 at 6:12 AM

Build couchbase-server-6.6.4-9910 contains goxdcr commit 82edbe2 with commit message:
MB-48212 (https://couchbasecloud.atlassian.net/browse/MB-48212) - XDCR file descriptor leak when system is busy

Fixed
Details

Is this a Regression?

Unknown

Triage

Untriaged

Created August 27, 2021 at 4:48 PM
Updated December 11, 2021 at 1:28 AM
Resolved August 31, 2021 at 2:14 PM