[BP 7.1.4] - XDCR - CheckpointMgr hang on P2P RespCh

Description

Checkpoint Manager with P2P can hang because the handler does not watch for the checkpoint manager exiting.

The stack trace of interest would be:

As a result, a large number of zombie Checkpoint Manager instances accumulate over time and cannot be cleaned up, which leads to a memory leak.
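
The failure mode described above can be illustrated with a minimal Go sketch. This is not the goxdcr code: the type ckptMgr, the channels respCh and finCh, the error ErrStopped, and the methods deliver and deliverBlocking are all hypothetical names chosen for illustration. The sketch assumes the handler hands a peer response to the manager over an unbuffered channel, and that the manager signals exit by closing a second channel.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrStopped is a hypothetical error returned when the manager has already exited.
var ErrStopped = errors.New("checkpoint manager stopped")

// ckptMgr is a hypothetical stand-in for a checkpoint manager.
// respCh carries peer-to-peer responses; finCh is closed when the manager stops.
type ckptMgr struct {
	respCh chan string
	finCh  chan struct{}
}

func newCkptMgr() *ckptMgr {
	return &ckptMgr{respCh: make(chan string), finCh: make(chan struct{})}
}

// stop signals that the manager has exited; after this nobody reads respCh.
func (m *ckptMgr) stop() { close(m.finCh) }

// deliverBlocking mirrors the hang: an unconditional send on an unbuffered
// channel blocks forever once the reader is gone, pinning the goroutine and
// everything it references.
func (m *ckptMgr) deliverBlocking(resp string) {
	m.respCh <- resp
}

// deliver also watches finCh, so it returns instead of blocking when the
// manager has stopped.
func (m *ckptMgr) deliver(resp string) error {
	select {
	case m.respCh <- resp:
		return nil
	case <-m.finCh:
		return ErrStopped
	}
}

func main() {
	m := newCkptMgr()
	m.stop()
	// No reader exists and finCh is closed, so deliver returns immediately.
	fmt.Println(m.deliver("peer response")) // prints "checkpoint manager stopped"
}
```

Selecting on both channels lets the sender give up as soon as the manager stops, so the goroutine and the manager it references can be garbage collected instead of lingering as a zombie.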

Environment

None

Link to Log File, atop/blg, CBCollectInfo, Core dump

None

Release Notes Description

None

Activity

Neil Huang March 1, 2023 at 6:39 PM

The CBSE customer had busy systems. With the unit test, I had to induce delays.
How busy is the system? Have you checked the uptime to see if the load is significant?
If the load is not high, please also try running the system under higher load.

Ayush Nayyar March 1, 2023 at 5:50 AM

I tried replicateCkptIntervalMin values of 1 and 2.

Neil Huang February 28, 2023 at 11:29 PM

What did you set replicateCkptIntervalMin to?

Ayush Nayyar February 28, 2023 at 8:27 AM

I tried creating 20 replications on two 3-node clusters, and then followed the steps you mentioned on https://couchbasecloud.atlassian.net/browse/MB-55071. I ran decent loads on the buckets while pausing and resuming the replications, but still wasn't able to replicate it on 7.1.3-3480.

CB robot January 14, 2023 at 1:08 AM

Build couchbase-server-7.1.4-3559 contains goxdcr commit 6879df0 with commit message:
: CkptMgr should not hang on P2P respCh when it is stopped

Fixed

Details

Assignee

Reporter

Is this a Regression?

Yes

Triage

Untriaged

Story Points

Priority

Created January 12, 2023 at 12:10 AM
Updated June 1, 2023 at 6:32 PM
Resolved January 13, 2023 at 9:15 PM