[BP 7.2.0] - XDCR on non-KV node freezes when replication settings are changed several times

Description

We can see a replication spec service callback stuck:

It is stuck because it’s trying to send to a reload channel: https://github.com/couchbase/goxdcr/blob/cbefdb7fec3b406d9b507aef842658b598b30032/peerToPeer/replicaReplicator.go#L456

And this causes the other replication spec callback to be stuck:

From code inspection, we can see that this means that the agent exited: https://github.com/couchbase/goxdcr/blob/cbefdb7fec3b406d9b507aef842658b598b30032/peerToPeer/replicaReplicator.go#L485-L489

And then there is nobody to listen to the reload channel: https://github.com/couchbase/goxdcr/blob/cbefdb7fec3b406d9b507aef842658b598b30032/peerToPeer/replicaReplicator.go#L509

The channel fills up after 10 events, so replication settings need to be changed up to 10 times before a send blocks.

To reproduce

  1. Create a 2-node source cluster (one KV node, one Analytics node) and a 1-node target cluster.

  2. Create a replication.

  3. Change the replication setting 10 times. In my case, I changed the XMEM nozzle batch size count, one change at a time.

  4. Changing the replication setting an 11th time then causes the UI to freeze.

With the stack trace below showing why it froze:

To reproduce the original stack trace, create a new replication from the KV node.
On the Analytics node, we will then see the following:

Activity

Ayush Nayyar April 10, 2023 at 8:17 AM

Replicated bug on 7.2.0-5090, and validated the fix on 7.2.0-5093.

CB robot January 21, 2023 at 12:53 AM

Build couchbase-server-7.2.0-5093 contains goxdcr commit b5f5018 with commit message:
: replicaReplicator to not get stuck on channels for non-KV nodes

Fixed
Is this a Regression?

No

Triage

Untriaged

Created January 10, 2023 at 10:03 PM
Updated May 10, 2023 at 9:41 PM
Resolved January 20, 2023 at 10:47 PM