Details
- Type: Bug
- Resolution: Fixed
- Priority: Critical
- Affects Versions: 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.6.0
- Triaged
- 0
- Unknown
Description
When a pipeline/replication is configured with a bandwidth limit and the pipeline stops, the Xmem nozzles perform a cleanup. This cleanup gets stuck because the writers (i.e. the Xmem nozzle goroutines writing to the socket) wait for the bandwidth throttler (referred to simply as the throttler henceforth) to release some capacity/quota.
However, because the pipeline is stopping, the throttler goroutine also exits. We are therefore left in a situation where the writers are waiting on a non-existent throttler.
The stack trace for the Xmem nozzle, blocked in finalCleanup on the writers' WaitGroup:
goroutine profile: total 187736554 @ 0x43d376 0x44ddd3 0x44ddad 0x468d25 0x484f52 0x909e5f 0x46ce21
# 0x468d24 sync.runtime_Semacquire+0x24 /home/couchbase/.cbdepscache/exploded/x86_64/go-1.18.5/go/src/runtime/sema.go:56
# 0x484f51 sync.(*WaitGroup).Wait+0x51 /home/couchbase/.cbdepscache/exploded/x86_64/go-1.18.5/go/src/sync/waitgroup.go:136
# 0x909e5e github.com/couchbase/goxdcr/parts.(*XmemNozzle).finalCleanup+0x3e /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/parts/xmem_nozzle.go:1128
The following stack trace shows the writers waiting on the bandwidth throttler:
5067 @ 0x43d376 0x46901d 0x468ffd 0x48180c 0x9a7d11 0x91d3c4 0x914b8c 0x91b42b 0x904674 0x91aeaf 0x46ce21
# 0x468ffc sync.runtime_notifyListWait+0x11c /home/couchbase/.cbdepscache/exploded/x86_64/go-1.18.5/go/src/runtime/sema.go:513
# 0x48180b sync.(*Cond).Wait+0x8b /home/couchbase/.cbdepscache/exploded/x86_64/go-1.18.5/go/src/sync/cond.go:56
# 0x9a7d10 github.com/couchbase/goxdcr/pipeline_svc.(*BandwidthThrottler).Wait+0x90 /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/pipeline_svc/bandwidth_throttler.go:247
# 0x91d3c3 github.com/couchbase/goxdcr/parts.(*XmemNozzle).writeToClient+0x863 /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/parts/xmem_nozzle.go:3363
# 0x914b8b github.com/couchbase/goxdcr/parts.(*XmemNozzle).sendSingleSetMeta+0xab /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/parts/xmem_nozzle.go:2321
# 0x91b42a github.com/couchbase/goxdcr/parts.(*XmemNozzle).resendIfTimeout+0x46a /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/parts/xmem_nozzle.go:3036
# 0x904673 github.com/couchbase/goxdcr/parts.(*requestBuffer).modSlot+0x53 /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/parts/xmem_nozzle.go:294
# 0x91aeae github.com/couchbase/goxdcr/parts.(*XmemNozzle).checkAndRepairBufferMonitor+0x32e /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/parts/xmem_nozzle.go:2980
Steps to reproduce:
- Create a replication with a bandwidth usage limit.
- Ensure the usage limit is low enough that the writers block all the time. The following log line indicates such a situation:
2023-11-02T10:29:30.008Z WARN GOXDCR.BwThrottler: pipelineFullTopic=13e32dab9bdeaa83cb90cbeda32d74bf/B1/B1, 13e32dab9bdeaa83cb90cbeda32d74bf/B1/B1_BandwidthThrottlerSvc went over the limit. Need cool down before more mutations can be sent. bandwidth_limit=1048576, bandwidth_usage_quota=-206079
- Pause the replication.
- Check the goroutine stack traces.