Details
- Bug
- Resolution: Fixed
- Major
- 7.1.4, 7.1.0, 7.1.1, 7.1.2, 7.2.0, 7.1.3, 7.2.1, 7.1.5, 7.2.2
- Untriaged
- 0
- Unknown
Description
When the pipeline is stopping, execution can block while waiting for DelCheckpointsDocs to finish:
1 @ 0x43d456 0x44ded3 0x44dead 0x468ce5 0x484f12 0x9364c5 0x9f7c94 0x9ff18f 0x483822 0x9fe307 0x9fe2d5 0x46cde1
# 0x468ce4 sync.runtime_Semacquire+0x24 /home/couchbase/.cbdepscache/exploded/x86_64/go-1.18.1/go/src/runtime/sema.go:56
# 0x484f11 sync.(*WaitGroup).Wait+0x51 /home/couchbase/.cbdepscache/exploded/x86_64/go-1.18.1/go/src/sync/waitgroup.go:136
# 0x9364c4 github.com/couchbase/goxdcr/metadata_svc.(*CheckpointsService).DelCheckpointsDocs+0x384 /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/metadata_svc/checkpoints_service.go:209
# 0x9f7c93 github.com/couchbase/goxdcr/pipeline_manager.(*PipelineManager).StopPipeline+0x7f3 /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/pipeline_manager/pipeline_manager.go:648
# 0x9ff18e github.com/couchbase/goxdcr/pipeline_manager.(*PipelineUpdater).run.func1+0xe4e /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/pipeline_manager/pipeline_manager.go:1538
# 0x483821 sync.(*Once).doSlow+0xc1 /home/couchbase/.cbdepscache/exploded/x86_64/go-1.18.1/go/src/sync/once.go:68
# 0x9fe306 sync.(*Once).Do+0x46 /home/couchbase/.cbdepscache/exploded/x86_64/go-1.18.1/go/src/sync/once.go:59
# 0x9fe2d4 github.com/couchbase/goxdcr/pipeline_manager.(*PipelineUpdater).run+0x14 /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/pipeline_manager/pipeline_manager.go:1523
Execution is actually stuck in InvalidateCache(), which is called as part of DelCheckpointsDocs:
1 @ 0x43d456 0x40b5ec 0x40b018 0x93472b 0x936b33 0x46cde1
# 0x93472a github.com/couchbase/goxdcr/metadata_svc.(*CheckpointsServiceCacheImpl).InvalidateCache+0x8a /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/metadata_svc/checkpointsServiceCache.go:181
# 0x936b32 github.com/couchbase/goxdcr/metadata_svc.(*CheckpointsService).DelCheckpointsDocs.func1+0x72 /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/metadata_svc/checkpoints_service.go:179
The code of InvalidateCache is:
177 func (c *CheckpointsServiceCacheImpl) InvalidateCache() {
178     req := NewInvalidateReq()
179     select {
180     case c.externalInvalidateCh <- req:
181         <-req.invalidatedAck
182         return
183     }
184 }
The channel externalInvalidateCh is buffered with size 10, so the request goes into the channel even if nothing is reading from it. Execution is then stuck at line 181, waiting to receive from invalidatedAck. That channel is unbuffered, so the receive blocks until a writer sends on it or closes it.
Looking at where invalidatedAck is signalled:
216 func (c *CheckpointsServiceCacheImpl) Run() {
217     for {
218         select {
219         case <-c.finCh:
220             return
...
317         case oneInvalidateReq := <-c.externalInvalidateCh:
318             c.requestInvalidateCache()
319             close(oneInvalidateReq.invalidatedAck)
...
323         }
324     }
325 }
Line 319 should have unblocked InvalidateCache. However, there are cases where c.finCh on line 219 is triggered first, so Run() returns before it reads the request. finCh is triggered via CheckpointsService.ReplicationSpecChangeCallback, which is one of the many metadataChangeCallbacks executed concurrently.
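In summary, CheckpointsService.ReplicationSpecChangeCallback might race ahead of ReplicationSpecChangeListener.replicationSpecChangeHandlerCallback, which can cause InvalidateCache() to be executed after Run() has already exited. In that case nothing reads the request from externalInvalidateCh, invalidatedAck is never closed, and the caller blocks forever.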
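One way to avoid this kind of hang is to have the caller also watch the shutdown channel. The sketch below illustrates that idea under stated assumptions and is not the actual goxdcr fix: the cache type, newCache, and the bool return value are hypothetical, and Run() is reduced to the two cases quoted in this report.

```go
package main

import "fmt"

// invalidateReq and cache are simplified, hypothetical stand-ins for the
// types discussed in the report.
type invalidateReq struct{ invalidatedAck chan struct{} }

type cache struct {
	finCh                chan struct{}
	externalInvalidateCh chan *invalidateReq
}

func newCache() *cache {
	return &cache{
		finCh:                make(chan struct{}),
		externalInvalidateCh: make(chan *invalidateReq, 10),
	}
}

// Run serves invalidate requests until finCh is closed.
func (c *cache) Run() {
	for {
		select {
		case <-c.finCh:
			return
		case req := <-c.externalInvalidateCh:
			close(req.invalidatedAck)
		}
	}
}

// InvalidateCache waits for the ack, but also watches finCh so that it
// cannot block forever once the cache has been stopped. It reports whether
// the invalidation was acknowledged.
func (c *cache) InvalidateCache() bool {
	req := &invalidateReq{invalidatedAck: make(chan struct{})}
	select {
	case c.externalInvalidateCh <- req:
	case <-c.finCh:
		return false
	}
	select {
	case <-req.invalidatedAck:
		return true
	case <-c.finCh:
		return false
	}
}

func main() {
	c := newCache()
	done := make(chan struct{})
	go func() { c.Run(); close(done) }()

	fmt.Println(c.InvalidateCache()) // prints "true": Run() is serving requests

	close(c.finCh)
	<-done                           // wait until Run() has really exited
	fmt.Println(c.InvalidateCache()) // prints "false": returns instead of hanging
}
```

Both select statements need the finCh case: the first covers a full request buffer, and the second covers the situation in this report, where the request is queued but never read.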
Issue Links
- is a backport of MB-59974 XDCR - CheckpointsServiceCacheImpl.InvalidateCache might block when its Run function has exited (Closed)