Details
- Bug
- Resolution: Unresolved
- Critical
- 7.6.0, 7.1.0, 7.1.1, 7.1.2, 7.2.0, 7.1.3, 7.2.1, 7.1.5, 7.2.2
- Untriaged
- 0
- No
Description
// In CBSE, we see that DelCheckpointsDocs can be delayed.
// DelCheckpointsDocs launches 2 goroutines: one runs metadata_svc.DelAllFromCatalog
// and the other runs ckpt_svc.CleanupMapping.
// metadata_svc.DelAllFromCatalog can be very slow, so CleanupMapping may finish first,
// and when it finishes, it removes the counter/topic from the shaRefCounterService's topicMaps.
// But the DelAllFromCatalog call is still executing...
// In the meantime, a p2p merge comes in and calls "loadBrokenMappingsInternal":
//
// vvv
// alreadyExists := ckpt_svc.InitTopicShaCounterWithInternalId(replicationId, "")
//
// mappingsDoc, err := ckpt_svc.GetMappingsDoc(replicationId, !alreadyExists /*initIfNotFound*/)
//
// The InitTopicShaCounterWithInternalId call causes a new counter to be re-established.
// But metakv still hasn't deleted the old mappings doc yet,
// so GetMappingsDoc will not create a new mappings doc, even though a new counter has been established.
// This breaks the assumption that "When a counter is first created, it should also create a new mappingdoc"
This can lead to situations where a broken map is deleted but never re-created properly, causing "key not found" errors during pipeline startup and leaving the pipeline unable to fully start.
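The interleaving described above can be replayed with a small, deterministic Go sketch. Everything in it is hypothetical and for illustration only: the `state` type, its fields, and the `replay` helper are not goxdcr code, the two goroutines are replayed sequentially so the result is reproducible, and the `deleting` guard is just one possible way to keep p2p out while a delete is in flight, not necessarily the fix made under MB-58872.

```go
package main

import (
	"errors"
	"fmt"
)

// errDeleteInProgress is a hypothetical error returned when the guard rejects a load.
var errDeleteInProgress = errors.New("checkpoint deletion in progress")

// state is a hypothetical model of one replication's bookkeeping.
type state struct {
	counter  bool // a counter/topic exists in the ref-counter service
	doc      bool // a mappings doc exists in metakv
	deleting bool // a checkpoint delete is still in flight (the guard)
}

// initCounter plays the role of InitTopicShaCounterWithInternalId:
// it reports whether the counter already existed and creates it if not.
func (s *state) initCounter() (alreadyExists bool) {
	alreadyExists = s.counter
	s.counter = true
	return alreadyExists
}

// getMappingsDoc plays the role of GetMappingsDoc: a doc that is already
// present (even a stale one) is used as is; a new one is created only
// when none exists and initIfNotFound is set.
func (s *state) getMappingsDoc(initIfNotFound bool) (created bool) {
	if s.doc {
		return false
	}
	if initIfNotFound {
		s.doc = true
		return true
	}
	return false
}

// loadBrokenMappings models the p2p merge path. With the guard enabled it
// refuses to run while a delete is in flight.
func (s *state) loadBrokenMappings(guarded bool) error {
	if guarded && s.deleting {
		return errDeleteInProgress
	}
	alreadyExists := s.initCounter()
	s.getMappingsDoc(!alreadyExists)
	return nil
}

// replay runs the problematic ordering and returns the final state.
func replay(guarded bool) (counter, doc bool) {
	s := &state{counter: true, doc: true}
	s.deleting = true                 // delete starts, both halves launched
	s.counter = false                 // CleanupMapping finishes first
	_ = s.loadBrokenMappings(guarded) // p2p merge arrives mid-delete
	s.doc = false                     // slow catalog delete finally completes
	s.deleting = false
	return s.counter, s.doc
}

func main() {
	c, d := replay(false)
	fmt.Printf("unguarded: counter=%v doc=%v\n", c, d) // counter=true doc=false
	c, d = replay(true)
	fmt.Printf("guarded:   counter=%v doc=%v\n", c, d) // counter=false doc=false
}
```

In the unguarded replay the counter survives while the mappings doc is gone, which is the broken invariant described above. With the guard, the p2p load is rejected during the delete and both pieces end up removed together.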
Attachments
Issue Links
- is a backport of MB-58872 XDCR - checkpointSvc needs to protect checkpointDel operations from p2p (Closed)