  1. Couchbase Server
  2. MB-48936

XDCR - Checkpoint writes obsolete checkpoints back after backfill tasks have been cleaned up


Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 7.1.0
    • 7.0.0, 7.0.1, 7.0.2, 7.1.0
    • XDCR
    • Untriaged
    • 1
    • No

    Description

      From MB-48919, we will follow VB 69 on node 120.170 (randomly picked) to trace the path.

      The serializer handles its jobs serially: it first stops the backfill pipeline, and then cleans the backfill pipeline checkpoints:

      2021-10-14T04:29:53.731-07:00 INFO GOXDCR.PipelineMgr: PipelineOpSerializer b67b789bcfd95e6b97c8af1e8fa5c7cb/GleamBookUsers0/GleamBookUsers0 done handling job: Job for b67b789bcfd95e6b97c8af1e8fa5c7cb/GleamBookUsers0/GleamBookUsers0 Type: BackfillPipelineStop
      2021-10-14T04:29:53.731-07:00 INFO GOXDCR.PipelineMgr: PipelineOpSerializer b67b789bcfd95e6b97c8af1e8fa5c7cb/GleamBookUsers0/GleamBookUsers0 handling job: Job for b67b789bcfd95e6b97c8af1e8fa5c7cb/GleamBookUsers0/GleamBookUsers0 Type: BackfillPipelineClean
      

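      But stopping the pipeline is asynchronous: the pipeline updater's run() picks the stop signal up from its channel at an arbitrary later time. As a result, the pipeline checkpoints are cleaned first and the pipeline is stopped only afterwards. The stop path then runs a one-time checkpoint "before stopping", which writes the now-obsolete checkpoints back after they have been deleted: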

      2021-10-14T04:29:53.731-07:00 INFO GOXDCR.CheckpointSvc: DelCheckpointsDocs for replication backfill_b67b789bcfd95e6b97c8af1e8fa5c7cb/GleamBookUsers0/GleamBookUsers0...
      2021-10-14T04:29:53.731-07:00 INFO GOXDCR.PipelineMgr: Replication b67b789bcfd95e6b97c8af1e8fa5c7cb/GleamBookUsers0/GleamBookUsers0's backfill Pipeline is stopping
      2021-10-14T04:29:53.731-07:00 INFO GOXDCR.PipelineMgr: Stopping the backfill pipeline b67b789bcfd95e6b97c8af1e8fa5c7cb/GleamBookUsers0/GleamBookUsers0
      2021-10-14T04:29:53.731-07:00 INFO GOXDCR.GenericPipeline: Stopping BackfillPipeline backfill_b67b789bcfd95e6b97c8af1e8fa5c7cb/GleamBookUsers0/GleamBookUsers0-26838238
      2021-10-14T04:29:53.732-07:00 INFO GOXDCR.CheckpointSvc: DelCheckpointsDocs is done for backfill_b67b789bcfd95e6b97c8af1e8fa5c7cb/GleamBookUsers0/GleamBookUsers0
      2021-10-14T04:29:53.732-07:00 INFO GOXDCR.PipelineMgr: b67b789bcfd95e6b97c8af1e8fa5c7cb/GleamBookUsers0/GleamBookUsers0 CleanupBackfillPipeline including checkpoints removal finished (err = <nil>)
      2021-10-14T04:29:53.732-07:00 INFO GOXDCR.PipelineMgr: PipelineOpSerializer b67b789bcfd95e6b97c8af1e8fa5c7cb/GleamBookUsers0/GleamBookUsers0 done handling job: Job for b67b789bcfd95e6b97c8af1e8fa5c7cb/GleamBookUsers0/GleamBookUsers0 Type: BackfillPipelineClean
      2021-10-14T04:29:53.732-07:00 INFO GOXDCR.PipelineMgr: PipelineOpSerializer b67b789bcfd95e6b97c8af1e8fa5c7cb/GleamBookUsers0/GleamBookUsers0 handling job: Job for b67b789bcfd95e6b97c8af1e8fa5c7cb/GleamBookUsers0/GleamBookUsers0 Type: BackfillPipelineStart
      2021-10-14T04:29:53.733-07:00 INFO GOXDCR.PipelineMgr: Replication status received startBackfill, current status=name={b67b789bcfd95e6b97c8af1e8fa5c7cb/GleamBookUsers0/GleamBookUsers0}, status={Replicating}, errors={[]}, oldProgress={All incoming nozzles have been opened}, progress={Pipeline is running}, oldBackfillProgress={All incoming nozzles have been opened}, backfillProgress={Pipeline is running}
      2021-10-14T04:29:53.733-07:00 INFO GOXDCR.PipelineMgr: PipelineOpSerializer b67b789bcfd95e6b97c8af1e8fa5c7cb/GleamBookUsers0/GleamBookUsers0 done handling job: Job for b67b789bcfd95e6b97c8af1e8fa5c7cb/GleamBookUsers0/GleamBookUsers0 Type: BackfillPipelineStart
      2021-10-14T04:29:53.832-07:00 INFO GOXDCR.CheckpointMgr: Starting checkpointing for BackfillPipeline backfill_b67b789bcfd95e6b97c8af1e8fa5c7cb/GleamBookUsers0/GleamBookUsers0 before stopping
      2021-10-14T04:29:53.832-07:00 INFO GOXDCR.CheckpointMgr: Start one time checkpointing for replication BackfillPipeline backfill_b67b789bcfd95e6b97c8af1e8fa5c7cb/GleamBookUsers0/GleamBookUsers0
      

      Synchronization is needed here to ensure:

      1. Backfill Pipeline stop happens first
      2. If clean up is to occur soon after, skip checkpointing
      3. Cleanup happens only once backfill pipeline is stopped


            People

              neil.huang Neil Huang
              neil.huang Neil Huang