Couchbase Server / MB-50595

XDCR - investigate why "older node" message shows up

Details

    • Task
    • Resolution: Duplicate
    • Major
    • Morpheus
    • 7.1.0
    • XDCR
    • None
    • 1

    Description

      In MB-50416 logs, I see messages of:

      2022-01-20T20:33:20.819-08:00 INFO GOXDCR.CheckpointMgr: BackfillPipeline backfill_3ca0304e14a75631ffefbd4423947945/GleamBookUsers0/GleamBookUsers0 remote bucket is an older node, no checkpointing should be done.

      Upon investigation, this comes from how the backfill mechanism and the checkpoint manager interact:
      A backfill task is composed, per VB, of <source VBUUID, starting seqno, ending seqno>.
      When a backfill pipeline starts, the checkpoint manager fetches the backfill checkpoints and validates each one to determine whether it can be resumed from.
      This is the method: https://github.com/couchbase/goxdcr/blob/bda5902305f42ef52aba7516d17a7564d9132aef/pipeline_svc/checkpoint_manager.go#L1305
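
      As an illustration of the per-VB task shape described above, here is a minimal sketch in Go. The type and method names (VBBackfillTask, BackfillTasks, Covers) are hypothetical and are not taken from goxdcr; treating the seqno range as inclusive on both ends is also an assumption.

```go
package main

import "fmt"

// VBBackfillTask is a hypothetical stand-in for one per-VB backfill task:
// <source VBUUID, starting seqno, ending seqno>.
type VBBackfillTask struct {
	SourceVBUUID uint64
	StartSeqno   uint64
	EndSeqno     uint64
}

// BackfillTasks maps a vbucket number to the task assigned to it.
type BackfillTasks map[uint16]VBBackfillTask

// Covers reports whether seqno lies inside the task's range.
// The inclusive bounds are an assumption made for this sketch.
func (t VBBackfillTask) Covers(seqno uint64) bool {
	return seqno >= t.StartSeqno && seqno <= t.EndSeqno
}

func main() {
	tasks := BackfillTasks{
		0: {SourceVBUUID: 0xabc, StartSeqno: 100, EndSeqno: 500},
	}
	fmt.Println(tasks[0].Covers(250)) // inside the range
	fmt.Println(tasks[0].Covers(501)) // past the ending seqno
}
```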

      Note that for a checkpoint to be resumable, the tuple <VBUUID, seqno> must be accepted by both source and target.
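
      The agreement rule can be sketched roughly as follows. This is not the goxdcr implementation: Checkpoint, Validator and FirstAgreeable are hypothetical names, and the two validators simply stand in for whatever checks the source and the target actually perform.

```go
package main

import "fmt"

// Checkpoint is a hypothetical <VBUUID, seqno> tuple.
type Checkpoint struct {
	VBUUID uint64
	Seqno  uint64
}

// Validator is a hypothetical callback that reports whether one side
// (source or target) still accepts the given tuple.
type Validator func(c Checkpoint) bool

// FirstAgreeable walks the records in the order given (assumed newest
// first) and returns the first one accepted by both source and target.
// The boolean is false when no record is agreeable.
func FirstAgreeable(records []Checkpoint, source, target Validator) (Checkpoint, bool) {
	for _, r := range records {
		if source(r) && target(r) {
			return r, true
		}
	}
	return Checkpoint{}, false
}

func main() {
	// Illustrative validators: each side accepts only one VBUUID and
	// only seqnos up to some limit.
	source := func(c Checkpoint) bool { return c.VBUUID == 0xabc && c.Seqno <= 300 }
	target := func(c Checkpoint) bool { return c.VBUUID == 0xabc && c.Seqno <= 200 }

	records := []Checkpoint{
		{VBUUID: 0xabc, Seqno: 250}, // source accepts, target rejects
		{VBUUID: 0xabc, Seqno: 150}, // both accept
	}
	ckpt, ok := FirstAgreeable(records, source, target)
	fmt.Println(ckpt.Seqno, ok) // prints "150 true"
}
```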

      For Main pipelines, when there are no checkpoints (or no agreeable checkpoints), the checkpoint manager starts at seqno 0, which is clean and guarantees no data loss.
      For backfill pipelines, when there is no backfill checkpoint (or no agreeable checkpoint), the checkpoint manager checks whether the VB has a backfill task assigned and, if so, uses that task's starting seqno as the starting point.
      https://github.com/couchbase/goxdcr/blob/bda5902305f42ef52aba7516d17a7564d9132aef/pipeline_svc/checkpoint_manager.go#L1504
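
      The fallback described above can be sketched as below. All names (PipelineKind, OptionalSeqno, StartingSeqno) are hypothetical and the linked goxdcr code is structured differently. Falling back to 0 for a backfill pipeline that has no task is an assumption of this sketch; the description does not say what happens in that case.

```go
package main

import "fmt"

// PipelineKind distinguishes the two pipeline types discussed above.
type PipelineKind int

const (
	MainPipeline PipelineKind = iota
	BackfillPipeline
)

// OptionalSeqno is a hypothetical "seqno or nothing" value.
type OptionalSeqno struct {
	Seqno uint64
	Valid bool
}

// StartingSeqno picks where a VB should start streaming from:
//  1. an agreeable checkpoint, if one exists;
//  2. for backfill pipelines, the assigned task's starting seqno;
//  3. otherwise 0 (for backfill without a task this is an assumption).
func StartingSeqno(kind PipelineKind, agreedCkpt, taskStart OptionalSeqno) uint64 {
	if agreedCkpt.Valid {
		return agreedCkpt.Seqno
	}
	if kind == BackfillPipeline && taskStart.Valid {
		return taskStart.Seqno
	}
	return 0
}

func main() {
	none := OptionalSeqno{}
	task := OptionalSeqno{Seqno: 100, Valid: true}
	ckpt := OptionalSeqno{Seqno: 250, Valid: true}

	fmt.Println(StartingSeqno(MainPipeline, none, none))     // prints 0
	fmt.Println(StartingSeqno(BackfillPipeline, none, task)) // prints 100
	fmt.Println(StartingSeqno(BackfillPipeline, ckpt, task)) // prints 250
}
```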

      If backfill task checkpoints are already in place (for example, pulled from or pushed by a peer node), it is possible that https://github.com/couchbase/goxdcr/blob/bda5902305f42ef52aba7516d17a7564d9132aef/pipeline_svc/checkpoint_manager.go#L1326-L1333 never executes, which leads to the incorrect message.

      Theoretically, this could happen in 7.0 too, but with p2p in 7.1, this may become more prevalent.

            People

              neil.huang Neil Huang