Couchbase Server: MB-31568

5.5.3 CLONE MB-31352 - xdcr replication hang



    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version: 5.5.2
    • Fix Version: 5.5.3
    • Component: XDCR
    • Triaged
    • No


      I was trying to set up XDCR replication from an in-house cluster to EC2. The following happened:

      1. Set up XDCR from a 4-node in-house cluster (2 data nodes) to a 16-node EC2 cluster (8 data nodes) for 1 bucket (msm).

      2. I didn't provide hostnames for the EC2 nodes initially. As a result, the UI took a long time to respond with any status. Once it came back, I tried to delete the replication. At first there was no response, then the UI reported an error, and multiple further attempts to delete the replication were unsuccessful.

      3. I restarted the goxdcr process on one data node in the source cluster. This cleaned up the XDCR replication from the UI.

      Restarted at:
      ns_1@ 6:43:52 PM Tue Sep 18, 2018

      4. I fixed the hostnames on the EC2 cluster and set up the XDCR replication on the source cluster again. This time it started replicating.

      5. After 50% of the data had been replicated, progress stopped. I then killed the goxdcr process on the second data node in the source cluster, which kicked off replication of the remaining 50% of the data.

      Message in logs before restart:

      2018-09-19T04:09:59.411-07:00 INFO GOXDCR.PipelineMgr: Replication Status = map[a8da6785a5cce7dc20c1f861ba93a500/msm/msm:name={a8da6785a5cce7dc20c1f861ba93a500/msm/msm}, status={Pending}, errors={[]}, progress={Pipeline has been stopped}
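
      For anyone scanning goxdcr logs for the same symptom, the following is a minimal sketch of pulling the key={value} fields out of a status entry like the one above. The function names parse_replication_status and looks_stalled are hypothetical helpers, not part of goxdcr, and treating "Pending" together with "Pipeline has been stopped" as a sign of the hang is an assumption based only on this one log line.

```python
import re

# Matches "key={value}" pairs as they appear in the status entry above.
FIELD_RE = re.compile(r"(\w+)=\{(.*?)\}")

MARKER = "Replication Status = map["


def parse_replication_status(line: str) -> dict:
    """Return the key={value} fields of a PipelineMgr status line as a dict.

    Returns an empty dict if the line is not a replication status entry.
    """
    start = line.find(MARKER)
    if start == -1:
        return {}
    body = line[start + len(MARKER):]
    return dict(FIELD_RE.findall(body))


def looks_stalled(fields: dict) -> bool:
    """Heuristic (an assumption, not a goxdcr rule): the replication is
    Pending and its progress text says the pipeline has been stopped."""
    return (
        fields.get("status") == "Pending"
        and "stopped" in fields.get("progress", "")
    )
```

      Applied to the entry above, this yields the fields name, status, errors and progress, and looks_stalled reports True for it.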

      Restarted at:
      ns_1@ 12:19:26 PM Wed Sep 19, 2018

      Source Cluster Logs:

      Let me know if you need destination cluster logs as well.




            build-team Couchbase Build Team added a comment - Build couchbase-server-5.5.3-4008 contains goxdcr commit 05f2a3f with commit message: MB-31568 fix replication hanging problem

            pavithra.mahamani Pavithra Mahamani (Inactive) added a comment - Verified on 5.5.3-4021.


              pavithra.mahamani Pavithra Mahamani (Inactive)
              jliang John Liang


