Couchbase Server / MB-31568

5.5.3 CLONE MB-31352 - xdcr replication hang

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.5.2
    • Fix Version/s: 5.5.3
    • Component/s: XDCR
    • Triage: Triaged
    • No

    Description

      I was trying to set up XDCR replication from an in-house cluster to EC2. The following things happened:

      1. Set up XDCR from a 4-node in-house cluster (2 data nodes) to a 16-node EC2 cluster (8 data nodes) for 1 bucket (msm).

      2. I didn't provide hostnames for the EC2 nodes initially. That caused the UI to take a long time to respond. After it came back, I tried to delete the replication. There was initially no response, then the UI reported an error, and multiple attempts to delete the replication were unsuccessful.

      3. I restarted the goxdcr process on one data node in the source cluster, 172.23.97.37. This removed the XDCR replication from the UI.

      Restarted at:
      ns_1@172.23.97.37 6:43:52 PM Tue Sep 18, 2018

      4. I fixed the hostnames on the EC2 cluster and set up the XDCR replication on the source cluster again. This time it started replicating.

      5. After replicating 50% of the data, the progress stopped. I then killed the goxdcr process on the second data node in the source cluster, and that kicked off replication of the remaining 50% of the data.

      Message in logs before restart:

      2018-09-19T04:09:59.411-07:00 INFO GOXDCR.PipelineMgr: Replication Status = map[a8da6785a5cce7dc20c1f861ba93a500/msm/msm:name={a8da6785a5cce7dc20c1f861ba93a500/msm/msm}, status={Pending}, errors={[]}, progress={Pipeline has been stopped}
      

      Restarted at:
      ns_1@172.23.97.38 12:19:26 PM Wed Sep 19, 2018
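
      For anyone triaging similar hangs from the collected logs, the following is a minimal sketch of scanning goxdcr log lines for the replication status message quoted above. The field layout (name, status, errors, progress) is inferred only from that single log line, and the function name `stalled_replications` and the "stalled" heuristic (status Pending with progress "Pipeline has been stopped") are assumptions for illustration, not part of goxdcr.

```python
import re

# One replication entry inside a "Replication Status" line.
# Assumption: the layout is inferred from the single log line in this report.
ENTRY_RE = re.compile(
    r"name=\{(?P<name>[^}]*)\}, "
    r"status=\{(?P<status>[^}]*)\}, "
    r"errors=\{(?P<errors>[^}]*)\}, "
    r"progress=\{(?P<progress>[^}]*)\}"
)

# The log line from this report, copied verbatim (it is truncated in the report).
SAMPLE = (
    "2018-09-19T04:09:59.411-07:00 INFO GOXDCR.PipelineMgr: Replication Status = "
    "map[a8da6785a5cce7dc20c1f861ba93a500/msm/msm:"
    "name={a8da6785a5cce7dc20c1f861ba93a500/msm/msm}, status={Pending}, "
    "errors={[]}, progress={Pipeline has been stopped}"
)


def stalled_replications(lines):
    """Yield (timestamp, name, status, progress) for entries that look stalled.

    Heuristic (an assumption): status is "Pending" and the progress text
    says the pipeline has been stopped, as in the line quoted in this report.
    """
    for line in lines:
        if "GOXDCR.PipelineMgr: Replication Status" not in line:
            continue
        # The timestamp is the first whitespace-delimited token of the line.
        timestamp = line.strip().split(" ", 1)[0]
        for match in ENTRY_RE.finditer(line):
            if (match["status"] == "Pending"
                    and "Pipeline has been stopped" in match["progress"]):
                yield timestamp, match["name"], match["status"], match["progress"]


if __name__ == "__main__":
    for row in stalled_replications([SAMPLE]):
        print(row)
```

      The function accepts any iterable of lines, so an open log file from one of the collectinfo archives can be passed in directly.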

      Source Cluster Logs:
      https://s3.amazonaws.com/cb-customers/deepkaran/collectinfo-2018-09-19T192037-ns_1%40172.23.97.37.zip
      https://s3.amazonaws.com/cb-customers/deepkaran/collectinfo-2018-09-19T192037-ns_1%40172.23.97.38.zip
      https://s3.amazonaws.com/cb-customers/deepkaran/collectinfo-2018-09-19T192037-ns_1%40172.23.97.39.zip
      https://s3.amazonaws.com/cb-customers/deepkaran/collectinfo-2018-09-19T192037-ns_1%40172.23.97.40.zip

      Let me know if you need destination cluster logs as well.

          Activity

            pavithra.mahamani Pavithra Mahamani added a comment - Verified on 5.5.3-4021.

            build-team Couchbase Build Team added a comment - Build couchbase-server-5.5.3-4008 contains goxdcr commit 05f2a3f with commit message: MB-31568 fix replication hanging problem

            People

              pavithra.mahamani Pavithra Mahamani
              jliang John Liang