Couchbase Server / MB-50240

[Magma, 2 replicas, 1% DGM]: XDCR replication seems to be stuck after rebalancing out 1 node on the source cluster.


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Affects Version/s: 7.1.0
    • Fix Version/s: 7.1.0
    • Component/s: XDCR
    • 7.1.0-1970

    Description

      1. Create a 4 node cluster
      2. Create a 2 node XDCR remote cluster
      3. Create buckets with 2 replicas, plus collections, on the source cluster.
      4. Create buckets and collections on the XDCR remote cluster.
      5. Create 10000000 items sequentially
      6. Update 10000000 random keys (RandonKey) to create 50% fragmentation.
      7. Create 10000000 items sequentially.
      8. Update 10000000 random keys (RandonKey) to create 50% fragmentation.
      9. Rebalance in while loading docs.
      10. Rebalance out while loading docs.
      11. Check XDCR replication on the destination cluster: the replication is continuously running and ops are continuously visible, but the item count on the destination cluster does not change.
      12. The source cluster also shows mutations remaining. The mutations-remaining count keeps going up and down even though no mutations are happening on the source cluster.
      13. XDCR crashes are observed on the node that was rebalanced out in step 10:

        Service 'goxdcr' exited with status 2. Restarting. Messages:
        runtime error: invalid memory address or nil pointer dereference
        [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xa2fd5b]
         
        goroutine 1517930 [running]:
        github.com/couchbase/goxdcr/pipeline_svc.combinePeerCkptDocsWithLocalCkptDoc(0xc0acc92360, 0xc071f3b950, 0xc0acc92300, 0xc07352cea0, 0xc0955c62d0)
        /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/pipeline_svc/checkpoint_manager.go:3057 +0x3db
        github.com/couchbase/goxdcr/pipeline_svc.(*CheckpointManager).mergeFinalCkpts.func1(0xc09eb06850, 0xc00f52f600, 0xc0b1285220, 0x49, 0xc072b62ea0, 0x2, 0x2, 0x1, 0xc0acc92360, 0xc071f3b950, ...)
        /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/pipeline_svc/checkpoint_manager.go:2979 +0x3bb
        created by github.com/couchbase/goxdcr/pipeline_svc.(*CheckpointManager).mergeFinalCkpts
        /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/pipeline_svc/checkpoint_manager.go:2962 +0x1fc
        
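      The trace shows a nil pointer dereference in combinePeerCkptDocsWithLocalCkptDoc, running in a goroutine started by (*CheckpointManager).mergeFinalCkpts. The goxdcr source is not part of this report, so the sketch below is only a hypothetical illustration of the failure class, assuming the merge combines per-VB checkpoint documents received from peer nodes with the local ones and that any level of that data may be nil (for example, while a node is being rebalanced out). The types CheckpointsDoc and CheckpointRecord and the function combinePeerDocsWithLocal are invented for illustration and are not goxdcr's API; the newest-first ordering is also an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// CheckpointRecord is a hypothetical, simplified checkpoint entry for one VB.
type CheckpointRecord struct {
	Seqno uint64 // source sequence number the checkpoint covers
}

// CheckpointsDoc is a hypothetical per-VB checkpoint document.
type CheckpointsDoc struct {
	Records []*CheckpointRecord
}

// combinePeerDocsWithLocal merges checkpoint docs received from peer nodes
// into the local per-VB docs. Every pointer that may be nil (the local map,
// a peer's whole map, a per-VB doc, an individual record) is checked before
// it is dereferenced, so missing data is skipped instead of causing a panic.
func combinePeerDocsWithLocal(local map[uint16]*CheckpointsDoc, peers []map[uint16]*CheckpointsDoc) map[uint16]*CheckpointsDoc {
	if local == nil {
		local = make(map[uint16]*CheckpointsDoc)
	}
	for _, peer := range peers {
		// Ranging over a nil map does nothing, so a peer that sent no docs is skipped.
		for vb, peerDoc := range peer {
			if peerDoc == nil {
				continue
			}
			localDoc := local[vb]
			if localDoc == nil {
				localDoc = &CheckpointsDoc{}
				local[vb] = localDoc
			}
			localDoc.Records = append(localDoc.Records, peerDoc.Records...)
		}
	}
	// Drop nil records, then order newest first (assumption: highest seqno wins).
	for _, doc := range local {
		if doc == nil {
			continue
		}
		kept := doc.Records[:0]
		for _, r := range doc.Records {
			if r != nil {
				kept = append(kept, r)
			}
		}
		sort.Slice(kept, func(i, j int) bool { return kept[i].Seqno > kept[j].Seqno })
		doc.Records = kept
	}
	return local
}

func main() {
	local := map[uint16]*CheckpointsDoc{
		0: {Records: []*CheckpointRecord{{Seqno: 100}}},
	}
	peers := []map[uint16]*CheckpointsDoc{
		{0: {Records: []*CheckpointRecord{{Seqno: 250}, nil}}, 1: nil},
		nil, // a peer that returned no checkpoints at all
	}
	merged := combinePeerDocsWithLocal(local, peers)
	fmt.Println(len(merged), len(merged[0].Records), merged[0].Records[0].Seqno) // prints "1 2 250"
}
```

      The design choice here is to treat every nil as "no data from that source" rather than as an error, which keeps a partial merge possible when a peer is leaving the cluster.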

      Source Cluster: http://172.23.107.222:8091/ui/index.html#/replications?commonBucket=GleamBookUsers0&scenarioZoom=minute&scenario=xo2xpzfzo
      Destination Cluster: http://172.23.107.102:8091/ui/index.html#/buckets?commonBucket=GleamBookUsers0&scenarioZoom=week&scenario=v5gakjwlp&openedBuckets=GleamBookUsers0
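      The rebalance-out in step 10 could also be scripted against the source cluster through the Couchbase REST endpoint POST /controller/rebalance, whose knownNodes parameter lists every current node and whose ejectedNodes parameter lists the node(s) to remove. This is a minimal sketch that only builds the request without sending it, assuming default node names of the form ns_1@<host>. Only 172.23.107.222 comes from this report; the other hosts, the ejected node and the credentials are hypothetical placeholders, as are the helper names.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// nodeName turns a host into a Couchbase node name, assuming the default
// "ns_1@<host>" form.
func nodeName(host string) string {
	return "ns_1@" + host
}

// rebalanceOutForm builds the form body for POST /controller/rebalance:
// knownNodes lists every node currently in the cluster, including the one
// leaving; ejectedNodes lists the node to remove.
func rebalanceOutForm(clusterHosts []string, eject string) url.Values {
	known := make([]string, 0, len(clusterHosts))
	for _, h := range clusterHosts {
		known = append(known, nodeName(h))
	}
	form := url.Values{}
	form.Set("knownNodes", strings.Join(known, ","))
	form.Set("ejectedNodes", nodeName(eject))
	return form
}

// newRebalanceRequest prepares (but does not send) the rebalance request.
func newRebalanceRequest(baseURL, user, pass string, form url.Values) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, baseURL+"/controller/rebalance", strings.NewReader(form.Encode()))
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(user, pass)
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	return req, nil
}

func main() {
	// Hypothetical 4-node source cluster; only the first host is from the report.
	hosts := []string{"172.23.107.222", "10.0.0.2", "10.0.0.3", "10.0.0.4"}
	form := rebalanceOutForm(hosts, "10.0.0.4")
	req, err := newRebalanceRequest("http://172.23.107.222:8091", "Administrator", "password", form)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path)     // prints "POST /controller/rebalance"
	fmt.Println(form.Get("ejectedNodes"))     // prints "ns_1@10.0.0.4"
	// To actually run it: resp, err := http.DefaultClient.Do(req)
}
```

      Keeping request construction separate from sending makes the form easy to inspect before anything touches a live cluster.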


            People

              Assignee: Ritesh Agarwal
              Reporter: Ritesh Agarwal
              Votes: 0
              Watchers: 3
