Couchbase Server / MB-14848

XDCR: Long pause, core dumped, unrecoverable error


Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Blocker
    • Affects Version/s: 4.0.0
    • Fix Version/s: 4.0.0
    • Component/s: ns_server, XDCR
    • Security Level: Public
    • Untriaged
    • Unknown

    Description

      Build 2046, Ubuntu 14.
      Unidirectional replication from a 3-node cluster to a 1-node cluster.
      Source Logs:
      https://s3.amazonaws.com/cb-customers/davidH/2046/node1.zip
      https://s3.amazonaws.com/cb-customers/davidH/2046/node2.zip
      https://s3.amazonaws.com/cb-customers/davidH/2046/node3.zip

      Destination Logs:
      https://s3.amazonaws.com/cb-customers/davidH/2046/collectinfo-2015-05-07T130211-ns_1%40192.168.126.104.zip

      Steps:

      • Populate the source cluster with 600k docs.
      • Start replication.
      • Restart pillowfight against the source cluster.
      • Delete some docs from the source cluster's UI.

      Observations:

      • Replication transferred 5k docs (out of 600k), then paused for 7 minutes (see the flatline in the screenshot).
      • After 7 minutes, documents started transferring again, but the source cluster's logs filled with errors such as those below.
      • Node 2 of the source cluster failed entirely.

      Example Log Entries:

      Port server goxdcr on node 'babysitter_of_ns_1@127.0.0.1' exited with status 1. Restarting. Messages: runtime.goexit()
      /usr/local/go/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc212bcbfd8 sp=0xc212bcbfd0
      created by github.com/couchbase/gomemcached/client.(*UprFeed).StartFeed
      /home/couchbase/jenkins/workspace/sherlock-unix/godeps/src/github.com/couchbase/gomemcached/client/upr_feed.go:328 +0x90
      [goport] 2015/05/07 12:50:08 /opt/couchbase/bin/goxdcr terminated: signal: aborted (core dumped)
       
      Replication 6a8e0692c0df9cb129e99f8a97aaa23d/charlie/charlie_backup failed. err=map[xmem_6a8e0692c0df9cb129e99f8a97aaa23d/charlie/charlie_backup_192.168.126.104:11210_0:Received non-recoverable error from memcached in target cluster]
       
      Port server goxdcr on node 'babysitter_of_ns_1@127.0.0.1' exited with status 1. Restarting. Messages: XmemNozzle 2015-05-07T12:48:45.034Z [INFO] Xmem checking routine exits
      ToplogyChangeDetector 2015-05-07T12:48:45.056Z [INFO] Pipeline is no longer running, exit.
      PipelineManager 2015-05-07T12:48:49.635Z [INFO] Replication Status = map[6a8e0692c0df9cb129e99f8a97aaa23d/charlie/charlie_backup:name={6a8e0692c0df9cb129e99f8a97aaa23d/charlie/charlie_backup}, status={Pending}, errors={[{"time":"2015-05-07T12:48:42.786392409Z","errMsg":"map[xmem_6a8e0692c0df9cb129e99f8a97aaa23d/charlie/charlie_backup_192.168.126.104:11210_1:Received non-recoverable error from memcached in target cluster]"},{"time":"2015-05-07T12:48:24.493560111Z","errMsg":"map[xmem_6a8e0692c0df9cb129e99f8a97aaa23d/charlie/charlie_backup_192.168.126.104:11210_1:Received non-recoverable error from memcached in target cluster]"}]}, progress={Received error report : map[dcp_6a8e0692c0df9cb129e99f8a97aaa23d/charlie/charlie_backup_192.168.126.103:11210_1:Part is stopping or already stopped, exit xmem_6a8e0692c0df9cb129e99f8a97aaa23d/charlie/charlie_backup_192.168.126.104:11210_1:Received non-recoverable error from memcached in target cluster]}
      ]
      [goport] 2015/05/07 12:48:51 /opt/couchbase/bin/goxdcr terminated: signal: killed
      


    People

      Aliaksey Artamonau (Inactive)
      David Haikney (Inactive)
      Votes: 0
      Watchers: 9
