Couchbase Server / MB-21145

XDCR stopped replicating items from 1 of 4 buckets (initial replication)

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • 4.6.0, 5.0.0
    • 4.6.0
    • Component: XDCR

    Description

      Build: 4.6.0-3292

      Steps:
      1. Create two clusters, 5 nodes per cluster.
      2. Create 4 buckets per cluster.
      3. Load 250M documents into each bucket on one of the clusters.
      4. Set up XDC replication for all buckets (4 replication streams to the second cluster).
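
      Step 4 could be scripted roughly as follows. This is a minimal sketch that only builds the form bodies for `POST /controller/createReplication`, one per bucket, and does not send anything. The remote cluster reference name `remote-cluster` and the helper `replication_requests` are assumptions for illustration; they do not come from this report.

```python
from urllib.parse import urlencode

# Bucket names as used in this report.
BUCKETS = ["bucket-1", "bucket-2", "bucket-3", "bucket-4"]


def replication_requests(remote_name, buckets=BUCKETS):
    """Return one URL-encoded form body per bucket (hypothetical helper).

    remote_name is the name of an already registered remote cluster
    reference; the value used below is an assumption.
    """
    return [
        urlencode({
            "fromBucket": bucket,
            "toCluster": remote_name,
            "toBucket": bucket,
            "replicationType": "continuous",
        })
        for bucket in buckets
    ]


if __name__ == "__main__":
    for body in replication_requests("remote-cluster"):
        print(body)
```

      Each body would be posted to port 8091 of a source node with administrator credentials, giving the 4 replication streams described in step 4.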

      "bucket-1", "bucket-2", and "bucket-4" replicated all documents successfully.

      However, ~25M items got stuck in "bucket-3".

      StatisticsManager 2016-09-29T09:13:52.207-07:00 [INFO] 29537ebc1f5c043dd770035df61719f4/bucket-3/bucket-3 total_docs=50048845, docs_processed=25049457, changes_left=24999388
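
      The counters in the StatisticsManager line above can be checked mechanically. The sketch below parses such a line and confirms that total_docs minus docs_processed equals changes_left (50048845 - 25049457 = 24999388 for the quoted line), so roughly half of the counted documents were still pending. The helper name `parse_stats` is an assumption for illustration.

```python
import re

# Matches the counters at the end of a StatisticsManager log line.
STATS_RE = re.compile(
    r"total_docs=(?P<total>\d+), "
    r"docs_processed=(?P<done>\d+), "
    r"changes_left=(?P<left>\d+)"
)


def parse_stats(line):
    """Extract the replication counters from one log line (hypothetical helper)."""
    match = STATS_RE.search(line)
    if match is None:
        return None
    total = int(match.group("total"))
    done = int(match.group("done"))
    left = int(match.group("left"))
    return {
        "total_docs": total,
        "docs_processed": done,
        "changes_left": left,
        # True when the three counters agree with each other.
        "consistent": total - done == left,
        # Fraction of documents already processed.
        "progress": done / total if total else 0.0,
    }


if __name__ == "__main__":
    line = (
        "StatisticsManager 2016-09-29T09:13:52.207-07:00 [INFO] "
        "29537ebc1f5c043dd770035df61719f4/bucket-3/bucket-3 "
        "total_docs=50048845, docs_processed=25049457, changes_left=24999388"
    )
    print(parse_stats(line))
```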
      

      DcpNozzle 2016-09-28T16:46:54.149-07:00 [INFO] dcp_29537ebc1f5c043dd770035df61719f4/bucket-3/bucket-3_172.23.96.105:11210_1 DCP mutation channel has been closed.Stop dcp nozzle now.
      DcpNozzle 2016-09-28T16:46:54.149-07:00 [INFO] dcp_29537ebc1f5c043dd770035df61719f4/bucket-3/bucket-3_172.23.96.105:11210_1 Ask uprfeed to close
      GenericSupervisor 2016-09-28T16:46:54.149-07:00 [ERROR] Received error report : map[dcp_29537ebc1f5c043dd770035df61719f4/bucket-3/bucket-3_172.23.96.105:11210_1:DCP stream has been closed.]
      ReplicationManager 2016-09-28T16:46:54.149-07:00 [INFO] Supervisor PipelineSupervisor_29537ebc1f5c043dd770035df61719f4/bucket-3/bucket-3 of type *supervisor.GenericSupervisor reported errors map[dcp_29537ebc1f5c043dd770035df61719f4/bucket-3/bucket-3_172.23.96.105:11210_1:DCP stream has been closed.]
      StatisticsManager 2016-09-28T16:46:54.157-07:00 [INFO] 29537ebc1f5c043dd770035df61719f4/bucket-1/bucket-1 total_docs=50048875, docs_processed=11642171, changes_left=38406704
      PipelineManager 2016-09-28T16:46:54.160-07:00 [INFO] Pipeline updater 29537ebc1f5c043dd770035df61719f4/bucket-3/bucket-3 is lauched with retry_interval=10
      DcpNozzle 2016-09-28T16:46:54.160-07:00 [ERROR] dcp_29537ebc1f5c043dd770035df61719f4/bucket-3/bucket-3_172.23.96.105:11210_1 Raise error condition DCP stream has been closed.
      DcpNozzle 2016-09-28T16:46:54.160-07:00 [INFO] dcp_29537ebc1f5c043dd770035df61719f4/bucket-3/bucket-3_172.23.96.105:11210_1 processData exits
      PipelineManager 2016-09-28T16:46:54.160-07:00 [INFO] err_list=[{"time":"2016-09-28T16:46:54.160592496-07:00","errMsg":"dcp_29537ebc1f5c043dd770035df61719f4/bucket-3/bucket-3_172.23.96.105:11210_1:DCP stream has been closed."}]
      

      DcpNozzle 2016-09-29T09:15:16.515-07:00 [INFO] Received error when checking inactive steams for dcp_29537ebc1f5c043dd770035df61719f4/bucket-3/bucket-3_172.23.96.105:11210_0. err=Execution timed out
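
      When triaging excerpts like the ones above, it can help to split each line into component, timestamp, level, and message, and then keep only the ERROR entries for one replication. The sketch below assumes the `Component Timestamp [LEVEL] message` layout seen in these excerpts; the helper names are hypothetical and are not part of goxdcr.

```python
import re
from datetime import datetime

# Layout assumed from the excerpts: "<Component> <ISO timestamp> [<LEVEL>] <message>"
LINE_RE = re.compile(
    r"^\s*(?P<component>\w+) (?P<ts>\S+) \[(?P<level>[A-Z]+)\] (?P<msg>.*)$"
)


def parse_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return {
        "component": match.group("component"),
        "time": datetime.fromisoformat(match.group("ts")),
        "level": match.group("level"),
        "message": match.group("msg"),
    }


def errors_for(lines, replication):
    """Yield parsed ERROR entries whose message mentions the replication id."""
    for line in lines:
        entry = parse_line(line)
        if entry and entry["level"] == "ERROR" and replication in entry["message"]:
            yield entry
```

      Comparing the parsed timestamps of the first "DCP stream has been closed" report (2016-09-28T16:46:54) and the later "Execution timed out" message (2016-09-29T09:15:16) shows a gap of about 16.5 hours between the two excerpts.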
      

      Source nodes:

      • 172.23.96.105
      • 172.23.96.106
      • 172.23.96.107
      • 172.23.96.108
      • 172.23.96.109

      Destination nodes:

      • 172.23.96.100
      • 172.23.96.101
      • 172.23.96.102
      • 172.23.96.103
      • 172.23.96.104

      Default XDCR settings are used:

      > curl -s http://Administrator:password@172.23.96.105:8091/settings/replications | python -mjson.tool
      {
          "checkpointInterval": 1800,
          "docBatchSizeKb": 2048,
          "failureRestartInterval": 10,
          "goMaxProcs": 4,
          "logLevel": "Info",
          "optimisticReplicationThreshold": 256,
          "sourceNozzlePerNode": 2,
          "statsInterval": 1000,
          "targetNozzlePerNode": 2,
          "workerBatchSize": 500
      }
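
      To confirm on another cluster that the same values are in effect, the JSON returned by `/settings/replications` can be compared against the dictionary above. This is a minimal sketch: `REPORTED_SETTINGS` is copied from this report, and the helper name `changed_settings` is an assumption for illustration.

```python
import json

# Values copied from the /settings/replications output in this report.
REPORTED_SETTINGS = {
    "checkpointInterval": 1800,
    "docBatchSizeKb": 2048,
    "failureRestartInterval": 10,
    "goMaxProcs": 4,
    "logLevel": "Info",
    "optimisticReplicationThreshold": 256,
    "sourceNozzlePerNode": 2,
    "statsInterval": 1000,
    "targetNozzlePerNode": 2,
    "workerBatchSize": 500,
}


def changed_settings(response_text, baseline=REPORTED_SETTINGS):
    """Return {key: (baseline_value, current_value)} for every differing key."""
    current = json.loads(response_text)
    return {
        key: (baseline.get(key), current.get(key))
        for key in sorted(set(baseline) | set(current))
        if baseline.get(key) != current.get(key)
    }
```

      An empty result means the cluster matches the settings used for this test; any entry shows the reported value first and the current value second.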
      

      Note: the replications were restarted several times because of temporary OOM (TMP OOM) failures on the destination side.

      Attachments

        1. items.png (178 kB)
        2. outbound.png (186 kB)
          People

            pavelpaulau Pavel Paulau (Inactive)