Details
- Bug
- Resolution: Fixed
- Major
- 4.6.0
- Untriaged
- CentOS 64-bit
- Unknown
Description
Build: 4.6.0-3292
Steps:
1. Create two clusters, 5 nodes per cluster.
2. Create 4 buckets per cluster.
3. Load 250M documents into each bucket on one of the clusters.
4. Set up XDC replication for all buckets (4 replication streams to the second cluster).
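Step 4 can be scripted against the Couchbase REST API. What follows is a hedged illustration, not the procedure used in this test. It assumes the `/pools/default/remoteClusters` and `/controller/createReplication` endpoints, reuses the Administrator/password credentials from the curl command in this report, and picks 172.23.96.100 as the target entry point. The remote-cluster reference name `dest-cluster` and the helper names are hypothetical. `build_requests` only assembles the calls, and `send` performs one of them.

```python
import base64
import urllib.parse
import urllib.request

SOURCE = "http://172.23.96.105:8091"
BUCKETS = ["bucket-1", "bucket-2", "bucket-3", "bucket-4"]
REMOTE_NAME = "dest-cluster"  # hypothetical remote cluster reference name


def build_requests(buckets=BUCKETS, remote=REMOTE_NAME, target_host="172.23.96.100"):
    """Return (path, form params) pairs for step 4 without sending anything."""
    # First register the destination cluster as a remote reference.
    reqs = [("/pools/default/remoteClusters", {
        "name": remote,
        "hostname": f"{target_host}:8091",
        "username": "Administrator",  # assumed: same credentials on both clusters
        "password": "password",
    })]
    for bucket in buckets:
        # One continuous replication stream per bucket, same bucket name on both sides.
        reqs.append(("/controller/createReplication", {
            "fromBucket": bucket,
            "toCluster": remote,
            "toBucket": bucket,
            "replicationType": "continuous",
        }))
    return reqs


def send(path, params, base=SOURCE, user="Administrator", password="password"):
    """POST form-encoded params to the source cluster and return the response body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        base + path,
        data=urllib.parse.urlencode(params).encode(),
        headers={"Authorization": "Basic " + token},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()
```

Calling `send(path, params)` for each pair from `build_requests()` would create the remote reference first, then the four replication streams.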
"bucket-1", "bucket-2", and "bucket-4" replicated all documents successfully.
However, ~25M items got stuck in "bucket-3".
StatisticsManager 2016-09-29T09:13:52.207-07:00 [INFO] 29537ebc1f5c043dd770035df61719f4/bucket-3/bucket-3 total_docs=50048845, docs_processed=25049457, changes_left=24999388
DcpNozzle 2016-09-28T16:46:54.149-07:00 [INFO] dcp_29537ebc1f5c043dd770035df61719f4/bucket-3/bucket-3_172.23.96.105:11210_1 DCP mutation channel has been closed.Stop dcp nozzle now.
DcpNozzle 2016-09-28T16:46:54.149-07:00 [INFO] dcp_29537ebc1f5c043dd770035df61719f4/bucket-3/bucket-3_172.23.96.105:11210_1 Ask uprfeed to close
GenericSupervisor 2016-09-28T16:46:54.149-07:00 [ERROR] Received error report : map[dcp_29537ebc1f5c043dd770035df61719f4/bucket-3/bucket-3_172.23.96.105:11210_1:DCP stream has been closed.]
ReplicationManager 2016-09-28T16:46:54.149-07:00 [INFO] Supervisor PipelineSupervisor_29537ebc1f5c043dd770035df61719f4/bucket-3/bucket-3 of type *supervisor.GenericSupervisor reported errors map[dcp_29537ebc1f5c043dd770035df61719f4/bucket-3/bucket-3_172.23.96.105:11210_1:DCP stream has been closed.]
StatisticsManager 2016-09-28T16:46:54.157-07:00 [INFO] 29537ebc1f5c043dd770035df61719f4/bucket-1/bucket-1 total_docs=50048875, docs_processed=11642171, changes_left=38406704
PipelineManager 2016-09-28T16:46:54.160-07:00 [INFO] Pipeline updater 29537ebc1f5c043dd770035df61719f4/bucket-3/bucket-3 is lauched with retry_interval=10
DcpNozzle 2016-09-28T16:46:54.160-07:00 [ERROR] dcp_29537ebc1f5c043dd770035df61719f4/bucket-3/bucket-3_172.23.96.105:11210_1 Raise error condition DCP stream has been closed.
DcpNozzle 2016-09-28T16:46:54.160-07:00 [INFO] dcp_29537ebc1f5c043dd770035df61719f4/bucket-3/bucket-3_172.23.96.105:11210_1 processData exits
PipelineManager 2016-09-28T16:46:54.160-07:00 [INFO] err_list=[{"time":"2016-09-28T16:46:54.160592496-07:00","errMsg":"dcp_29537ebc1f5c043dd770035df61719f4/bucket-3/bucket-3_172.23.96.105:11210_1:DCP stream has been closed."}]
DcpNozzle 2016-09-29T09:15:16.515-07:00 [INFO] Received error when checking inactive steams for dcp_29537ebc1f5c043dd770035df61719f4/bucket-3/bucket-3_172.23.96.105:11210_0. err=Execution timed out
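The counters in the StatisticsManager lines and the JSON payload of the `err_list` line can be extracted mechanically to quantify how far each pipeline got. This is a minimal Python sketch. It assumes the log formats are exactly as quoted above, and all helper names are hypothetical. For bucket-3, `total_docs - docs_processed` equals `changes_left` (24,999,388), so roughly half the documents were still pending. The `total_docs` value of about 50M may be a per-source-node figure (250M documents spread over 5 source nodes), but the report does not say so.

```python
import json
import re

# Matches integer counters such as total_docs=50048845 in a log line.
COUNTER_RE = re.compile(r"(\w+)=(\d+)")


def parse_counters(line):
    """Return every integer key=value pair found in a log line."""
    return {key: int(value) for key, value in COUNTER_RE.findall(line)}


def replication_progress(counters):
    """Return (fraction of docs processed, docs still to replicate)."""
    total = counters["total_docs"]
    done = counters["docs_processed"]
    return done / total, total - done


def parse_err_list(line):
    """Decode the JSON array that follows 'err_list=' in a PipelineManager line."""
    return json.loads(line.split("err_list=", 1)[1])


def split_err_msg(err_msg):
    """Split '<nozzle id>:<message>' on the last colon.

    The nozzle id contains host:port, so splitting on the first colon would cut it
    in half. Assumes the message text itself contains no colon.
    """
    nozzle, message = err_msg.rsplit(":", 1)
    return nozzle, message
```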
Source nodes:
- 172.23.96.105
- 172.23.96.106
- 172.23.96.107
- 172.23.96.108
- 172.23.96.109
Destination nodes:
- 172.23.96.100
- 172.23.96.101
- 172.23.96.102
- 172.23.96.103
- 172.23.96.104
Default XDCR settings are used:
> curl -s http://Administrator:password@172.23.96.105:8091/settings/replications | python -mjson.tool
{
    "checkpointInterval": 1800,
    "docBatchSizeKb": 2048,
    "failureRestartInterval": 10,
    "goMaxProcs": 4,
    "logLevel": "Info",
    "optimisticReplicationThreshold": 256,
    "sourceNozzlePerNode": 2,
    "statsInterval": 1000,
    "targetNozzlePerNode": 2,
    "workerBatchSize": 500
}
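To check whether a cluster still runs with the values listed above, you can query the same endpoint from Python and compare the result against them. This is only a sketch. `REPORTED` copies the JSON printed above, the helper names are hypothetical, and the credentials are taken from the curl command. Only the listed keys are compared, so any additional settings the server returns are ignored.

```python
import base64
import json
import urllib.request

# Values copied from the /settings/replications output in this report.
REPORTED = {
    "checkpointInterval": 1800,
    "docBatchSizeKb": 2048,
    "failureRestartInterval": 10,
    "goMaxProcs": 4,
    "logLevel": "Info",
    "optimisticReplicationThreshold": 256,
    "sourceNozzlePerNode": 2,
    "statsInterval": 1000,
    "targetNozzlePerNode": 2,
    "workerBatchSize": 500,
}


def fetch_settings(host="172.23.96.105", user="Administrator", password="password"):
    """GET /settings/replications with basic auth, like the curl command above."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        f"http://{host}:8091/settings/replications",
        headers={"Authorization": "Basic " + token},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def diff_settings(actual, expected=REPORTED):
    """Return {key: (expected, actual)} for each reported key whose value differs or is missing."""
    return {
        key: (value, actual.get(key))
        for key, value in expected.items()
        if actual.get(key) != value
    }
```

An empty result from `diff_settings(fetch_settings())` would indicate that none of the listed values have changed.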
Note: there were several replication restarts caused by temporary out-of-memory (TMP OOM) failures on the destination side.