Details
Description
1. Set up unidirectional replication between two single-node clusters (1:1).
2. Load 3M items on the source cluster.
Expected: all 3M items are replicated to the destination cluster.
Output
- Replication starts at a rate of 1-2k items/sec.
- During compaction on the destination node, the replication rate drops to ~zero.
- After compaction completes, the replication rate increases to 10-12 items/sec.
- Replication stops after replicating 450k items.
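To put the observed rates in perspective, the short sketch below estimates how long the remaining items would take to replicate at each rate. It uses only the figures reported above (3M items loaded, 450k replicated, 10-12 items/sec after compaction, 1-2k items/sec initially); the function name `hours_to_replicate` is hypothetical, and the estimate assumes a constant rate, which the report does not guarantee.

```python
# Rough estimate only: assumes the replication rate stays constant.
def hours_to_replicate(items, rate_per_sec):
    """Return the hours needed to replicate `items` at `rate_per_sec`."""
    return items / rate_per_sec / 3600.0

TOTAL_ITEMS = 3_000_000   # items loaded on the source cluster
REPLICATED = 450_000      # items replicated before replication stopped
remaining = TOTAL_ITEMS - REPLICATED

# Post-compaction rates (10, 12) versus initial rates (1000, 2000) in items/sec.
for rate in (10, 12, 1000, 2000):
    print(f"{rate:>5} items/sec: {hours_to_replicate(remaining, rate):.1f} h")
```

At 10-12 items/sec the remaining 2.55M items would need roughly 59 to 71 hours, compared with under an hour at the initial rate, so the post-compaction rate is effectively a stall even before replication stops.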
Error
The source node logs a number of crash reports in which the reason for termination is a timeout on the _bulk_docs POST to the destination:
- Reason for termination ==
- {http_request_failed,"POST",
"http://Administrator:*****@10.3.121.38:8092/saslbucket%2f100%3b4a1d35af1568f2fffa11491db48ae160/_bulk_docs",
{error, {error,timeout}}}
[xdcr:debug,2012-08-28T17:21:49.912,ns_1@127.0.0.1:<0.12936.1>:concurrency_throttle:signal_waiting:193]schedule a waiting rep (pid: <0.14309.1>, target node: "10.3.121.38:8092") to be active
[xdcr:debug,2012-08-28T17:21:49.912,ns_1@127.0.0.1:<0.14309.1>:xdc_vbucket_rep:handle_info:111]get start-replication token for vb 497 from throttle (pid: <0.12936.1>)
[xdcr:debug,2012-08-28T17:21:49.913,ns_1@127.0.0.1:<0.14309.1>:xdc_rep_utils:make_options:149]Options for replication from couch_config:[worker processes: 4, worker batch size: 500, HTTP connections: 20, connection timeout: 30000]
[xdcr:debug,2012-08-28T17:21:49.915,ns_1@127.0.0.1:<0.12936.1>:concurrency_throttle:signal_waiting:193]schedule a waiting rep (pid: <0.13503.1>, target node: "10.3.121.38:8092") to be active
[xdcr:debug,2012-08-28T17:21:49.915,ns_1@127.0.0.1:<0.13503.1>:xdc_vbucket_rep:handle_info:111]get start-replication token for vb 401 from throttle (pid: <0.12936.1>)
[error_logger:error,2012-08-28T17:21:49.915,ns_1@127.0.0.1:error_logger:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: xdc_vbucket_rep:init/1
pid: <0.14490.1>
registered_name: []
exception exit: {http_request_failed,"POST",
"http://Administrator:*****@10.3.121.38:8092/saslbucket%2f100%3b4a1d35af1568f2fffa11491db48ae160/_bulk_docs",
{error,{error,timeout}}}
in function gen_server:terminate/6
ancestors: [<0.12937.1>,<0.12933.1>,xdc_replication_sup,ns_server_sup,
ns_server_cluster_sup,<0.60.0>]
messages: []
links: [<0.12937.1>]
dictionary: []
trap_exit: true
status: running
heap_size: 17711
stack_size: 24
reductions: 1397988
neighbours:
[xdcr:debug,2012-08-28T17:21:49.916,ns_1@127.0.0.1:<0.13503.1>:xdc_rep_utils:make_options:149]Options for replication from couch_config:[worker processes: 4, worker batch size: 500, HTTP connections: 20, connection timeout: 30000]
[error_logger:error,2012-08-28T17:21:49.916,ns_1@127.0.0.1:error_logger:ale_error_logger_handler:log_report:72]
=========================SUPERVISOR REPORT=========================
cbstats from the destination node shows the following meta statistics (a large number of get_meta operations and comparatively few set_meta operations):
------------------------------------------
ketaki@ubu-1516:/opt/couchbase/var/lib/couchbase/logs$ /opt/couchbase/bin/cbstats 10.3.121.38:11210 all -b saslbucket -p password | grep meta
ep_num_ops_del_meta: 2753
ep_num_ops_get_meta: 30171871
ep_num_ops_set_meta: 468378
vb_active_meta_data_memory: 38640490
vb_pending_meta_data_memory: 0
vb_replica_meta_data_memory: 0
------------------------------------------------------
cbstats from the source cluster, for comparison:
------------------------------------------------------
ketaki@ubu-1516:/opt/couchbase/var/lib/couchbase/logs$ /opt/couchbase/bin/cbstats 10.3.121.31:11210 all -b saslbucket -p password | grep meta
ep_num_ops_del_meta: 0
ep_num_ops_get_meta: 0
ep_num_ops_set_meta: 0
vb_active_meta_data_memory: 268461195
vb_pending_meta_data_memory: 0
vb_replica_meta_data_memory: 0
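The imbalance between get_meta and set_meta on the destination can be quantified with a small script. This is a minimal sketch that parses `key: value` lines as they appear in the cbstats output above; the helper name `parse_cbstats` is hypothetical, and the sample text is copied from the destination output in this report.

```python
def parse_cbstats(text):
    """Parse 'name: integer' lines into a dict, skipping anything else."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[key.strip()] = int(value)
    return stats

# Destination node values copied from the cbstats output in this report.
DEST_OUTPUT = """\
ep_num_ops_del_meta: 2753
ep_num_ops_get_meta: 30171871
ep_num_ops_set_meta: 468378
vb_active_meta_data_memory: 38640490
"""

dest = parse_cbstats(DEST_OUTPUT)
ratio = dest["ep_num_ops_get_meta"] / dest["ep_num_ops_set_meta"]
print(f"get_meta per set_meta on destination: {ratio:.1f}")
```

With the numbers above, the destination has served about 64 get_meta operations for every set_meta. The report does not establish why; one possibility, stated here only as an assumption, is that batches which time out are retried and repeat their metadata lookups.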
Attaching screenshot from destination cluster.