Details
Description
Setup
-------------------
- Create 2 clusters with a SASL bucket.
- Load 15M items on the source.
- Create unidirectional replication from the source to the destination cluster.
- After 4M items are replicated, enable and then disable a firewall that drops incoming packets on the destination cluster.
On the master node:
sudo iptables -A INPUT -p tcp --dport 8092 -j DROP
sudo iptables -A INPUT -p tcp --dport 8091 -j DROP
On the non-master node:
sudo iptables -A INPUT -p tcp --dport 8092 -j DROP
sudo iptables -A INPUT -p tcp --dport 8091 -j DROP
Then disable the firewall on both the master and non-master nodes:
sudo iptables --flush
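The enable/disable cycle above could be scripted so it is repeatable on each destination node. This is a minimal sketch, not part of the original test: the names `DRY_RUN`, `run`, `block_xdcr_ports` and `unblock_all` are hypothetical, and by default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: toggle the firewall used in this test on a destination node.
# DRY_RUN, run, block_xdcr_ports and unblock_all are illustrative names,
# not part of any Couchbase tooling.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

block_xdcr_ports() {
    # Drop incoming TCP traffic on the two ports named in the report.
    for port in 8092 8091; do
        run sudo iptables -A INPUT -p tcp --dport "$port" -j DROP
    done
}

unblock_all() {
    # --flush removes every rule in the filter table, not only the two above.
    run sudo iptables --flush
}

# Dispatch on the first argument; with no argument nothing is executed.
case "${1:-}" in
    block)   block_xdcr_ports ;;
    unblock) unblock_all ;;
esac
```

Assuming the script is saved as `firewall.sh` (a hypothetical name), `DRY_RUN=0 sh firewall.sh block` would apply the rules and `DRY_RUN=0 sh firewall.sh unblock` would flush them.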
Observation
-------------------
Replication is not broken, but the replication rate has dropped significantly.
- The first 4M items were replicated at a rate of about 3k items/sec.
- Subsequent replication is much slower, as low as 0 items/sec on some nodes.
- The replication rate is frequently 0 on the nodes; it picks up again after 10-20 minutes of inactivity.
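For scale, the figures above give a rough baseline for how long the remaining items would take if the initial rate had held. This is a back-of-the-envelope sketch only: it assumes the "about 3k items/sec" figure is a constant, cluster-wide rate, which the report does not state explicitly.

```shell
#!/bin/sh
# Figures taken from the report; a constant aggregate rate is an assumption.
total_items=15000000
replicated_items=4000000
rate_items_per_sec=3000

# Items still to replicate after the firewall test began.
remaining_items=$((total_items - replicated_items))

# Integer division, so results are rounded down.
eta_sec=$((remaining_items / rate_items_per_sec))
eta_min=$((eta_sec / 60))

echo "remaining=${remaining_items} eta_sec=${eta_sec} eta_min=${eta_min}"
# prints: remaining=11000000 eta_sec=3666 eta_min=61
```

At the original rate the remaining 11M items would need roughly an hour, so stalls of 10-20 minutes at 0 items/sec are a large regression relative to that baseline.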
Attaching screenshots
Per the XDCR replication logic, in case of an intermittent network, will replication keep retrying for up to some X amount of time?
Seeing CRASH reports with timeouts, mainly on the source end:
[error_logger:error,2012-08-22T10:22:19.434,ns_1@10.3.121.32:error_logger:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: xdc_vbucket_rep:init/1
pid: <0.4570.1>
registered_name: []
exception exit: {function_clause,
[{xdc_vbucket_rep_ckpt,source_cur_seq,
[{rep_state,
{rep,<<"69d340dbe30a03c5c91d25958a000f73">>,
<<"saslbucket">>,
<<"/remoteClusters/a/buckets/saslbucket">>,
[
[ {continuous,true},
{http_connections,20},
{retries,10},
** Reason for termination ==
** {function_clause,
[{xdc_vbucket_rep_ckpt,source_cur_seq,
[{rep_state,
{rep,<<"69d340dbe30a03c5c91d25958a000f73">>,
<<"saslbucket">>,
<<"/remoteClusters/a/buckets/saslbucket">>,
[{connection_timeout,30000}
,
,
,
,
{socket_options,[
,
{nodelay,false}]},
,
]},
,
<0.2963.1>,<0.2960.1>,<<"saslbucket/443">>,
<<"http://Administrator:password@10.3.121.36:8092/saslbucket%2f443%3bcc89df5b3739177398ed813a238513fd">>,
undefined,undefined,undefined,undefined,[],
{[
,
,
,
,
,
,
{<<"history">>,
[{[{<<"session_id">>,
- Adding logs from the clusters.
We would expect the replication rate to eventually catch up after any network disruption or intermittent network on/off.
For Gerrit Dashboard: MB-6379

# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
20052,2 | MB-6379: fix replication state initialization code | master | ns_server | MERGED | +2 | +1 |