Description
Junyi > For reasons not yet known, the CouchDB updater crashed during XDCR. This cascaded: the babysitter process restarted CouchDB multiple times, which in turn crashed the XDCR replicator because of the inconsistent instance startup time (Source database out of sync).
Test topology:
source:bucket0 <- bidirectional -> dest1:bucket0
source:bucket0 -> dest2:bucket0
source:bucket0 -> dest3:bucket0
source:bucket0 -> dest4:bucket0
We have 4 outbound streams from the source and 1 inbound.
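As a quick cross-check of the stream counts, the topology can be written down as directed edges and tallied. This is only a sketch: the node names are shorthand for the clusters above, and it assumes the dest3 and dest4 streams originate at the source, as the count of 4 outbound streams implies.

```python
# Replication streams as (origin, target) pairs, taken from the topology above.
# Assumption: the dest3 and dest4 streams originate at "source".
STREAMS = [
    ("source", "dest1"),
    ("dest1", "source"),  # reverse leg of the bidirectional link
    ("source", "dest2"),
    ("source", "dest3"),
    ("source", "dest4"),
]


def count_streams(streams, node):
    """Return (outbound, inbound) stream counts for the given node."""
    outbound = sum(1 for origin, _ in streams if origin == node)
    inbound = sum(1 for _, target in streams if target == node)
    return outbound, inbound


print(count_streams(STREAMS, "source"))
```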
There is a 20k frontend load on bucket0 with the following operation mix:
get:70%,delete:10%,update:10%,set:10%,expire:5%
The inbound load from the destination is a 4k load with:
get:90%,delete:2%,update:5%,set:5%,expire:5%
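For anyone reproducing the workload, a small sketch that parses the two mix strings exactly as written and totals them. The helper name `parse_mix` is hypothetical. Note that both mixes total more than 100% as listed; one possible reading (an assumption, not confirmed here) is that expire is a share of another operation rather than a separate slice.

```python
FRONTEND_MIX = "get:70%,delete:10%,update:10%,set:10%,expire:5%"
INBOUND_MIX = "get:90%,delete:2%,update:5%,set:5%,expire:5%"


def parse_mix(spec):
    """Parse 'op:NN%,op:NN%,...' into a dict of op name -> integer percent."""
    mix = {}
    for item in spec.split(","):
        op, pct = item.split(":")
        mix[op.strip()] = int(pct.strip().rstrip("%"))
    return mix


for name, spec in (("frontend", FRONTEND_MIX), ("inbound", INBOUND_MIX)):
    mix = parse_mix(spec)
    print(name, mix, "total:", sum(mix.values()))
```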
Data is loaded until about 70% DGM.
No views, no rebalancing.
In the CouchDB logs we see:
[couchdb:error,2013-05-09T7:44:46.318,ns_1@172.23.105.55:couch_server<0.8331.208>:couch_log:error:42]Unexpected message, restarting couch_server: {'EXIT',<0.16961.208>,
{{read_loop_died,
{problem_reopening_file,
,
,
<0.16959.208>,
"/opt/couchbase/var/lib/couchbase/data/bucket1/345.couch.2",
10}},
}}
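To see how often couch_server restarted and on which node, the logs can be scanned for this message. The following is a minimal sketch: it assumes every entry starts with a header shaped like the one above (`[component:level,timestamp,node:...]`), and the function name `restart_events` is hypothetical.

```python
import re

# Header layout assumed from the sample entry: [component:level,timestamp,node:...]
HEADER = re.compile(
    r"^\[(?P<component>[^:]+):(?P<level>[^,]+),(?P<timestamp>[^,]+),(?P<node>[^:]+):"
)


def restart_events(lines):
    """Return (timestamp, node) for each log line reporting a couch_server restart."""
    events = []
    for line in lines:
        if "restarting couch_server" not in line:
            continue
        match = HEADER.match(line)
        if match:
            # Timestamps stay as strings: the hour field is not zero-padded.
            events.append((match.group("timestamp"), match.group("node")))
    return events


SAMPLE = (
    "[couchdb:error,2013-05-09T7:44:46.318,ns_1@172.23.105.55:"
    "couch_server<0.8331.208>:couch_log:error:42]"
    "Unexpected message, restarting couch_server: {'EXIT',<0.16961.208>,"
)

print(restart_events([SAMPLE]))
```

Running it over the full log file (for example `restart_events(open(path))`, with `path` pointing at a local copy) would give the restart timeline to line up against the XDCR replicator failures.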
In XDCR there are sync errors suggesting that we increase max_dbs_open, but we seem to be hitting a limit already, since CouchDB keeps restarting:
Replication `a1c985cbafac10e773b130f01d1ba85c/bucket0/bucket0` ...failed: Source database out of sync. Try to increase max_dbs_open at the source's server.
Attaching logs from the time of the crash. The full logs were copied and left here:
172.23.105.55:/0509_couchdb/ (use rsa key)