Details
Type: Bug
Resolution: Won't Fix
Priority: Critical
1.7.0
Security Level: Public
None
Description
This happens too often in production systems where the user has to fail over or rebalance out one of the nodes.
Replication from source to destination nodes does not seem to make any progress for a long time, and the messages below appear on all of those nodes.
In this example the disk write queue started at 8 million items and was only down to about 5 million after 5 hours.
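For scale, the write queue figures above can be turned into a drain rate. This is a back-of-the-envelope sketch that assumes the approximate numbers quoted (8 million, 5 million, 5 hours) and, for the last line, that the rate stays constant and no new writes arrive:

```python
# Approximate figures quoted in the report; not exact measurements.
start_items = 8_000_000
end_items = 5_000_000
elapsed_hours = 5

drained = start_items - end_items
rate_per_hour = drained / elapsed_hours
rate_per_sec = rate_per_hour / 3600

# Assumption: constant drain rate and no incoming writes.
hours_to_empty = end_items / rate_per_hour

print(f"{rate_per_hour:,.0f} items/hour, about {rate_per_sec:.0f} items/sec")
print(f"about {hours_to_empty:.1f} more hours to drain the remainder")
```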
4.4_461_gf99c147
info_msg ns_1@10.82.21.98 memcached<0.263.0>: Suspend eq_tapq:replication_ns_1@10.218.37.191 for 5.00 secs
info_msg ns_1@10.82.21.98 memcached<0.263.0>: Suspend eq_tapq:replication_ns_1@10.76.58.246 for 5.00 secs
info_msg ns_1@10.82.21.98 memcached<0.263.0>: Suspend eq_tapq:replication_ns_1@10.218.37.191 for 5.00 secs
info_msg ns_1@10.82.21.98 memcached<0.263.0>: Suspend eq_tapq:replication_ns_1@10.76.58.246 for 5.00 secs
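To see which replication streams are suspended most often, the messages can be tallied per TAP queue name. This is a minimal sketch, assuming the log has been saved as text and that each entry contains the `Suspend <queue> for <n> secs` wording seen above. The function name `tally_suspends` and the sample list are illustrative and not part of any Couchbase tool.

```python
import re
from collections import Counter

# Matches the message text seen in the log excerpt, e.g.
# "Suspend eq_tapq:replication_ns_1@10.218.37.191 for 5.00 secs"
SUSPEND_RE = re.compile(r"Suspend (eq_tapq:\S+) for (\d+(?:\.\d+)?) secs")

def tally_suspends(lines):
    """Return (count, total suspended seconds) per TAP queue name."""
    counts = Counter()
    seconds = Counter()
    for line in lines:
        for name, secs in SUSPEND_RE.findall(line):
            counts[name] += 1
            seconds[name] += float(secs)
    return counts, seconds

# Sample input: the message portion of the four entries in the excerpt.
sample = [
    "Suspend eq_tapq:replication_ns_1@10.218.37.191 for 5.00 secs",
    "Suspend eq_tapq:replication_ns_1@10.76.58.246 for 5.00 secs",
    "Suspend eq_tapq:replication_ns_1@10.218.37.191 for 5.00 secs",
    "Suspend eq_tapq:replication_ns_1@10.76.58.246 for 5.00 secs",
]

counts, seconds = tally_suspends(sample)
for name, n in counts.most_common():
    print(f"{name}: {n} suspends, {seconds[name]:.2f} s total")
```

Because the pattern only looks for the message text, it also works on raw entries that still carry surrounding binary debris.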