  1. Couchbase Server
  2. MB-4032

replication is suspended between some nodes after rebalance (disk write queue very large and drains very slowly)

Details

    • Bug
    • Resolution: Won't Fix
    • Critical
    • 2.0-beta
    • 1.7.0
    • couchbase-bucket
    • Security Level: Public
    • None

    Description

      This happens too often in production systems where the user has to fail over or rebalance out one of the nodes.

      In this example the disk write queue started at 8 million items and was only down to 5 million or so after 5 hours.

      Replication from node to node does not seem to make any progress for a long time, and these messages are logged on all of the affected nodes:

      4.4_461_gf99c147
      info_msg ns_1@10.82.21.98 memcached <0.263.0> Suspend eq_tapq:replication_ns_1@10.218.37.191 for 5.00 secs
      info_msg ns_1@10.82.21.98 memcached <0.263.0> Suspend eq_tapq:replication_ns_1@10.76.58.246 for 5.00 secs
      info_msg ns_1@10.82.21.98 memcached <0.263.0> Suspend eq_tapq:replication_ns_1@10.218.37.191 for 5.00 secs
      info_msg ns_1@10.82.21.98 memcached <0.263.0> Suspend eq_tapq:replication_ns_1@10.76.58.246 for 5.00 secs
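
For scale, the queue figures quoted in the description (roughly 8 million items falling to roughly 5 million over 5 hours) imply a drain rate that can be estimated with a quick calculation. This is only a sketch: the function name `drain_estimate` is made up for illustration, the inputs are the approximate numbers from the report, and it assumes the queue keeps draining at a constant rate with no new writes arriving, which may not hold on a live cluster.

```python
def drain_estimate(start_items, current_items, elapsed_hours):
    """Return (items drained per second, hours left to empty the queue).

    Hypothetical helper: assumes a constant drain rate and no incoming writes.
    """
    drained = start_items - current_items
    rate_per_sec = drained / (elapsed_hours * 3600)
    hours_remaining = current_items / (rate_per_sec * 3600)
    return rate_per_sec, hours_remaining


# Approximate figures taken from the description above.
rate, remaining = drain_estimate(8_000_000, 5_000_000, 5)
print(f"{rate:.0f} items/s, about {remaining:.1f} more hours to drain")
```

Under those assumptions the queue is draining at well under 200 items per second, so the remaining backlog would take several more hours to clear.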

          People

            Assignee: Unassigned
            Reporter: Farshid Ghods (Inactive)