Couchbase Server / MB-4832

failed rebalance followed by failover leads to very large data loss

Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major
    • Fix Version/s: 2.0-developer-preview-4
    • Affects Version/s: 2.0-developer-preview-4
    • Component/s: couchbase-bucket
    • Security Level: Public
    • Labels: None
    • Environment: dp4 -rc715, 3-5 nodes

    Description

      After loading about 6 million docs into a 3-node cluster, I attempted to add 2 more nodes and rebalance, but the rebalance hung and I stopped it after ~20 minutes. The cluster nevertheless showed the nodes as added, so I failed over 2 of the original nodes, and the data loss was 72% (6 million to 700k docs). That seems rather high, though I suspect it was due to a failure to distribute items to the nodes that were being rebalanced in.

      If this behavior is expected, would it be possible to also warn the user about how much data will be lost before the failover proceeds?
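
      A minimal sketch of how such a warning could be estimated, under stated assumptions. The function name `estimate_failover_loss` and its inputs are hypothetical and are not part of any Couchbase API. It assumes each vBucket has a known chain of nodes holding a copy (active first, then replicas) and a known item count, and it treats a vBucket as lost only when every node in its chain is failed over. It does not account for replicas that exist in the map but have not yet received their data, which may be the situation after an interrupted rebalance.

```python
from typing import List, Optional, Set, Tuple


def estimate_failover_loss(
    vbucket_chains: List[List[Optional[str]]],
    items_per_vbucket: List[int],
    failed_nodes: Set[str],
) -> Tuple[int, int]:
    """Return (lost_items, total_items) for a proposed failover.

    vbucket_chains[i] lists the nodes assumed to hold a copy of vBucket i
    (active first, then replicas); None marks an unassigned replica slot.
    """
    lost = 0
    total = 0
    for vb, chain in enumerate(vbucket_chains):
        count = items_per_vbucket[vb]
        total += count
        # Nodes that hold a copy of this vBucket, ignoring empty slots.
        holders = {node for node in chain if node is not None}
        # If no holder survives the failover, the vBucket's items are lost.
        if not (holders - failed_nodes):
            lost += count
    return lost, total


# Hypothetical 4-vBucket layout, 100 items per vBucket.
chains = [["A", "B"], ["B", "C"], ["C", "A"], ["A", "D"]]
counts = [100, 100, 100, 100]
lost, total = estimate_failover_loss(chains, counts, {"A", "B"})
print(f"Failing over A and B would lose {lost} of {total} items "
      f"({100 * lost / total:.0f}%)")
```

      A warning built this way is only as accurate as the replica state it is given, so after a stopped rebalance it would likely understate the loss unless replica item counts are checked as well.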

      The real bug is probably in rebalance. Diags attached.

      Attachments

        1. 10.1.2.39-8091-diag.txt.zip (1.91 MB)
        2. 10.1.2.40-8091-diag.txt.zip (1.62 MB)
        3. 10.1.2.42-8091-diag.txt.zip (303 kB)
        4. 10.1.2.44-8091-diag.txt.zip (2.23 MB)
        5. 10.1.2.45-8091-diag.txt.zip (1.18 MB)

          People

            Assignee: alkondratenko Aleksey Kondratenko (Inactive)
            Reporter: tommie Tommie McAfee (Inactive)