  1. Couchbase Server
  2. MB-5136

Reliable replica building during rebalance sometimes fails, resulting in a larger data loss window and duplicate backfill

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • 1.8.1
    • 1.8.1-release-candidate
    • couchbase-bucket, ns_server
    • Security Level: Public
    • None

    Description

      The logs collected while fixing MB-5052 revealed that one of the vbuckets we had previously built using the new code did not actually have a closed checkpoint. So the replica was not in fact built.

      After discussing this with Chiyoung, I found that using 'backfill_completed' to determine when a replica is mostly up to date is not correct. This stat becomes true when backfill is done, but backfill is followed by one more message that opens the next checkpoint, and that message is what we should be waiting for. We also found that there are no producer-side stats we can use to detect it.

      So we decided that I will additionally have to poll the destinations for actually closed checkpoints before I stop replica building.
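
      A minimal sketch of the polling step described above, written in Python purely for illustration (ns_server itself is Erlang, and this is not the actual fix). The function name `wait_for_closed_checkpoints` and the callback `has_closed_checkpoint` are hypothetical; the callback stands in for whatever destination-side query reports that a vbucket has a closed checkpoint. The timeout and poll interval are likewise assumptions.

```python
import time
from typing import Callable, Iterable


def wait_for_closed_checkpoints(
    destinations: Iterable[str],
    vbucket: int,
    has_closed_checkpoint: Callable[[str, int], bool],  # hypothetical destination-side check
    poll_interval: float = 0.1,
    timeout: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll every destination until each reports a closed checkpoint.

    Returns True once all destinations have a closed checkpoint for the
    vbucket, or False if the timeout expires first. Only a True result
    should allow the caller to stop replica building.
    """
    pending = set(destinations)
    deadline = time.monotonic() + timeout
    while pending:
        # Keep only the destinations that still lack a closed checkpoint.
        pending = {d for d in pending if not has_closed_checkpoint(d, vbucket)}
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_interval)
    return True
```

      The point of the sketch is that the completion signal is checked on the destinations rather than derived from 'backfill_completed' on the producer, and that a destination is queried again only while it has not yet confirmed a closed checkpoint.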

      Attachments

        1. 10.1.3.74-8091-diag.txt.gz
          1.10 MB
        2. 10.1.3.75-8091-diag.txt.gz
          1.17 MB
        3. 10.1.3.76-8091-diag.txt.gz
          1.16 MB
        4. 10.1.3.77-8091-diag.txt.gz
          1.08 MB
        5. 10.1.3.78-8091-diag.txt.gz
          1.08 MB
        6. 10.1.3.79-8091-diag.txt.gz
          1.16 MB
        7. 10.1.3.80-8091-diag.txt.gz
          1.27 MB

          People

            chiyoung Chiyoung Seo (Inactive)
            alkondratenko Aleksey Kondratenko (Inactive)
