  1. Couchbase Server
  2. MB-6497

Not ready to replicate from vbuckets cause rebalance failure due to bad_replicas when replica count > 1

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • 2.0-beta
    • 2.0-beta
    • ns_server
    • Security Level: Public
    • None

    Description

      See reopening of MB-4673.

      We've recently changed the replication supervisor's children to have type temporary, which means the supervisor will not try to restart failed children.

      But when some vbuckets on the source are not yet ready to be replicated from (see below for when this happens), we deal with that by 'crashing' the replicator after 30 seconds, expecting to be restarted and to then deal with the new set of ready vbuckets. With temporary children, that restart no longer happens.
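
      A minimal sketch of that interaction, written in Python rather than the Erlang that ns_server actually uses. ToySupervisor and make_replicator are hypothetical names invented for illustration, and the 30-second wait is reduced to an immediate exception. Only the OTP rule that temporary children are never restarted is taken as given.

```python
from dataclasses import dataclass


@dataclass
class ToySupervisor:
    """Toy stand-in for an OTP supervisor; only the restart type is modelled."""
    restart: str  # "permanent" or "temporary", mirroring OTP child restart types
    starts: int = 0

    def run(self, child, max_starts=5):
        """Start the child; after a crash, restart it only if it is permanent."""
        while self.starts < max_starts:
            self.starts += 1
            try:
                return child(self.starts)
            except RuntimeError:
                if self.restart == "temporary":
                    return None  # temporary children are never restarted
        return None


def make_replicator(ready_on_attempt):
    """Hypothetical replicator whose vbuckets become ready on a later start."""
    def replicator(attempt):
        if attempt < ready_on_attempt:
            # Stands in for waiting 30 seconds and then crashing on purpose.
            raise RuntimeError("some vbuckets are not ready to replicate from")
        return "replicating"
    return replicator


# The crash-and-retry approach only recovers under a restarting supervisor.
permanent_result = ToySupervisor("permanent").run(make_replicator(2))
temporary_result = ToySupervisor("temporary").run(make_replicator(2))
```

      In this toy model the permanent child recovers on its second start, while the temporary child stays down after its first crash, which is the situation described above.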

      It can also be seen that there's a tiny race in both the 1.8.1-style and the new-style vbucket filter change logic, where a vbucket filter change command can be sent to a dying ebucketmigrator. So that's another, related bug.

      When does this happen? It is 'typical' for the replica count > 2 case, even with our 'reliable' replica building. Basically, we build replicas in star formation, which means that when a vbucket movement is done, some replicas later in the chain may be slightly ahead of replicas earlier in the chain (but of course never ahead of the master). If 'being ahead' actually means a later checkpoint id, that causes a backfill into the later replica, which leaves that replica with open checkpoint 0 for some time. That is a condition in which replicating from it is not possible. So that's where we cannot replicate some subset of vbuckets and have to restart ourselves later.
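
      The condition can be sketched as a toy model, again in Python. It assumes each chain member is summarised by a single checkpoint id and that a replica is backfilled exactly when its id is later than its upstream's. That is a simplification of whatever the engine really checks, and the function names are hypothetical.

```python
def backfilled_replicas(checkpoint_ids):
    """Indices of chain members whose checkpoint id is ahead of their upstream's.

    checkpoint_ids[0] is the master; member i replicates from member i - 1.
    """
    return [i for i in range(1, len(checkpoint_ids))
            if checkpoint_ids[i] > checkpoint_ids[i - 1]]


def blocked_sources(checkpoint_ids):
    """Backfilled replicas that also feed a downstream replica.

    While being backfilled, such a replica sits at open checkpoint 0, so the
    replicator reading from it cannot replicate this vbucket yet.
    """
    last = len(checkpoint_ids) - 1
    return [i for i in backfilled_replicas(checkpoint_ids) if i < last]


# Master at 5, replica 1 at 3, replica 2 at 4 (ahead of replica 1), replica 3 at 3.
chain = [5, 3, 4, 3]
blocked = blocked_sources(chain)
```

      Under these assumptions, replica 2 in the example is backfilled and, because replica 3 replicates from it, is temporarily unusable as a source. With only two replicas the backfilled replica has nothing downstream, so nothing is blocked.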

          People

            alkondratenko Aleksey Kondratenko (Inactive)