  1. Couchbase Server
  2. MB-35782

Delta recovery should not create replications to all vbuckets being recovered immediately

Details

    • Triaged
    • Unknown

    Description

      Currently delta recovery goes through the following steps:

      1. Queries vbucket states on the delta nodes and deletes the vbuckets that have diverged (madhatter only).
      2. Creates a special transitional vbucket map in which all vbuckets to be recovered are listed as replicas.
      3. Waits for all delta nodes to warm up all buckets of interest (no replications are created here).
      4. At the beginning of bucket rebalance, janitor cleanup is called. This creates replications to all vbuckets to be recovered.
      5. Alternatively, if the rebalance is interrupted, a regular janitor run will create the replications.

      Steps 4 and 5 are problematic. Both create replications to every vbucket being recovered at once, so if many vbuckets need to be rolled back, this may overload memcached on the delta nodes (for an example of this, see CBSE-7262).
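
      To make the load concrete, here is a minimal Python sketch of steps 2 and 4. It is not ns_server code: the function names, the list-of-chains map layout, and the node names are assumptions made for illustration. It builds a transitional map that lists the delta node as a replica and then counts how many replications a single janitor pass would open towards that node.

```python
from collections import Counter

def transitional_map(current_map, recover):
    """Hypothetical helper. current_map is a list of chains
    [active, replica, ...] indexed by vbucket id; recover maps a
    vbucket id to the delta node that is recovering that vbucket.
    Returns a new map with the delta node appended as a replica."""
    result = []
    for vb, chain in enumerate(current_map):
        node = recover.get(vb)
        if node is not None and node not in chain:
            chain = chain + [node]
        result.append(list(chain))
    return result

def replications(vbucket_map):
    """Yield (source, destination, vbucket) for each active -> replica pair."""
    for vb, chain in enumerate(vbucket_map):
        active = chain[0]
        for dst in chain[1:]:
            if active is not None and dst is not None:
                yield (active, dst, vb)

# Toy cluster: 4 vbuckets on n1/n2, with n3 coming back via delta recovery.
current = [["n1", "n2"], ["n2", "n1"], ["n1", "n2"], ["n2", "n1"]]
recover = {vb: "n3" for vb in range(len(current))}

tmap = transitional_map(current, recover)
incoming = Counter(dst for _, dst, _ in replications(tmap))
print(incoming["n3"])  # every recovered vbucket gets a replication at once
```

      In this toy model the number of replications opened towards the delta node equals the number of vbuckets being recovered, which is the all-at-once behavior described above.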

      This behavior is somewhat better in madhatter thanks to this commit: https://github.com/couchbase/ns_server/commit/35f3c77c08b39ab3744094fb00cb0dd3dfab054f. However, that commit still doesn't address all cases.

      If possible, we should try to address this in the madhatter time frame. It would seem that the way to do so is to add extra metadata to the bucket configs for vbuckets being recovered, and to have ns_janitor:cleanup/janitor_agent:apply_new_bucket_config not create replications for those vbuckets.
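
      The proposal could look roughly like the Python sketch below. It only illustrates the idea and is not the ns_server implementation: the `recovering` set stands in for the extra bucket-config metadata, and `replications_to_create` is a hypothetical stand-in for the decision the janitor would make when applying a bucket config.

```python
def replications_to_create(vbucket_map, recovering):
    """Hypothetical janitor-side filter.

    vbucket_map: list of chains [active, replica, ...] indexed by vbucket id.
    recovering:  set of (vbucket, node) pairs taken from the assumed extra
                 metadata; replications for these pairs are not created.
    Returns the (source, destination, vbucket) replications to set up."""
    wanted = []
    for vb, chain in enumerate(vbucket_map):
        active = chain[0]
        if active is None:
            continue
        for dst in chain[1:]:
            if dst is None or (vb, dst) in recovering:
                continue  # skip replicas that are still being recovered
            wanted.append((active, dst, vb))
    return wanted

# Transitional map: n3 is listed as a replica of both vbuckets.
vbucket_map = [["n1", "n2", "n3"], ["n2", "n1", "n3"]]

# While both vbuckets are marked, nothing is replicated to n3.
marked = {(0, "n3"), (1, "n3")}
before = replications_to_create(vbucket_map, marked)

# Once vbucket 0 is unmarked, only its replication to n3 is added.
after = replications_to_create(vbucket_map, {(1, "n3")})
```

      With the marker in place, a janitor run (whether at the start of rebalance or after an interruption) would leave the recovering vbuckets alone, and something else, presumably the rebalance itself, would clear the markers at a pace the delta node can handle.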

            People

              dfinlay Dave Finlay
              Aliaksey Artamonau (Inactive)
              Votes: 0
              Watchers: 12

                Gerrit Reviews

                  There is 1 open Gerrit change
