Couchbase Server
MB-44838

Clusters with a large number of buckets and nodes upgraded to Cheshire-Cat may see performance regressions in crucial REST APIs (clean up old config)

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version: Cheshire-Cat
    • Fix Version: 7.0.0
    • Component: ns_server
    • Labels: None
    • Triage: Untriaged
    • 1
    • Unknown

    Description

      Currently we don't remove values that were upgraded to chronicle from ns_config (because doing so is non-trivial to orchestrate in a race-free manner). Separately, many heavy REST APIs call ns_config:get() for each request. With the move to chronicle, such APIs will also call chronicle_kv:get_full_snapshot() (this could and should eventually be changed, but that's how things are at the moment). Ultimately, users who upgraded their clusters will pay double the price compared to users running fresh clusters. Similarly, wherever metadata needs to be synchronized across the cluster, we'll continue paying the price of synchronizing the hefty ns_config in addition to synchronizing chronicle.

      A couple of things we might do:

      1. Clean up ns_config on upgrade.
      2. Reduce the use of get_full_snapshot() to the absolute minimum.

      There's probably more that we can do, but that's what comes to mind at the moment.

      We should also definitely run some tests on extreme configurations. And it's critical to remember that fresh clusters may not expose some of the behaviors that upgraded clusters will.
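
      The two mitigations above can be sketched as a small model. This is a hedged, illustrative Python model only (ns_server itself is Erlang); the key names and the representation of the two stores as plain dictionaries are assumptions, not actual ns_server code.

      ```python
      # Illustrative model of the two mitigations. The stores are modeled as
      # dicts; the set of migrated keys is hypothetical.
      MIGRATED_KEYS = {"buckets", "nodes_wanted"}

      def cleanup_ns_config(ns_config, chronicle):
          """Mitigation 1: delete keys from ns_config once chronicle owns them,
          so upgraded clusters stop carrying (and synchronizing) both copies."""
          for key in MIGRATED_KEYS & ns_config.keys() & chronicle.keys():
              del ns_config[key]

      def get_needed_keys(chronicle, keys):
          """Mitigation 2: fetch only the keys a request actually needs,
          rather than a full snapshot of the whole store."""
          return {k: chronicle[k] for k in keys if k in chronicle}
      ```

      For example, after `cleanup_ns_config` runs, a migrated key such as "buckets" is served only from chronicle, while keys that were never migrated remain in ns_config untouched.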


        Activity

          Couchbase Build Team added a comment:
          Build couchbase-server-7.0.0-4899 contains ns_server commit c458454 with commit message:
          MB-44838 do not use chronicle_kv:get_full_snapshot() when fetching

          Couchbase Build Team added a comment:
          Build couchbase-server-7.0.0-5050 contains ns_server commit 21541c8 with commit message:
          MB-44838 delete keys moved to chronicle from ns_config one minute

          Couchbase Build Team added a comment:
          Build couchbase-server-7.0.0-5050 contains ns_server commit 9a8fd7d with commit message:
          MB-44838 ns_config_log should tolerate buckets key being deleted
          Balakumaran Gopal added a comment (edited):
          Aliaksey Artamonau - Is the system test upgrade that we do enough to validate this bug? If not, could you please provide the steps to validate this fix?
          Dave Finlay added a comment:
          Yes, Bala, I think that will work as a way to verify.

          Artem Stemkovski added a comment:
          This is mostly internal workings that are difficult to validate using black-box testing, but you can check that the ns_config key 'buckets' (and other keys moved to chronicle) disappears from ns_config approximately one minute after the cluster upgrade.
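
          The check described above amounts to polling until the key is gone. A minimal sketch, assuming the caller supplies a `fetch_keys()` callable that returns the current set of ns_config keys (how those keys are scraped from the cluster is deployment-specific and left abstract here; the function name and timeout values are illustrative):

          ```python
          import time

          def wait_for_key_removal(fetch_keys, key, timeout_s=120, poll_s=5,
                                   sleep=time.sleep):
              """Poll fetch_keys() until `key` disappears or the timeout expires.
              The ~2 minute default leaves headroom over the expected one-minute
              cleanup delay after the upgrade completes."""
              deadline = time.monotonic() + timeout_s
              while True:
                  if key not in fetch_keys():
                      return True   # key was removed in time
                  if time.monotonic() >= deadline:
                      return False  # key still present after timeout
                  sleep(poll_s)
          ```

          The `sleep` parameter is injectable so the helper can be exercised in tests without real delays.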

          Balakumaran Gopal added a comment:
          Marking this closed since the upgrade from 6.6.2-9588 -> 7.0.0-5226 completed successfully.

          People

            Balakumaran Gopal
            Aliaksey Artamonau (Inactive)
            Votes: 0
            Watchers: 11
