Couchbase Server / MB-50177

Consolidate NS_SERVER configuration frameworks (NS_CONFIG / Chronicle / replicated_dets)


Details

    Description

      Cluster configuration is delivered today by three main subsystems within Couchbase Server:

      1. NS_CONFIG, which replicates using a vector-clock / lease algorithm to resolve conflicts and converge, and is eventually consistent. It holds metakv and some NS_SERVER-specific keys. It allows local-node writes, which are fast but may cause consistency challenges. The metakv subsystem provides convenience APIs for key-change notifications and a key hierarchy abstraction, which other services depend on heavily.
      2. Chronicle, which replicates using a consensus algorithm (Raft) to resolve conflicts and converge, and is strongly consistent. It holds the NS_SERVER node topology and bucket configuration. All writes go through the leader, and reads and writes may or may not require a quorum.
      3. User storage, which uses home-grown replicated_dets logic with a "revision" as the means to resolve conflicts and converge, and is eventually consistent. It holds NS_SERVER users only.
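
To illustrate the vector-clock style of conflict detection mentioned in item 1, here is a minimal Python sketch. It is not NS_CONFIG's implementation (which is in Erlang and also involves leases, not modeled here); the names `VectorClock`, `increment`, `compare`, and `merge` are hypothetical and exist only for this example. It shows how two local-node writes can be recognized as concurrent, which is the case that needs a conflict-resolution rule.

```python
from typing import Dict

# Hypothetical representation: node name -> number of writes seen from that node.
VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of the clock with this node's counter bumped by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify a relative to b: 'equal', 'before', 'after', or 'concurrent'."""
    nodes = set(a) | set(b)
    a_behind = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    b_behind = any(b.get(n, 0) < a.get(n, 0) for n in nodes)
    if a_behind and b_behind:
        return "concurrent"  # neither write saw the other: a real conflict
    if a_behind:
        return "before"
    if b_behind:
        return "after"
    return "equal"


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Combine two clocks by taking the per-node maximum."""
    nodes = set(a) | set(b)
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in nodes}
```

In this sketch, a write made locally on one node and a write made locally on another node from the same starting point compare as "concurrent". A consensus-based system such as Chronicle avoids that case by ordering all writes through the leader.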

      Ultimately, we should have one system to manage configuration. The declared goal when Chronicle was introduced was for it to serve in this capacity. In addition, having multiple subsystems leads to overlap and inconsistency challenges, especially in the upgrade use case, where a single transaction spanning all configurations is needed. As it stands, Chronicle has gaps that prevent it from being this single subsystem for all configuration use cases:

      1. Eventual consistency / local-node writes, which are key for performance in some use cases. The main challenge here is how to span a single transaction across the strongly consistent and the eventually consistent keys. It may be that this is not achievable by definition, and we will need to come up with a strategy specifically for the upgrade use case.
      2. Hierarchy abstraction and notification APIs, similar to the semantics offered in NS_CONFIG, are missing.
      3. Scalability: all keys are held in memory, and if we collapse all services into Chronicle, there is a good chance we will hit this design limit.
      4. "Transactional" notifications: currently, all changes to Chronicle keys are reported to consumers (for example, ns_couchdb) without context. As a result, changes are processed sequentially and the observed state may be invalid mid-flight. Possible solutions:
        1. Add transaction context (maybe a hint in the form of "1 of 20 changes for txn XYZ"), so that the client has the opportunity to apply "all or nothing" in one CAS (compare-and-swap) operation on the client side (see MB-50003 for an example).
        2. Bulk notifications, where all notifications relevant to a transaction are delivered and processed in one sitting.
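
The transaction-context idea in item 4 could look roughly like the following Python sketch. This is an illustration under assumptions, not an existing Chronicle or ns_server API: the `Notification` fields (`txn_id`, `seq`, `total`), the `TxnBuffer` class, and `apply_batch` are all hypothetical names. It assumes each notification carries a "seq of total" hint with unique sequence numbers from 1 to `total`, and that the consumer holds changes back until the whole transaction has arrived.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Notification:
    """One key change tagged with hypothetical transaction context."""
    txn_id: str
    seq: int      # 1-based position within the transaction
    total: int    # number of changes in the transaction
    key: str
    value: object


@dataclass
class TxnBuffer:
    """Collects notifications per transaction and releases complete batches."""
    pending: Dict[str, Dict[int, Notification]] = field(default_factory=dict)

    def receive(self, note: Notification) -> Optional[List[Notification]]:
        """Store a notification; return the ordered batch once all parts arrived."""
        parts = self.pending.setdefault(note.txn_id, {})
        parts[note.seq] = note
        if len(parts) < note.total:
            return None  # transaction still incomplete: apply nothing yet
        del self.pending[note.txn_id]
        return [parts[i] for i in range(1, note.total + 1)]


def apply_batch(config: Dict[str, object],
                batch: List[Notification]) -> Dict[str, object]:
    """Build the new config from a full batch so readers never see a partial state."""
    updated = dict(config)
    for note in batch:
        updated[note.key] = note.value
    return updated
```

A consumer would call `receive` for every notification and swap in the result of `apply_batch` only when a full batch is returned. The bulk-notification alternative (solution 2) would remove the need for the buffer, since the producer would deliver the batch in one message.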

      This ticket aims to track the work required to bridge the gaps in Chronicle and collapse all configuration subsystems into one.


            People

              dfinlay Dave Finlay
              meni.hillel Meni Hillel (Inactive)