  1. Couchbase Server
  2. MB-20860

identify or add an interface for requesting and subscribing to cluster topology information

Details

    • Improvement
    • Resolution: Unresolved
    • Major
    • None
    • 4.0.0, 4.1.0, 4.1.1, 4.1.2, 4.5.0, 4.5.1, 4.6.0, 5.0.0, 5.1.0
    • couchbase-bucket, ns_server
    • None

    Description

      For purposes of understanding cluster topology when a client library wants to execute something on a service that is decoupled from buckets, it would be useful to have an identified place to get that configuration to be used by possibly thousands of clients across possibly hundreds of nodes.

      Current public interfaces for retrieving cluster configuration are to my knowledge:

      1. Carrier Publication operations and NMV (not-my-vbucket) replies over port 11210
      2. ns_server buckets/<bucketname> URI and terse equivalent at b/<bucketname>

      There is also pool level streaming at other ns_server streaming URIs. It's unclear if this is an intended public interface.

      With changes in the compartmentalization of the system, we now need a way to be aware of topology changes independent of buckets. This interface…

      • Must be capable of simultaneously handling 30,000 connections or more.
      • Must be able to handle an arrival rate of 10,000 clients per second or more, servicing all of these configuration requests in under 100ms.
      • Should be available over a service/services the client is going to use anyway. This is why cbmcd was selected at the time of Carrier Publication.
      • Must have a method of returning the configuration in response to a request.
      • Could have a method of subscribing for topology changes to be sent as they are received.
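The request and subscribe requirements above can be sketched as a small in-process model. This is only an illustration of the shape such an interface might take: `TopologySource`, `get_config`, `subscribe`, and `publish` are hypothetical names, not an existing Couchbase API, and the sketch ignores the connection-count and latency requirements entirely.

```python
import threading
from typing import Callable, Dict, List, Optional

Config = Dict[str, object]  # hypothetical: an opaque topology document


class TopologySource:
    """Hypothetical bucket-independent topology interface (not an existing API)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._config: Optional[Config] = None
        self._subscribers: List[Callable[[Config], None]] = []

    def get_config(self) -> Optional[Config]:
        """Request/response path: return the latest known configuration, if any."""
        with self._lock:
            return self._config

    def subscribe(self, callback: Callable[[Config], None]) -> None:
        """Optional push path: call `callback` on every later topology change."""
        with self._lock:
            self._subscribers.append(callback)

    def publish(self, config: Config) -> None:
        """Server side: record a new configuration and notify subscribers."""
        with self._lock:
            self._config = config
            subscribers = list(self._subscribers)
        for callback in subscribers:  # notify outside the lock
            callback(config)


source = TopologySource()
seen: List[Config] = []
source.subscribe(seen.append)
source.publish({"rev": 1, "nodes": ["node-a"]})
```

After the `publish` call, `source.get_config()` returns the same document that was appended to `seen`, so a client could use either path.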

      The "changes in compartmentalization" I speak of are that FTS searches or N1QL statements can now be executed by an application that is entirely independent of a bucket. To actually get that request to the right service on the cluster, however, we need a way to locate the services.
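To make "locating the services" concrete, here is a minimal sketch of how a client might pick FTS or N1QL endpoints out of a bucket-independent configuration. The document shape (`rev`, `nodes`, `hostname`, `services`), the hostnames, and the port values other than 11210 are assumptions made for illustration; this ticket does not define a format.

```python
from typing import Dict, List, Tuple

# Hypothetical configuration shape: one entry per node, each mapping a
# service name to the port it listens on. Values are illustrative only.
EXAMPLE_CONFIG: Dict[str, object] = {
    "rev": 7,
    "nodes": [
        {"hostname": "node-a.example.com", "services": {"kv": 11210, "n1ql": 8093}},
        {"hostname": "node-b.example.com", "services": {"fts": 8094}},
        {"hostname": "node-c.example.com", "services": {"fts": 8094, "n1ql": 8093}},
    ],
}


def endpoints_for(config: Dict[str, object], service: str) -> List[Tuple[str, int]]:
    """Return (hostname, port) for every node that runs `service`."""
    return [
        (node["hostname"], node["services"][service])
        for node in config["nodes"]
        if service in node["services"]
    ]


fts_endpoints = endpoints_for(EXAMPLE_CONFIG, "fts")
```

An application server dedicated to FTS searches would only need `fts_endpoints` and would never open a connection for a bucket.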

      As a use case example, in a given user's architecture there can be some application servers that are dedicated to only FTS searches. They expect to be able to use an SDK for only these searches. The workaround today is to require the application to make a connection to a bucket involved in an FTS search.

      This is unreasonably expensive in a large deployment since the underlying SDK maintains persistent connections to all nodes. With a large number of nodes and a large number of clients, this could push us into the limits of memcached's number of ports.
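A rough back-of-the-envelope version of that cost, assuming each client holds one persistent connection per node. The client and node counts below are illustrative figures in the "thousands of clients, hundreds of nodes" range mentioned earlier, not measurements.

```python
def total_connections(clients: int, nodes: int, per_node: int = 1) -> int:
    """Connections held open cluster-wide when every client connects to every node."""
    return clients * nodes * per_node


def connections_per_node(clients: int, per_node: int = 1) -> int:
    """Connections a single node must accept from the whole client population."""
    return clients * per_node


cluster_wide = total_connections(clients=2_000, nodes=100)  # 200000
on_each_node = connections_per_node(clients=2_000)          # 2000
```

The cluster-wide total grows with the product of clients and nodes, which is why a bucket connection made only to learn the topology gets expensive quickly.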

      It's also confusing from a user perspective, since they just want to do a search/query and don't understand why they have to connect to a bucket.

      This enhancement request also aligns with RBAC which will separate the principal from the resource being accessed.

      Possibly Relevant Background

      Owing to the scale of clients we need to update and problems with the second public interface above (covered in MB-8211), contemporary clients use only the Carrier Publication interface. This has virtually eliminated all of the problems we'd had previously and has even stood up well to tests of 5000+ clients connecting and requesting config nearly simultaneously. Users have tested us on this (a core switch failure test at a large deployment) and been happy with the result.

      Existing Possibilities

      Current Pools Streaming Interface

      The pools level streaming interface works functionally, but is sort of a waste of a connection that isn't going to be used for anything else and has always had the MB-8211 problems. It's also not clear if that's a public interface for such things.

      Poll another URI

      I checked to see how jdbc-cb did it, thinking that there may be something that the query service exposes. It appears (though I didn't read deeply) that it polls /admin/clusters/default/nodes once a second when it has no configuration:
      https://github.com/jdbc-json/jdbc-cb/blob/master/src/main/java/com/couchbase/jdbc/core/ProtocolImpl.java#L970-L988
      https://github.com/jdbc-json/jdbc-cb/blob/master/src/main/java/com/couchbase/jdbc/CBDriver.java#L276-L297

      The query specification doesn't indicate that it has a way of handing out this information. Maybe the polling on 1s resolution as needed has already been identified as a solution to this?
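For discussion, polling at 1s resolution while no configuration is held might look like the sketch below. It is not taken from jdbc-cb: `poll_for_config`, `fetch`, and the other parameters are hypothetical, and `fetch` stands in for whatever request returns the node list (or `None` when nothing is available yet).

```python
import time
from typing import Callable, Dict, Optional

Config = Dict[str, object]


def poll_for_config(
    fetch: Callable[[], Optional[Config]],
    interval_s: float = 1.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Config]:
    """Call `fetch` every `interval_s` seconds until it returns a configuration.

    Returns None if `max_attempts` fetches all come back empty.
    """
    attempt = 0
    while True:
        attempt += 1
        config = fetch()
        if config is not None:
            return config
        if max_attempts is not None and attempt >= max_attempts:
            return None
        sleep(interval_s)  # wait before the next poll


# Usage with a stand-in fetch that succeeds on the third call.
responses = iter([None, None, {"rev": 1}])
waits = []
result = poll_for_config(lambda: next(responses), sleep=waits.append)
```

One consequence worth weighing against the requirements: if every client without a configuration polls once a second, the server sees roughly one request per second per such client.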


          People

            dfinlay Dave Finlay
            ingenthr Matt Ingenthron