  1. Couchbase Go SDK
  2. GOCBC-1429

Failure to identify local node from remote configs when used internally to couchbase

    Description

      The SDK can run in an 'internal service' mode, in which it connects to the local node's services over localhost using non-TLS connections, even when the SDK is otherwise configured to use TLS. When we are in this mode and a configuration is received from a remote node, the SDK incorrectly identifies the remote node as the local node (due to the presence of the thisNode attribute), leaving it unable to communicate with either the local or the remote nodes.

      Current proposal for config handling around this special 'internal' behaviour in gocbcore:

      • SDK should have a flag that enables this 'bound to a single node' mode (I think this exists today via the ns_server:// scheme).
      • When this flag is configured, we enforce that the connection string has exactly 1 host, and that host is a loopback address (that could be 127.0.0.1 or [::1] or some IPv6 variant of that address like [::1:1:1:1]). This single loopback address becomes the 'local node loopback address' and is used below.
      • When this flag is configured, the SDK only permits the HTTP config poller to be used (the CCCP config poller is disabled).
      • Upon receiving any configuration, any missing hostnames should be replaced with the address that was used to fetch that configuration. A hostname is only missing when the node's assigned address is a loopback address. This case also cannot happen in a multi-node cluster, as ns_server does not allow a cluster to be formed using any node configured with a loopback address.
      • After this initial processing of the configuration, for the first config we receive when running under this flag (i.e. the "bootstrap config"), we should identify the 'bound node' by finding the nodesExt entry marked with 'thisNode:true', and then record a unique identifier for that node by using `${hostname}:${services.mgmt}` (note that we exclusively use the NON-TLS port here, which is guaranteed to exist even on TLS-strict clusters).
      • If no 'thisNode' entry is found in the config, the config should be considered unusable, and we should wait and poll for another configuration, logging along the way.
      • For all future configurations after this initial 'bootstrap config' (including any configs we receive via KV NMVs), we locate our local node using the node-identifier from the first config, and swap the hostname with the 'local node loopback address' referenced above (we use the address initially passed rather than a hardcoded loopback value to adhere to ns_server's IPv4/IPv6 configuration).
      • If we fail to locate our own node using the node-identifier, we should silently ignore this situation, as it's likely the result of receiving a configuration where our node is being rebalanced out, and we should continue working as long as we can until our client is shut down by the service itself.
      • Later, as part of the SDK's selection of whether to use TLS or non-TLS ports from a given config, rather than solely following the TLS configuration for the overall client, we should additionally check whether the node we are selecting for is our 'locally bound node', and if so, disable TLS to that node. Because the config will already have had the hostname swapped out for a loopback address, it may be necessary to annotate the node entry as the 'locally bound node' so that this step can detect it. It is incorrect to simply remove TLS from all nodes with a loopback hostname.
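
      The config-handling steps above could be sketched roughly as follows. This is an illustration only, not gocbcore code: the nodeExt struct, its BoundNode annotation, and every function name are hypothetical, and hosts are assumed to be passed as bare IP literals (no brackets around IPv6).

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// nodeExt is a hypothetical, trimmed-down view of a nodesExt entry.
type nodeExt struct {
	Hostname  string
	MgmtPort  int  // services.mgmt, the non-TLS management port
	ThisNode  bool // thisNode flag as sent by the node that served the config
	BoundNode bool // annotation added by this sketch, not part of the config
}

// validateLoopbackHost enforces "exactly one host, and it is a loopback address".
func validateLoopbackHost(hosts []string) (string, error) {
	if len(hosts) != 1 {
		return "", errors.New("bound-node mode requires exactly one host")
	}
	ip := net.ParseIP(hosts[0])
	if ip == nil || !ip.IsLoopback() {
		return "", fmt.Errorf("host %q is not a loopback address", hosts[0])
	}
	return hosts[0], nil
}

// fillMissingHostnames replaces empty hostnames with the address the config was fetched from.
func fillMissingHostnames(nodes []nodeExt, fetchedFrom string) {
	for i := range nodes {
		if nodes[i].Hostname == "" {
			nodes[i].Hostname = fetchedFrom
		}
	}
}

// nodeID builds the `${hostname}:${services.mgmt}` identifier.
func nodeID(n nodeExt) string {
	return fmt.Sprintf("%s:%d", n.Hostname, n.MgmtPort)
}

// boundNodeID finds the thisNode entry in the bootstrap config.
// ok is false when no such entry exists and the config should be treated as unusable.
func boundNodeID(bootstrap []nodeExt) (id string, ok bool) {
	for _, n := range bootstrap {
		if n.ThisNode {
			return nodeID(n), true
		}
	}
	return "", false
}

// rebindLocalNode locates the bound node by identifier (ignoring thisNode),
// swaps its hostname for the loopback address, and annotates it.
// It returns false when the node is absent, which the caller would ignore.
func rebindLocalNode(nodes []nodeExt, id, loopback string) bool {
	for i := range nodes {
		if nodeID(nodes[i]) == id {
			nodes[i].Hostname = loopback
			nodes[i].BoundNode = true
			return true
		}
	}
	return false
}

func main() {
	loopback, err := validateLoopbackHost([]string{"127.0.0.1"})
	if err != nil {
		panic(err)
	}

	// Bootstrap config, fetched from the local node.
	bootstrap := []nodeExt{
		{Hostname: "10.0.0.1", MgmtPort: 8091, ThisNode: true},
		{Hostname: "10.0.0.2", MgmtPort: 8091},
	}
	fillMissingHostnames(bootstrap, loopback)
	id, ok := boundNodeID(bootstrap)
	if !ok {
		fmt.Println("no thisNode entry; wait and poll again")
		return
	}

	// A later config served by the remote node, which marks itself as thisNode.
	later := []nodeExt{
		{Hostname: "10.0.0.1", MgmtPort: 8091},
		{Hostname: "10.0.0.2", MgmtPort: 8091, ThisNode: true},
	}
	rebindLocalNode(later, id, loopback)
	fmt.Printf("bound node id: %s\n%+v\n", id, later)
}
```

      The point of the sketch is that later configs are matched on the recorded identifier, so a thisNode flag set by a remote node has no effect on which entry is treated as local.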

      Note that we use the term 'bound node' rather than 'local node' since in a cluster_run scenario all the 'nodes' are running on a single system with the same loopback address, but the intent is to only remove TLS from the one node that the internal service is associated with.
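
      As a rough illustration of why the check has to be on the bound-node annotation rather than on the hostname, here is a minimal sketch of the port selection in a cluster_run-style layout. The nodeEntry struct, the kvEndpoint function, and the port numbers are all hypothetical and chosen only for the example.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// nodeEntry is a hypothetical node record after config processing.
type nodeEntry struct {
	Hostname  string
	KvPort    int  // non-TLS KV port
	KvTLSPort int  // TLS KV port
	BoundNode bool // set only on the one node this internal service is bound to
}

// kvEndpoint picks the address to dial and reports whether TLS is used.
// TLS follows the client-wide setting, except for the bound node.
func kvEndpoint(n nodeEntry, clientUsesTLS bool) (addr string, useTLS bool) {
	useTLS = clientUsesTLS && !n.BoundNode
	port := n.KvPort
	if useTLS {
		port = n.KvTLSPort
	}
	return net.JoinHostPort(n.Hostname, strconv.Itoa(port)), useTLS
}

func main() {
	// Two nodes on the same machine share a loopback hostname,
	// so the hostname alone cannot tell them apart.
	nodes := []nodeEntry{
		{Hostname: "127.0.0.1", KvPort: 12000, KvTLSPort: 11998, BoundNode: true},
		{Hostname: "127.0.0.1", KvPort: 12002, KvTLSPort: 11996},
	}
	for _, n := range nodes {
		addr, useTLS := kvEndpoint(n, true)
		fmt.Printf("%s tls=%t\n", addr, useTLS)
	}
}
```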

            People

              charles.dixon Charles Dixon
              brett19 Brett Lawson