Description
Problem Definition
The #cp-mcr team is currently investigating snapshot backup/restore for Capella. In brief, this consists of creating atomic snapshots of on-disk node state (e.g. /opt/couchbase/var/lib/couchbase).
For the golden-path restore, we have a workable, tested solution that requires no intervention from couchbase-server. However, we're currently working through the non-golden path, which might.
There's a strong desire to handle the following two cases:
- Soft Topology Change :: The cluster has changed topology and must be fixed up to match the topology in the backup (i.e. the case where a customer has scaled up their cluster since their backup).
- Cluster Clone :: Entirely new infrastructure is provisioned to match the topology in the backup.
The main areas of concern are currently:
- SRV Record :: The front door for the user
- Hostnames :: The cluster node/hostnames
- TLS Certificates :: Created by Capella based on the cluster's "base" hostname (source)
- Cluster/Node UUIDs :: A discussion I intend to address outside of this ticket
For the soft topology change case, Capella can save the hostnames and re-provision the infrastructure to match.
This is not the case for the cluster clone scenario as both clusters will need to coexist.
Potential Solutions
I can see two clear solutions to these problems, although I'm sure there are more:
- Capella Remapping :: We use the /etc/hosts file to remap hostnames, and then produce a certificate which has multiple SAN entries.
- ns_server Remapping :: We provide ns_server with a map from hostname to hostname and the config is rewritten.
For the first solution, I believe we have two problems:
- Certificates :: We'll have a cluster with a certificate which is signed for hostnames from another cluster.
- Hostnames :: Inter-node communication will continue to use the old hostnames while external connections use the new hostnames, which is likely to become a debugging nightmare (e.g. when reading logs).
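To make the first solution and its two problems concrete, here is a minimal sketch of what Capella would have to generate. The IP addresses and both helper names are hypothetical, and the `DNS:` prefix follows the OpenSSL subjectAltName convention; nothing here reflects how Capella actually provisions hosts files or certificates.

```python
def hosts_file_lines(old_to_ip):
    """Render /etc/hosts lines that point OLD hostnames at the NEW node IPs."""
    return [f"{ip}\t{old}" for old, ip in sorted(old_to_ip.items())]


def san_entries(remap):
    """Return DNS SAN entries covering both the old and the new hostnames."""
    names = sorted(set(remap) | set(remap.values()))
    return [f"DNS:{name}" for name in names]


# Hypothetical values: old hostname -> new hostname, old hostname -> new IP.
remap = {"host1": "host3", "host2": "host4"}
new_ips = {"host1": "10.0.0.3", "host2": "10.0.0.4"}

print("\n".join(hosts_file_lines(new_ips)))
print(",".join(san_entries(remap)))
```

The SAN list necessarily contains host1 and host2, which belong to the original cluster; that is the certificate problem described above.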
For the latter solution:
- chronicle :: Node names are currently stored - in part - in chronicle which rules out a trivial config rewrite.
- Services :: Unknown unknowns on how/whether other services store node hostnames (i.e. does this extend further than ns_server?).
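One rough way to start sizing the "unknown unknowns" would be to scan a restored data directory for literal occurrences of the old hostnames. The sketch below is only an assumption about how such a survey could look: `find_hostname_references` is a hypothetical helper, it reads each file fully into memory (real data files would need chunked reads), and it will miss hostnames stored in compressed or otherwise encoded form.

```python
import os


def find_hostname_references(root, hostnames):
    """Yield (path, hostname) for each file under root containing a hostname."""
    needles = [(host, host.encode()) for host in hostnames]
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                with open(path, "rb") as fh:
                    data = fh.read()  # whole-file read; fine for a sketch only
            except OSError:
                continue  # skip unreadable files rather than abort the scan
            for host, needle in needles:
                if needle in data:
                    yield path, host
```

For example, `sorted(find_hostname_references("/opt/couchbase/var/lib/couchbase", ["host1", "host2"]))` would list the files that mention either old hostname verbatim.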
Request
Please could we investigate having a supported way to remap/rewrite hostnames in ns_server so that we can boot from snapshotted volumes running on new infrastructure?
We could provide a payload such as the following:
{
  "host1": "host3",
  "host2": "host4"
}
This could be used to cause:
- host1 to remap itself
- host1 to remap the other nodes in the cluster
There may be other information required that we could gather/provide.
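As an illustration of the intended semantics only, the following sketch applies such a payload to a list of node names. It assumes node names have the form name@host (for example ns_1@host1) and that every node in the cluster must appear in the map; `remap_node` and `validate_remap` are hypothetical helpers, not ns_server APIs.

```python
import json


def remap_node(node, remap):
    """Rewrite the host part of a name@host node name using the map."""
    name, sep, host = node.partition("@")
    if not sep:
        raise ValueError(f"expected name@host, got {node!r}")
    return f"{name}@{remap.get(host, host)}"


def validate_remap(remap, cluster_hosts):
    """Reject maps that miss a node or send two hosts to the same target."""
    missing = set(cluster_hosts) - set(remap)
    if missing:
        raise ValueError(f"no mapping for: {sorted(missing)}")
    targets = list(remap.values())
    if len(set(targets)) != len(targets):
        raise ValueError("two hosts map to the same target")


payload = json.loads('{"host1": "host3", "host2": "host4"}')
nodes = ["ns_1@host1", "ns_1@host2"]  # assumed node-name format

validate_remap(payload, [n.partition("@")[2] for n in nodes])
print([remap_node(n, payload) for n in nodes])
```

Validating the whole map up front means a node never ends up remapping itself while leaving a peer on its old hostname.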
Considerations
There's a desire to have this functionality work for the supported versions deployed in Capella (i.e. 7.1.x and 7.2.x).
Issue Links
- AV-60816
Gerrit Reviews
For Gerrit Dashboard: MB-58691
# | Subject | Branch | Project | Status | CR | V
---|---|---|---|---|---|---
208046,40 | MB-58691: Config/Node remap script | trinity | ns_server | NEW | 0 | +1