Couchbase Server / MB-25918

RemoteClusterService Improvement Redesign


Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Critical
    • Affects Version/s: None
    • Fix Version/s: 5.5.0
    • Component/s: XDCR
    • Labels: None

    Description

      There are currently these MBs filed with regard to RemoteClusterService:

       

      This MB resolves the issues above by addressing them with a newer and simpler design that avoids the known common pitfalls.

      Intro

      Remote Cluster Service holds a locally cached copy of the remote cluster reference, and keeps any modification synchronized with metakv. We first take a look at the use cases:

       

      Write use cases (We define write use cases as any operation that causes a change locally and then attempts to update metakv):

      Remote Cluster Service exposes the following basic functionalities from UI/CLI/REST that behave as write operations:

      1. Adding a remote cluster reference - (addRemoteCluster)

      2. Deleting a remote cluster reference - (DelRemoteCluster)

      3. Modifying a remote cluster reference - (setRemoteCluster)

       

      Internally, Remote Cluster Service may also be modified by a refresh() operation. refresh() is called in the following use cases and can potentially result in a write operation on metakv.

      1. Periodic refreshing from Replication Manager
        Each remote cluster reference is expected to be refreshed whether or not it is active.
      2. When creating a new ReplicationSpec - calls refresh()
        RemoteClusterByRefName <- ValidateNewReplicationSpec (could be run in parallel) <- createAndPersistReplicationSpec <- CreateReplication
      3. When a new pipeline is constructed (XDCRFactory’s NewPipeline) - checks the live remote reference status by doing a refresh
        RemoteClusterByUuid <- NewPipeline
      4. When upgrading from Couchbase < v5.0 to 5.0 or above - upgradeRemoteClusterRefs() is called before any replication is started (as the replication manager gets started).

       

      In other words, the operations other than the periodic refresh do not happen frequently or concurrently from a node’s perspective, so we can serialize them.

       

      To solve this, we propose a concept similar to the pipeline updater, called the RemoteCluster Agent. Every present remote cluster reference is associated with an agent, and the agent remains in scope until the replication spec is removed.

       

      The remote cluster service, in essence, will have a list of agents. Each agent serves the original purpose of a cache: it provides multiple readers with concurrent access while serializing modifications to the cached data. We can simplify the original design by not having to worry about CAS, since an RWMutex controls each agent.

       

      The agent itself will also be responsible for updating metakv. Since each agent is responsible for a single reference on a single node, each agent can serialize concurrent write operations to its cache as well as serialize updates to metakv. The next segment discusses coordinating multiple agents that concurrently update the shared metakv in a distributed system environment.

       

      RemoteCluster Agent behavior overview:

       

      Sample proposed RemoteClusterAgent struct with the minimum members needed to replace a cache:

      type RemoteClusterAgent struct {
          // Locally cached copy of the remote cluster reference
          reference metadata.RemoteClusterReference
          // Cached list of nodes belonging to the remote cluster
          refNodesList []string
          // Allows multiple concurrent readers while serializing modifications
          refMtx sync.RWMutex
      }
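
      For context, here is a minimal sketch of how the service itself might hold these agents, assuming the RemoteClusterAgent struct above and the standard sync package. The map, field, and method names below are illustrative assumptions, not part of the proposal:

      // Hypothetical sketch: the service keeps one agent per remote cluster
      // reference, keyed by reference id. Names are illustrative only.
      type RemoteClusterService struct {
          agentMap    map[string]*RemoteClusterAgent // refId -> agent
          agentMapMtx sync.RWMutex                   // guards the agent map itself
      }

      func (s *RemoteClusterService) getAgent(refId string) *RemoteClusterAgent {
          s.agentMapMtx.RLock()
          defer s.agentMapMtx.RUnlock()
          return s.agentMap[refId]
      }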

       

      AddRemoteCluster operation (active node - see the sketch below):

      1. Create an Agent
      2. Agent starts
      3. Starts its self-monitoring
      4. Starts the metakvAgent (responsible for updating metakv) - serializes updates at the cluster level.
      5. Realizes that this start is NOT from a metakv update.
      6. Writes the reference to metakv via the metakvAgent

      <Stable State>
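
      A rough sketch of how this active-node add path could be wired up, assuming the RemoteClusterAgent struct and service sketch above plus hypothetical Start() and writeToMetakv() helpers; the helper names and the Id field on the reference are assumptions:

      // Hypothetical sketch of the active AddRemoteCluster path.
      // startFromMetakv distinguishes a user-initiated start (step 5 above)
      // from a start triggered by a metakv callback.
      func (s *RemoteClusterService) AddRemoteCluster(ref *metadata.RemoteClusterReference) error {
          agent := &RemoteClusterAgent{reference: *ref}
          // Steps 2-4: start the agent, its self-monitoring, and the metakvAgent
          if err := agent.Start(false /*startFromMetakv*/); err != nil {
              return err
          }
          // Step 6: this start did not come from metakv, so persist the reference
          if err := agent.writeToMetakv(); err != nil {
              return err
          }
          s.agentMapMtx.Lock()
          s.agentMap[ref.Id] = agent
          s.agentMapMtx.Unlock()
          return nil
      }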

       

       

      Non-Active node - AddRemoteCluster (see the sketch below):

      1. metakv callback
      2. Realizes that a new remote cluster reference has been created
      3. Creates an agent from the reference
      4. Agent starts
      5. Starts self-monitoring
      6. Starts the metakv service
      7. Realizes that this start is from a metakv update, so it does not re-write the update.

      <Stable State>
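
      The metakv callback itself could dispatch to the passive paths roughly as follows. This is a minimal sketch; the function name, parameters, and nil-based convention are assumptions rather than the actual RemoteClusterServiceCallback signature:

      // Hypothetical dispatcher for reference changes observed via metakv.
      // oldRef == nil  -> a new reference was created on another node
      // newRef == nil  -> the reference was deleted on another node
      // both non-nil   -> the reference was modified on another node
      func (s *RemoteClusterService) onMetakvChange(refId string, oldRef, newRef *metadata.RemoteClusterReference) error {
          switch {
          case oldRef == nil && newRef != nil:
              // Non-active add: create an agent from the reference (steps 3-6)
              agent := &RemoteClusterAgent{reference: *newRef}
              // Step 7: this start is from a metakv update, so do not re-write it
              if err := agent.Start(true /*startFromMetakv*/); err != nil {
                  return err
              }
              s.agentMapMtx.Lock()
              s.agentMap[refId] = agent
              s.agentMapMtx.Unlock()
          case newRef == nil:
              // Handled by the non-active DelRemoteCluster flow described below
          default:
              // Handled by the passive SetRemoteCluster flow described below
          }
          return nil
      }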

       

       

      Active SetRemoteCluster operation:

      1. Get the agent with the reference ID.
      2. WLock
      3. Update the agent’s internal reference
      4. Update metakv
      5. WUnlock
      6. service.metadata_change_callback(refId, oldRef, newRef) -> calls metakv_change_listener.go’s remoteClusterChangeHandlerCallback
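
      As a sketch, the agent-side portion of this flow might look like the following; setReference and writeToMetakv are hypothetical helper names, not the actual implementation:

      // Hypothetical sketch: the agent serializes a user-initiated modification
      // and the corresponding metakv write under its write lock (steps 2-5).
      func (agent *RemoteClusterAgent) setReference(newRef *metadata.RemoteClusterReference) error {
          agent.refMtx.Lock()
          defer agent.refMtx.Unlock()
          agent.reference = *newRef
          // Persist the updated reference before releasing the lock (step 4)
          return agent.writeToMetakv()
      }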

       

      Passive SetRemoteCluster:

      1. metakv callback (RemoteClusterServiceCallback)
      2. Finds agent for the cluster.
      3. WLock
      4. Update agent’s internal reference
      5. WUnlock

       

      Active DelRemoteCluster operation:

      1. Find Agent
      2. WLock
      3. Delete reference from metakv
      4. Stop Agent
      5. WUnlock
      6. Delete Agent
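
      A possible shape for the active delete path, using the standard fmt package and hypothetical helpers (deleteFromMetakv, stop) standing in for the real metakv and agent teardown calls:

      // Hypothetical sketch of the active DelRemoteCluster path (steps 1-6).
      func (s *RemoteClusterService) DelRemoteCluster(refId string) error {
          agent := s.getAgent(refId)
          if agent == nil {
              return fmt.Errorf("no agent found for reference %v", refId)
          }
          agent.refMtx.Lock()
          // Steps 3-4: remove the persisted reference, then stop the agent
          err := agent.deleteFromMetakv()
          agent.stop()
          agent.refMtx.Unlock()
          if err != nil {
              return err
          }
          // Step 6: remove the agent from the service
          s.agentMapMtx.Lock()
          delete(s.agentMap, refId)
          s.agentMapMtx.Unlock()
          return nil
      }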

       

      Non-active DelRemoteCluster:

      1. metakv callback (RemoteClusterServiceCallback)
      2. Find Agent
      3. WLock
      4. StopAgent
      5. WUnlock
      6. Delete Agent

       

      Any operation without refresh():

      1. Get Agent from RemoteClusterService
      2. agent.getRef()
      3. RLock
      4. Clone
      5. RUnlock
      6. Return cloned reference
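
      In code, the read path could be as small as the following sketch, assuming the reference type provides a Clone() method as the steps above imply:

      // Hypothetical sketch: readers always receive a cloned copy taken under
      // a read lock, so the cached reference is never exposed for mutation.
      func (agent *RemoteClusterAgent) getRef() *metadata.RemoteClusterReference {
          agent.refMtx.RLock()
          defer agent.refMtx.RUnlock()
          return agent.reference.Clone()
      }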

       

      Any operation with refresh() (i.e. RemoteClusterByRefName())

      Without update:

      1. Get Agent from RemoteClusterService
      2. agent.refresh()
      3. WLock
      4. Go through refresh code
      5. No update needed
      6. Clone
      7. WUnlock
      8. Return cloned reference

       

      With update:

      1. Get Agent
      2. agent.refresh()
      3. WLock
      4. Go through refresh code
      5. Figure out new Node List
      6. If the hostname is no longer in the target, go through the replaceRefHostName main section
      7. Update metakv with the brand-new reference
      8. WUnlock
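
      Putting the two refresh variants together, a hedged sketch of the agent-level refresh might look like this; updateNodeList, hostnameStillInTarget, and replaceRefHostname are illustrative stand-ins for the existing refresh and replaceRefHostName logic:

      // Hypothetical sketch of agent.refresh(), covering both the "without
      // update" and "with update" paths above.
      func (agent *RemoteClusterAgent) refresh() (*metadata.RemoteClusterReference, error) {
          agent.refMtx.Lock()
          defer agent.refMtx.Unlock()
          // Go through the refresh code to figure out the up-to-date node list
          newList, err := agent.updateNodeList()
          if err != nil {
              return nil, err
          }
          agent.refNodesList = newList
          // Only touch metakv when the persisted hostname is no longer in the target
          if !agent.hostnameStillInTarget() {
              agent.replaceRefHostname() // replaceRefHostName main section
              if err := agent.writeToMetakv(); err != nil {
                  return nil, err
              }
          }
          return agent.reference.Clone(), nil
      }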

       

      Any further cache-related reads will be done purely at a per-agent level.

       

      09/18/17 - Per internal discussion, the scope of this work will not include the metakv update collision minimization described below. With the current method of picking nodes in an orderly fashion, there would only be a small spike of work in metakv to figure out collisions (of which there will be none) before resuming normal operation.

      Multi-node race conditions when updating metakv

       -- 

      Having an agent is a viable solution for the race conditions that may happen within a single node, but we should also examine the scenario where an XDCR setup has more than one node sharing the same Remote Cluster Reference (via metakv), all doing periodic refreshes.

       -- 

      As of now, each node does its own refresh. In other words, each node has a “cache” (or agent, in this new scheme) that is independent from the others. Each cache/agent holds a list of the nodes belonging to the remote cluster. Should a node in that list change, a refresh operation only makes modifications locally.

       -- 

      We have had an issue in the past (CBSE-3882) where many nodes in a single cluster all try to update metakv with different values, leading to conflict and ping-pong (MB-25004).

      This issue shows up when the persisted hostname is unreachable: the RemoteClusterService then goes through its cached list of nodes and attempts to replace the host in the reference with one of the nodes in that list.

      This operation is perfectly fine on a one-node system. In a multi-node cluster, however, each node has its own cached list of nodes, and each list is potentially in a different order, so each node may pick a different node and attempt to replace the reference hostname in metakv with it, leading to conflict and potential ping-pong.

       -- 

      Another issue this scenario exposes is that we are essentially duplicating a simple operation on a single document. As the number of nodes in a system scales, we are more likely to have multiple writers leading to conflicts.

       

      The problem we need to solve is minimizing metakv update collisions. The solution needs to:

      1. Have one node be the leader in updating metakv with a new hostname.
      2. Coordinate “leader selection” without needing inter-node communication.

       -- 

      To solve this, we propose the following solution.

      First, we must decouple the current refresh operation. Right now, the refresh operation performs the following in one pass; we will separate it into two different components:

      1. Go through the cache/agent’s list of nodes, and make sure that the node list is up to date, so that all other operations that depend on the cache/agent will be valid.
      2. Update the actual, singular copy of Remote Cluster Reference with a valid hostname, should the reference’s hostname become out of date.

       -- 

      The proposal is that the agent’s refresh() operation should perform only #1 above, since that is the more vital piece for immediate and correct RemoteClusterService operation.

      Moreover, each node can perform #1 independently without interference from the others. As long as it can perform #1, each node can continue operating.

       -- 

      Updating the singular copy of the reference will be done in a separate procedure. This procedure needs to be coordinated among all the nodes so that we achieve a single writer. Let’s call this procedure updatekv(), broken out of refresh().

       -- 

      Updatekv() procedure:

      1. We order the nodes into a priority list. If an XDCR cluster has 4 nodes, then each node would know itself as one of the nodes with position {0, 1, 2, 3}. This is doable without inter-node XDCR communication by having each individual node do the following:
        1. Retrieve the list of nodes currently in the cluster from any node of the cluster.
        2. Since we cannot assume that the node list will be ordered identically on all nodes in the cluster, sort the list by:
          1. Running “GetNodeListWithMinInfo”
          2. Getting a list of node names via “GetNodeNameListFromNodeList”
          3. Sorting the list in alphabetical order.
        3. Each node then finds itself in the list, and its index becomes its priority for updating metakv.
      2. The node with priority 0 is responsible for updating metakv. If node 0 fails to update metakv (i.e. dies), node 1 takes over the update, and so on down the list.
        This must be done without inter-node communication, and the following algorithm provides a solution (see the sketch after this list). It assumes that the hostname in the reference from metakv has been found to be out of date and needs replacing.
        1. Node 0’s job is to launch the updatekv() operation: find a viable node that can replace the hostname in the reference in metakv, and perform the function of replaceRefHostName() immediately.
        2. Node 1’s job is to be first in line should node 0 fail to update metakv. It does so by waiting 1*base.RefreshRemoteClusterRefInterval seconds. If within this time frame it hears back from the metakv listener that the reference has been updated, it cancels its updatekv() operation. Otherwise, once the time has expired, it assumes that node 0 has failed to update metakv, takes on the leader role, and performs the update.
        3. Node 2’s job is to be second in line should node 0 and node 1 both fail. It does so by waiting 2*base.RefreshRemoteClusterRefInterval seconds, similar to node 1.
        4. Etc. for lower-priority nodes.
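
      A minimal sketch of this priority scheme follows, under the assumption that the node name list has already been obtained via GetNodeListWithMinInfo / GetNodeNameListFromNodeList, and with refreshInterval standing in for base.RefreshRemoteClusterRefInterval. It uses the standard sort and time packages; all function names here are illustrative:

      // Hypothetical sketch: every node sorts the same node name list, so all
      // nodes agree on the priority ordering without talking to each other.
      func myUpdatePriority(nodeNames []string, myName string) int {
          sorted := append([]string(nil), nodeNames...)
          sort.Strings(sorted) // identical ordering on every node
          for i, name := range sorted {
              if name == myName {
                  return i
              }
          }
          return -1 // not in the list; do not participate in updatekv()
      }

      // Hypothetical sketch: wait priority * refreshInterval, then perform the
      // update unless the metakv listener reports the reference was already fixed.
      func (agent *RemoteClusterAgent) runUpdatekv(priority int, refreshInterval time.Duration, refUpdated <-chan struct{}) {
          if priority < 0 {
              return
          }
          select {
          case <-refUpdated:
              // A higher-priority node already updated metakv; cancel.
              return
          case <-time.After(time.Duration(priority) * refreshInterval):
              // Assume the leader role: replaceRefHostName and persist to metakv.
              agent.replaceRefHostname()
              _ = agent.writeToMetakv()
          }
      }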

       -- 

      The reasons for this design are mainly to:

      1. Keep the current system intact without introducing new inter-node messaging systems
      2. Minimize concurrent update to metakv to prevent ping-pong problems.
      3. Prevent Byzantine Failure in deciding who should update metakv, without implementing a new consensus messaging system to bring nodes to order.

       -- 

      We know from the use cases above that updates to metakv happen rarely, since an update is either:

      1. Triggered by a user action executed on a specific node (i.e. deleting or updating (editing) a reference), or
      2. Needed because the hostname in the reference has been moved out of the cluster it points to.

      When a user executes such an action, that specific node is responsible for updating metakv.

      If scenario 2 occurs, then with a commonly agreed-upon list of node priorities, the system can avoid the ping-pong problem.

      Moreover, when scenario 2 occurs, each node’s Remote Cluster Service agent will still do its duty of updating its cached list of nodes to ensure functionality. The updating of metakv can occur in the background and is not essential to the operation of XDCR. With this design, we ensure that the update will happen eventually without the risk of concurrent writes within the cluster, which is the end goal we are trying to achieve.

       

            People

              neil.huang Neil Huang
              neil.huang Neil Huang
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue
