Details
-
Task
-
Resolution: Unresolved
-
Major
Description
I've implemented this in Sync Gateway, and I noticed that the same code also exists (in a buggy state) in cbgt and the backup utilities.
In order to get high seqnos for a set of collections, you have to call DCPAgent.GetVbucketSeqnos for each serverIdx (starting at 1). At present this is buggy in every caller, which is why I recommend a change.
For Sync Gateway, we are looking for the max seqnos across a set of collections, so we iterate over each server and each collection and use the max values as the high sequence values. We'd be happy with a function that does this for an arbitrary set of collections; doing it for a single collection is fine too.
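A rough sketch of the kind of helper being asked for, to make the iteration concrete. This is not the Sync Gateway or gocbcore code: `seqnoFetcher`, `maxHighSeqnos`, and `fakeFetch` are hypothetical names, and the fetcher is an assumed stand-in for one per-server, per-collection seqno call. The list of server indexes is passed in by the caller, so the sketch makes no assumption about how they are numbered.

```go
package main

import "fmt"

// seqnoFetcher is a hypothetical stand-in for a single per-server seqno call.
// It returns the high seqno per vBucket for one collection on one server.
type seqnoFetcher func(serverIdx int, collectionID uint32) (map[uint16]uint64, error)

// maxHighSeqnos queries every given server for every given collection and
// keeps the largest seqno seen for each vBucket.
func maxHighSeqnos(serverIdxs []int, collectionIDs []uint32, fetch seqnoFetcher) (map[uint16]uint64, error) {
	high := make(map[uint16]uint64)
	for _, serverIdx := range serverIdxs {
		for _, collectionID := range collectionIDs {
			seqnos, err := fetch(serverIdx, collectionID)
			if err != nil {
				return nil, fmt.Errorf("server %d, collection %d: %w", serverIdx, collectionID, err)
			}
			for vbID, seqno := range seqnos {
				if seqno > high[vbID] {
					high[vbID] = seqno
				}
			}
		}
	}
	return high, nil
}

// fakeFetch serves made-up data for two servers and two collections.
func fakeFetch(serverIdx int, collectionID uint32) (map[uint16]uint64, error) {
	data := map[int]map[uint32]map[uint16]uint64{
		1: {8: {0: 10, 1: 5}, 9: {0: 12}},
		2: {8: {2: 7}, 9: {1: 9, 2: 3}},
	}
	return data[serverIdx][collectionID], nil
}

func main() {
	high, err := maxHighSeqnos([]int{1, 2}, []uint32{8, 9}, fakeFetch)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(high)
}
```

A vBucket that no server reports is simply absent from the result, which is the gap the rebalance cases further down are concerned with.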
Sync Gateway uses high seqnos to force the DCP stream to end at the current state of Couchbase Server. This can actually be done better on the server side by using TO_LATEST when opening the DCP feed, in which case KV determines the max sequence snapshot to stream to. Unfortunately, this behavior is buggy until 7.2.0 (MB-53448).
Bad code:
https://cs.github.com/couchbase/backup/blob/5bfe9f92b6b5a5958fd0975bab3af71b5dce27cb/couchbase/dcp_client.go#L318 and https://cs.github.com/couchbase/cbgt/blob/ad92ba06bb52325841e055428f398ea45afe5412/gocbcore_utils.go?q=GetVbucketSeqnos#L387
There are a few error conditions around a rebalance that our code doesn't handle, and it would be good if this ticket defined the behavior in the following situations:
- If nodes are added or removed during the rebalance, we might miss a value for a vBucket.
- If the rebalance just moves a vBucket between nodes, we could get a lower value than expected.
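As one illustration of the first case, a caller could at least detect that a vBucket is missing from the merged result and then retry or fail. The sketch below assumes the caller knows the bucket's vBucket count; `missingVbuckets` is a hypothetical helper, not an existing API. It does nothing for the second case, since a stale lower value is indistinguishable from a valid one in the result alone.

```go
package main

import "fmt"

// missingVbuckets returns the vBucket IDs in [0, numVbuckets) that have no
// entry in high, in ascending order.
func missingVbuckets(high map[uint16]uint64, numVbuckets uint16) []uint16 {
	var missing []uint16
	for vbID := uint16(0); vbID < numVbuckets; vbID++ {
		if _, ok := high[vbID]; !ok {
			missing = append(missing, vbID)
		}
	}
	return missing
}

func main() {
	// Made-up result in which vBuckets 1 and 3 were not reported by any server.
	high := map[uint16]uint64{0: 12, 2: 7}
	if missing := missingVbuckets(high, 4); len(missing) > 0 {
		fmt.Println("no seqno for vBuckets:", missing)
	}
}
```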