Description
What is the problem?
cbdatarecovery does not transfer vBuckets concurrently when there are more than 4096 documents per vBucket, regardless of what threads is set to.
For every source, we loop over the data ranges (i.e. one vBucket at a time) and call Document on the callbacks for every document we have. In the couchbase sink, this method queues the document into the channel of the worker responsible for that vBucket. The channel has a buffer of 4096 by default, so the send blocks once that buffer is full.
In the archive source, we loop over the data ranges in parallel, so we hit more than one worker at the same time, which gives us concurrency. In the recovery source, we loop sequentially; because sending to a full channel blocks, the loop stalls on the current vBucket's worker and we never transfer concurrently.
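The blocking behaviour described above can be illustrated with a small model. This is only a sketch, not the cbdatarecovery code: `queuedBeforeBlocking` and its parameters are hypothetical names, and it uses a non-blocking send purely to detect the point at which a real (blocking) send would stall when nothing has drained the channel yet.

```go
package main

import "fmt"

// queuedBeforeBlocking models a sequential source feeding one buffered
// channel per vBucket while no worker is draining them. It returns how many
// documents were queued for each vBucket before the first send would block.
// All names here are hypothetical; this is not the real sink implementation.
func queuedBeforeBlocking(numVBuckets, docsPerVBucket, buffer int) []int {
	queues := make([]chan int, numVBuckets)
	for i := range queues {
		queues[i] = make(chan int, buffer)
	}

	queued := make([]int, numVBuckets)
	for vb := 0; vb < numVBuckets; vb++ {
		for doc := 0; doc < docsPerVBucket; doc++ {
			select {
			case queues[vb] <- doc:
				queued[vb]++
			default:
				// Buffer full: a blocking send would stall here, so the
				// sequential loop never reaches the remaining vBuckets.
				return queued
			}
		}
	}
	return queued
}

func main() {
	// More documents than the buffer holds: only the first worker gets fed.
	fmt.Println(queuedBeforeBlocking(4, 5000, 4096)) // [4096 0 0 0]
	// Exactly the buffer size: every worker's channel is filled.
	fmt.Println(queuedBeforeBlocking(4, 4096, 4096)) // [4096 4096 4096 4096]
}
```

In the real tool the workers do drain their channels, so the producer eventually moves on, but at any moment it is only feeding the worker for the vBucket it is currently looping over.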
This was introduced when we changed each worker to transfer only a subset of vBuckets (MB-37023).
What is the fix?
We should use a worker pool to loop over the data ranges in the recovery source.
Issue Links
- is caused by: MB-37023 [CBM] Tidy up/fix the data range logic (Closed)
For Gerrit Dashboard: MB-58094
# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
195093,13 | MB-58094 Make cbdatarecovery more concurrent | master | backup | MERGED | +2 | +1 |