Test the performance of cbdatarecovery on magma buckets
Activity
Safian Ali January 9, 2024 at 12:08 PM
Created to handle this once is fixed
Apaar Gupta January 8, 2024 at 5:20 PM (edited)
This is a known issue with SetWithMeta on Magma, first seen with XDCR in MB-48834. It is caused by kv_engine not querying Magma's bloomfilters during the get it performs to retrieve metadata for SetWithMeta, so kv_engine ends up performing bg_fetches. Couchstore does not have this issue, since its bloomfilters are maintained by kv_engine, which avoids IO when the document does not exist.
An API for kv_engine to query Magma's bloomfilter has been implemented, which resulted in GET_META latency dropping from 45us to 10us. This API has to be used by kv_engine to avoid the costly fetch.
I am not sure of the status of the improvement, so pinging for an update.
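For illustration, here is a minimal Go sketch of the control flow described above (kv_engine and Magma are C++, so all names here are hypothetical): a bloom filter can answer "definitely absent" without any IO, so only a "maybe present" answer should have to pay for a bg_fetch.

```go
package main

import "fmt"

// bloomFilter is a stand-in for Magma's per-shard bloom filter: it can
// answer "definitely absent" without touching disk.
type bloomFilter interface {
	mayContain(key string) bool
}

// alwaysEmpty models the filter over an empty bucket (the restore case).
type alwaysEmpty struct{}

func (alwaysEmpty) mayContain(string) bool { return false }

// meta is a stand-in for existing-document metadata (cas, revno, flags, ...).
type meta struct{ cas uint64 }

// getMetaForSetWithMeta sketches the lookup performed before applying a
// SetWithMeta: when the filter says "definitely absent", the costly
// bg_fetch is skipped entirely.
func getMetaForSetWithMeta(bf bloomFilter, key string, bgFetch func(string) (*meta, error)) (*meta, error) {
	if !bf.mayContain(key) {
		return nil, nil // no existing doc: no conflict resolution, no IO
	}
	return bgFetch(key) // maybe present: pay for the disk read
}

func main() {
	m, err := getMetaForSetWithMeta(alwaysEmpty{}, "doc-1", func(string) (*meta, error) {
		panic("bg_fetch must not run when the key is definitely absent")
	})
	fmt.Println(m, err) // prints: <nil> <nil>
}
```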
Safian Ali January 8, 2024 at 4:13 PM (edited)
Assigned this ticket to storage-engine as this appears to be a Magma issue.
I’m working on testing the performance of cbdatarecovery with Magma. I’ve found that using SetWithMeta with Magma is much slower than using Set, or than using Couchstore with either method. See the graphs attached to this ticket, and the table below as a summary. All the tests below were done with a 10GB random data set created using pillowfight (10M items of 1KB each), restored into a freshly created empty bucket (i.e. no conflict resolution).
| Set method used | Storage engine on the cluster | Time taken | Start timestamp |
|---|---|---|---|
| Set | Couchstore | 5m 12s | 2024-01-08T12:11:34+00:00 |
| Set | Magma | 5m 17s | 2024-01-08T12:20:18+00:00 |
| SetWithMeta | Couchstore | 5m 21s | 2024-01-02T12:13:44+00:00 |
| SetWithMeta | Magma | 23m 44s | 2024-01-02T12:24:22+00:00 |
Is this a known issue? Thanks
Safian Ali January 8, 2024 at 1:32 PM
The difference in performance between the backup tools and pillowfight seems to be caused by backup using SetWithMeta instead of Set. When the backup code is modified to always use Set (here), Magma and Couchstore performance is the same. Further evidence of this can be seen in “set_with_meta_latency_couchstore_vs_magma.png”, which shows the Couchstore test latency on the left and Magma on the right (yellow is the 50th percentile, green the 90th) when using SetWithMeta. Comparing with “set_latency_couchstore_vs_magma.png”, you can see that SetWithMeta latency is much higher than Set latency when using Magma.
Safian Ali January 3, 2024 at 6:38 PM
The following methods show a significant slowdown (~5x) when the storage engine is Magma instead of Couchstore:
cbbackupmgr restore
cbbackupmgr generate
cbdatarecovery
However, cbc-pillowfight shows the same level of performance regardless of the storage engine used. Either something is wrong with the backup code, or there is something about gocbcore which makes it slower with Magma (pillowfight uses the C SDK). I'm skeptical of the former as AFAICT the backup code works the same way when sending docs to the cluster regardless of what storage engine is used.
More ideas:
Make a basic perf testing program akin to pillowfight that can be used to stress test a cluster. I already tried this with gocb, but the ops/sec was too low to see any difference; might have to try with gocbcore, as is done in backup (see the sketch after this list).
Compare memory profiles with different storage engines - where is the slowdown happening?
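As a starting point, here is a minimal sketch of such a loader using the gocb v2 SDK. The endpoint, credentials, and bucket name are placeholders, and the single sequential writer will understate throughput (the ops/sec problem noted above), so a real harness would need many concurrent writers or gocbcore:

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/couchbase/gocb/v2"
)

func main() {
	// Placeholder connection details.
	cluster, err := gocb.Connect("couchbase://127.0.0.1", gocb.ClusterOptions{
		Username: "Administrator",
		Password: "password",
	})
	if err != nil {
		log.Fatal(err)
	}
	bucket := cluster.Bucket("perf-test") // placeholder bucket name
	if err := bucket.WaitUntilReady(10*time.Second, nil); err != nil {
		log.Fatal(err)
	}
	col := bucket.DefaultCollection()

	// 1KB binary value, matching the pillowfight data set used above.
	doc := make([]byte, 1024)
	opts := &gocb.UpsertOptions{Transcoder: gocb.NewRawBinaryTranscoder()}

	const items = 100_000
	start := time.Now()
	for i := 0; i < items; i++ {
		if _, err := col.Upsert(fmt.Sprintf("doc-%d", i), doc, opts); err != nil {
			log.Fatal(err)
		}
	}
	elapsed := time.Since(start)
	fmt.Printf("%d upserts in %v (%.0f ops/s)\n", items, elapsed, float64(items)/elapsed.Seconds())
}
```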
Details
Assignee: Safian Ali
Reporter: Gvidas Razevicius
Story Points: 0
Priority: Major

Description
The recovery of magma buckets has been enabled by this change: https://couchbasecloud.atlassian.net/browse/MB-49475.
We should test the memory and CPU usage when recovering a magma bucket.
I have already tested this on my local machine with a magma bucket that contains 1 shard and another one that contains 8 shards. The test results are attached to this ticket, along with the script used to gather the data. NOTE: the script `mem_count_local_mac.bash` has been edited to work specifically on macOS; if on Linux, use `mem_count_linux.bash`.
Further testing needs to be done with clusters that have around 100GB of data on them. The testing steps should be as follows:
Spin up a couchbase server on an AWS instance.
Create a magma bucket and generate 100GB of data on it using `cbc-pillowfight`.
Run the script for collecting memory and CPU data (attached; a rough sketch of what it does is shown below) and then run `cbdatarecovery`.
Collect the data.
We should do this for a magma bucket with 1 shard and with 8 shards, then compare the results with the local run. We should determine whether the amount of data in the magma bucket affects the performance of `cbdatarecovery` and, if it does, look for ways to avoid or mitigate the slowdown.
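For reference, here is a rough Go equivalent of what I assume the attached collection scripts do: sample the RSS and %CPU of a target process (e.g. cbdatarecovery) once per second via ps(1) and print CSV. The sampling interval and output format are illustrative:

```go
package main

import (
	"fmt"
	"log"
	"os"
	"os/exec"
	"strings"
	"time"
)

// Sample RSS (KB) and %CPU of a process once per second and emit CSV.
// Pass the target PID as the only argument.
func main() {
	if len(os.Args) != 2 {
		log.Fatalf("usage: %s <pid>", os.Args[0])
	}
	pid := os.Args[1]
	fmt.Println("timestamp,rss_kb,cpu_percent")
	for {
		// "-o rss=,pcpu=" suppresses headers and works on both Linux and macOS.
		out, err := exec.Command("ps", "-o", "rss=,pcpu=", "-p", pid).Output()
		if err != nil {
			log.Fatalf("process %s has exited or ps failed: %v", pid, err)
		}
		fields := strings.Fields(string(out))
		if len(fields) == 2 {
			fmt.Printf("%s,%s,%s\n", time.Now().Format(time.RFC3339), fields[0], fields[1])
		}
		time.Sleep(time.Second)
	}
}
```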