Details
Description
Build: 4.7.0-857 (although the issue does not appear to be build-specific).
Setup: 4 data nodes, 1 client machine for cbbackupmgr.
Steps:
1. Load data (400M documents, 1 KB avg. size). For reference, the load takes about 1 hour.
2. Create a backup (668GB total size).
3. Flush the bucket.
4. Restore data.
./opt/couchbase/bin/cbbackupmgr restore --archive /workspace/backup --repo default --threads 16 --host http://172.23.100.29:8091 --username Administrator --password password
Everything starts OK and throughput is reasonably high (~100MB/sec).
Things get more complicated when the resident ratio drops below 100%.
The DCP queue remains very high (7-8M items), the resident ratio drops to zero on some nodes, and there are constant temp OOM failures. 150M documents still remain to be restored, but the average throughput is only about 2.5K.
Servers are utilized unevenly during restore, which works poorly in DGM / heavy DGM scenarios: one node gets overloaded and blocks the entire process.
Eventually we hit a hard OOM situation and the restore hangs.
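To put the slowdown in perspective, here is a rough back-of-the-envelope sketch in Python. It is not part of the original test. It assumes that "2.5K" means documents per second and that the initial ~100MB/sec can be converted to documents per second using the 1 KB average document size. Both are assumptions and the helper name `eta_hours` is hypothetical.

```python
def eta_hours(remaining_items: int, items_per_sec: float) -> float:
    """Hours needed to restore remaining_items at a steady rate."""
    if items_per_sec <= 0:
        raise ValueError("items_per_sec must be positive")
    return remaining_items / items_per_sec / 3600


REMAINING_DOCS = 150_000_000        # documents left to restore (from the report)
DEGRADED_RATE = 2_500               # ASSUMPTION: "2.5K" means documents/sec
AVG_DOC_SIZE_BYTES = 1_000          # ASSUMPTION: "1 KB avg." taken as 1000 bytes
INITIAL_RATE_BYTES = 100_000_000    # ~100MB/sec observed at the start

# ASSUMPTION: byte throughput maps directly to documents via the average size.
initial_rate_docs = INITIAL_RATE_BYTES / AVG_DOC_SIZE_BYTES

degraded_eta = eta_hours(REMAINING_DOCS, DEGRADED_RATE)
initial_eta = eta_hours(REMAINING_DOCS, initial_rate_docs)
slowdown = initial_rate_docs / DEGRADED_RATE

print(f"ETA at degraded rate: {degraded_eta:.1f} h")
print(f"ETA at initial rate:  {initial_eta * 60:.0f} min")
print(f"Slowdown factor:      {slowdown:.0f}x")
```

Under these assumptions, the remaining 150M documents would need roughly 16.7 hours at the degraded rate, compared with about 25 minutes at the initial rate, even if the restore did not hang.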