Details
-
Technical task
-
Resolution: Unresolved
-
Major
-
None
-
master
-
None
Description
ns_server has merged MB-30732 (up to 4 parallel backfills per source node + consider backfill done when persistence has been completed at destination).
This is the new baseline for any Backfill performance test.
couchbase/vulcan tests (toy-build before merging, on Hera):
http://perf.jenkins.couchbase.com/job/hera-hidd/279/ (old backfill)
http://perf.jenkins.couchbase.com/job/hera-hidd/278/ (new backfill)
http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=hera_550-550600_rebalance_ff45&snapshot=hera_550-550700_rebalance_2852&label=vulcan_1&label=newbackfill_4
couchbase/master tests (merged on master, on Titan):
http://perf.jenkins.couchbase.com/job/titan-reb/482/console (old backfill)
http://perf.jenkins.couchbase.com/job/titan-reb/491/console (new backfill)
http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=titan_650-1537_rebalance_75e8&snapshot=titan_650-1558_rebalance_7b1c&label=titan_482_650-1537&label=titan_491_650-1557_4-backfills
Note that we have:
- a significant Rebalance speedup on Hera
- no improvement on Titan
My hypothesis is that the speedup on Hera (where the DGM ratio is higher than on Titan) comes from "ns_server waiting for persistence seqno at destination before starting the next vbucket-move" rather than from "4 parallel Backfills". That is because:
- the potential of parallel Backfills cannot be unleashed because of https://issues.couchbase.com/browse/MB-31972
- "waiting for persistence seqno at destination" gives: lower DWQ -> lower mem_used -> ReplicationThrottle triggers less often at Consumer -> Backfill is paused less often at Producer
On Titan (where more memory is available) the second point ("waiting for persistence seqno at destination") has no impact.
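To make the hypothesised chain (lower DWQ -> lower mem_used -> fewer throttle pauses) concrete, here is a toy tick-based model. It is only a sketch and not ns_server or KV-Engine code: the function `simulate`, its parameters, and all numbers are hypothetical, mem_used is assumed to be proportional to the DWQ, and the throttle is reduced to a single threshold on the queue length. It counts how often the producer would be paused; it does not model rebalance time.

```python
def simulate(num_vbuckets, items_per_vb, send_rate, flush_rate,
             throttle_threshold, wait_for_persistence):
    """Toy model: return the number of ticks the producer spent paused.

    All names and units are hypothetical. dwq stands in for the disk write
    queue at the destination, and mem_used is assumed proportional to it.
    """
    dwq = 0            # items received at destination but not yet persisted
    paused_ticks = 0
    for _ in range(num_vbuckets):
        remaining = items_per_vb
        while remaining > 0:
            if dwq >= throttle_threshold:
                # Throttle triggers at the consumer: producer pauses backfill.
                paused_ticks += 1
            else:
                sent = min(send_rate, remaining)
                remaining -= sent
                dwq += sent
            # The flusher drains up to flush_rate items on every tick.
            dwq = max(0, dwq - flush_rate)
        if wait_for_persistence:
            # New behaviour (assumed): the next vbucket move starts only
            # after everything received so far has been persisted.
            dwq = 0
    return paused_ticks


# Hypothetical workload: the producer sends faster than the flusher persists.
workload = dict(num_vbuckets=8, items_per_vb=100, send_rate=10, flush_rate=5)

# "Hera-like": little memory headroom, so a low throttle threshold.
hera_old = simulate(throttle_threshold=60, wait_for_persistence=False, **workload)
hera_new = simulate(throttle_threshold=60, wait_for_persistence=True, **workload)

# "Titan-like": plenty of memory, so the threshold is never reached.
titan_old = simulate(throttle_threshold=10_000, wait_for_persistence=False, **workload)
titan_new = simulate(throttle_threshold=10_000, wait_for_persistence=True, **workload)

print("hera  old/new paused ticks:", hera_old, hera_new)
print("titan old/new paused ticks:", titan_old, titan_new)
```

In this model the low-memory configuration pauses only when the queue is allowed to accumulate across vbucket moves, while the high-memory configuration never pauses in either mode. That matches the observation of a speedup on Hera and none on Titan, under the stated assumptions.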