Details
Bug
Resolution: Fixed
Major
2.0
Security Level: Public
4-node cluster, 32GB RAM per node, each machine with 4 SSDs.
Untriaged
Description
When performing updates against the database, we're able to get throughput of about 160,000 ops/second, with throughput to disk of about 21,000 ops/second (92% cache hit rate).
However, when we try to do either synchronous writes to disk or synchronous replication, the server fails. Either we get 100% error rates, or if we throttle input and add auto-retries, the throughput drops to tens of operations per second. Given the disk is happily doing over 20k ops/sec, we'd expect to be able to process traffic at that volume even with synchronous disk writes.
We're calling the update() method in the Java client with PersistTo.ONE (or ReplicateTo.ONE). This is done through YCSB.
Is this the correct approach? Is there something else we should be doing? What numbers should we be expecting?
Attached is the console screenshot. You'll see that some of the keys are getting requests at a few thousand per second, but virtually all of those requests are failing, resulting in a true throughput of close to zero.
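For reference, the throttled auto-retry behaviour mentioned above can be sketched roughly as follows. This is only an illustration, not the actual YCSB or Couchbase client code: `DurableWrite` and `DurableWriteRetry` are hypothetical names, and `attempt()` is an assumed stand-in for a single update issued with PersistTo.ONE (or ReplicateTo.ONE) that reports whether the durability requirement was met.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for one durable write, e.g. an update issued with PersistTo.ONE. */
interface DurableWrite {
    /** Returns true once the write is confirmed to have met the durability requirement. */
    boolean attempt() throws Exception;
}

final class DurableWriteRetry {
    private DurableWriteRetry() {}

    /**
     * Tries the write up to maxAttempts times, doubling the pause after each failure.
     * Returns the number of attempts used on success, or -1 if every attempt failed.
     */
    static int writeWithRetry(DurableWrite write, int maxAttempts, long initialBackoffMillis)
            throws InterruptedException {
        long backoff = initialBackoffMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            boolean ok;
            try {
                ok = write.attempt();
            } catch (InterruptedException e) {
                throw e; // do not swallow thread interruption
            } catch (Exception e) {
                ok = false; // treat a timeout or client error as a failed attempt
            }
            if (ok) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                TimeUnit.MILLISECONDS.sleep(backoff); // throttle before retrying
                backoff *= 2;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // Simulated write that fails twice and succeeds on the third try.
        int used = writeWithRetry(() -> ++calls[0] >= 3, 5, 10L);
        System.out.println("attempts used: " + used);
    }
}
```

With a wrapper like this, every failed durable write costs at least one backoff interval, so a high failure rate translates directly into the very low effective throughput described above.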