During investigation of SyncWrite persistTo performance, it was observed that the op/s for a single-threaded persist_majority workload would drop to zero for a number of seconds, recovering after 10s.
- Start a two-node cluster run:
- Drive it with a SyncWrite workload (SyncWrites are not strictly necessary, but they make the bug very apparent, as any delay in the flusher causes op/s to go to zero). See the attached sync_repl.py script.
- Observe op/s on UI.
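
For anyone without access to the UI, the same drop can be spotted from the client side. The following is a minimal sketch, not the attached sync_repl.py: it issues writes back-to-back from a single thread and counts completed operations per one-second bucket. The `do_write` callable is a hypothetical placeholder for whatever performs one persist_majority write in your client; the function names here are illustrative only.

```python
import math
import time


def ops_per_second(completion_times, start, duration_s):
    """Count completed operations in each 1-second bucket after `start`."""
    buckets = [0] * int(math.ceil(duration_s))
    for t in completion_times:
        idx = int(t - start)
        # Ignore completions that fall outside the measured window.
        if 0 <= idx < len(buckets):
            buckets[idx] += 1
    return buckets


def stalled_seconds(buckets):
    """Return the indices of buckets in which no operation completed."""
    return [i for i, n in enumerate(buckets) if n == 0]


def run_workload(do_write, duration_s, clock=time.monotonic):
    """Single-threaded driver: one outstanding write at a time.

    `do_write(i)` is an assumed placeholder that blocks until write
    number `i` has been acknowledged (for example, a persist_majority
    SyncWrite issued through a client SDK).
    """
    start = clock()
    completions = []
    i = 0
    while clock() - start < duration_s:
        do_write(i)
        completions.append(clock())
        i += 1
    return ops_per_second(completions, start, duration_s)
```

Because only one write is outstanding at a time, a single write that blocks for several seconds shows up as a run of consecutive zero buckets in the output of `stalled_seconds(run_workload(...))`.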
Op/s should be broadly constant, with only small (sub-second) drops in op/s when compaction runs.
Op/s drops to zero:
During such "blips", Slow operation log messages were observed; each time, the "slow" operation took very close to 4 or 8 seconds: