The issue shows up less often now, but it still appears when the machine is very slow. In that case the kv flushers take an extremely long time to process mutations. Similarly, Magma::Shutdown also takes a very long time, because we do a lot of IO during shutdown.
2022-03-18T08:32:10.824923-07:00 WARNING (sasl_bucket_1) Slow runtime for 'Running a flusher loop: flusher 3' on thread WriterPool3: 197 s
This message is similar to MB-51422. Pavithra Mahamani suspected a performance regression, so I ran some Magma perf tests to check whether the toy build had one. The results look fine. I think the shared VMs slow down under a sustained workload, since we did not see this when the test was run on machines with dedicated disks.
The fix to abort ongoing flushes seems to mitigate the issue, but we can't be sure. I would recommend that we either raise the timeout to 10 minutes or defer the issue. There are also cases where kv_engine is similarly unable to shut down within the allotted 5 minutes. A failure of this operation is not fatal, and retrying the rebalance works.