Note: Comparing build 1069 from the existing run (https://perf.jenkins.couchbase.com/job/leto/21653/) with build 1070 from the new run (http://perf.jenkins.couchbase.com/job/leto/22033), because the existing run for 1070 didn't have logs collected.
cbmonitor shows a correlation between latency_query and avg_disk_update_time:
The test is doing queries with stale=false (https://docs.couchbase.com/server/current/learn/views/views-operation.html):
The index is updated before you execute the query, making sure that any documents updated and persisted to disk are included in the view. The client will wait until the index has been updated before the query has executed and, therefore, the response will be delayed until the updated index is available.
... which would explain the correlation.
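For reference, a stale=false view query over the view REST API has roughly the shape below. This is only a sketch: the host, bucket, design document and view names are placeholders, not the ones used by the test, and it assumes the default view port 8092. The helper name view_query_url is made up for illustration.

```python
from urllib.parse import urlencode


def view_query_url(host, bucket, ddoc, view, stale="false", port=8092, **params):
    """Build a view query URL (hypothetical helper, not part of any SDK).

    `stale` takes the documented values "false", "ok" or "update_after".
    With stale=false the index is expected to be updated before the query
    is executed, so the response waits for the index update.
    """
    # `stale` goes first so it is easy to spot in logs; extra params follow.
    query = urlencode({"stale": stale, **params})
    return f"http://{host}:{port}/{bucket}/_design/{ddoc}/_view/{view}?{query}"
```

Swapping stale="false" for stale="ok" in an otherwise identical query is a quick way to see how much of latency_query comes from waiting for the index update.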
We have checked the ViewEngine behaviour with Ankit Prabhu, and it seems that the behaviour of the "stale" param is inconsistent with what is described in the docs.
In particular, it seems that any incremental update to the index is done by an in-memory DCP stream, which isn't expected to be delayed by any degradation of disk writes in KV.
Regardless of the ongoing investigation on Views, a degradation in avg_disk_update_time is a more general problem that might affect other latencies, e.g. PersistToMajority SyncWrite. That's what I'm addressing now.
UPDATE
There is no correlation with disk_update after all; some runs don't show it, e.g. http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=leto_710-1069_access_0477&snapshot=leto_710-1070_access_963d#6799c0925e8393e50adc1e168ec87cbd.
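To check this per run rather than by eye on the cbmonitor charts, a correlation coefficient can be computed over the two time series. A minimal sketch, assuming the latency_query and avg_disk_update_time samples have already been exported into two equal-length lists aligned on the same timestamps (the export step is not shown, and pearson is just an illustrative helper name):

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sample lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 samples")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

A value close to 1 in one run and close to 0 in another would back up the observation that the correlation is not consistent across runs.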
Rerunning to grab logs:
https://perf.jenkins.couchbase.com/job/leto/21594/
https://perf.jenkins.couchbase.com/job/leto/21595/