When indexes roll back to a disk snapshot, there is a sudden increase in RSS. The issue can be reproduced locally with the following steps:
a. Set up a 4-node cluster: 3 KV nodes and 1 index+n1ql node
b. Load 30M documents into the bucket. A large number of documents is required to see a visible change in RSS after the rollback
c. Create and build some indexes
d. Block replication to KV node1 from KV node2
e. Perform some mutations (~1000)
f. Fail over KV node2
The failover of KV node2 causes a rollback to the disk snapshot. Notice the increase in RSS after the rollback.
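To put a number on the RSS change, RSS can be sampled from the indexer process before and after the rollback. The following is a minimal Go sketch, assuming a Linux host where `/proc/<pid>/status` reports a `VmRSS` line. The function name `parseVmRSSKB` and the PID lookup are hypothetical, and the sample values in `main` are made up for illustration, not measurements.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseVmRSSKB extracts the resident set size, in kB, from the contents of a
// Linux /proc/<pid>/status file (the "VmRSS:" line). Hypothetical helper.
func parseVmRSSKB(status string) (int64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Expected shape: "VmRSS:   123456 kB"
		if len(fields) >= 2 && fields[0] == "VmRSS:" {
			return strconv.ParseInt(fields[1], 10, 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("VmRSS line not found")
}

func main() {
	// In practice the input would come from
	// os.ReadFile(fmt.Sprintf("/proc/%d/status", indexerPID)) taken before and
	// after the rollback (indexerPID is hypothetical). Made-up samples here.
	before := "Name:\tindexer\nVmRSS:\t 8388608 kB\n"
	after := "Name:\tindexer\nVmRSS:\t12582912 kB\n"
	b, _ := parseVmRSSKB(before)
	a, _ := parseVmRSSKB(after)
	fmt.Printf("RSS before=%d kB after=%d kB delta=%d kB\n", b, a, a-b)
}
```

Sampling periodically (rather than once on each side) makes it easier to see whether the jump coincides with the rollback itself.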
The increase in RSS appears to be caused by jemalloc fragmentation. After the rollback, bin utilisation drops, so more memory stays resident for the same amount of live data, and RSS goes up.
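Bin utilisation here is taken to mean jemalloc's per-bin figure, `curregs / (nregs * curslabs)`: the fraction of region slots in a size class's current slabs that hold live allocations. The Go sketch below assumes these counters have already been collected (for example from jemalloc's `stats.arenas.<i>.bins.<j>.curregs` / `curslabs` and `arenas.bin.<j>.nregs` / `size` entries). `binStats`, `utilisation` and `wastedBytes` are hypothetical names, and the numbers are invented for illustration.

```go
package main

import "fmt"

// binStats holds jemalloc-style counters for one size-class bin.
// Hypothetical type; collection from the indexer is out of scope here.
type binStats struct {
	RegionSize uint64 // bytes per region
	NRegs      uint64 // regions per slab
	CurSlabs   uint64 // slabs currently allocated for this bin
	CurRegs    uint64 // regions currently holding live data
}

// utilisation returns curregs / (nregs * curslabs). A low value means many
// slabs are only partly used, so their pages stay resident while holding
// little live data.
func utilisation(b binStats) float64 {
	capacity := b.NRegs * b.CurSlabs
	if capacity == 0 {
		return 0
	}
	return float64(b.CurRegs) / float64(capacity)
}

// wastedBytes estimates memory held by free regions inside partly used slabs.
func wastedBytes(b binStats) uint64 {
	capacity := b.NRegs * b.CurSlabs
	if b.CurRegs >= capacity {
		return 0
	}
	return (capacity - b.CurRegs) * b.RegionSize
}

func main() {
	// Invented example: 64-byte regions, 64 per slab, 1000 slabs, 16000 in use.
	s := binStats{RegionSize: 64, NRegs: 64, CurSlabs: 1000, CurRegs: 16000}
	fmt.Printf("util=%.2f wasted=%d bytes\n", utilisation(s), wastedBytes(s))
}
```

Comparing these per-bin values before and after a rollback would show which size classes lose utilisation.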
In MOI, a rollback to a disk snapshot clears the existing indexed data and rebuilds the entire index from the disk snapshot and DCP. Closing the main store and loading the snapshot happen concurrently. This is a possible cause of the high fragmentation, since frees from the old store may interleave with allocations for the rebuilt one and leave partly used slabs behind.
The following things can be investigated:
a. The performance penalty of closing the main store synchronously, i.e., first closing the main store and then loading the snapshot
b. Other possible ways to minimise jemalloc fragmentation