[backport] Inefficient sequence parsing during ISGR checkpointing
Description
The ISGR checkpointer stores lists of sequence numbers that are expected and processed in order to cacluate a safe-to-checkpoint sequence number.
These sequences are stored in a raw string format, which then have to be parsed to compare in order to sort sequences. This is done each time the checkpointer runs as part of a sort on these slices. Even if the slice is already ordered, sequences need to be parsed, which is costly both in CPU and allocation.
Although it's unexpected that these lists grow much beyond ~2x the batch size of revisions in each changes message (the replicator shouldn't be progressing further if it's still got pending operations/changes batches to complete) - those lists been observed to grow far beyond this estimation, the cause of which is yet to be determined.
Proposed fix
Pre-parse the sequences when we're adding to those expectedSeqs and processedSeqs lists so that this is only done once, and not every time we have to checkpoint.
Benchmark/repro
func BenchmarkCheckpointerUpdateCheckpointLists(b *testing.B) {
tests := []struct {
expectedSeqsLen int
processedSeqsLen int
}{
{expectedSeqsLen: 1, processedSeqsLen: 1},
{expectedSeqsLen: 100, processedSeqsLen: 100},
{expectedSeqsLen: 500, processedSeqsLen: 500},
{expectedSeqsLen: 1000, processedSeqsLen: 1000},
{expectedSeqsLen: 10000, processedSeqsLen: 10000},
{expectedSeqsLen: 50000, processedSeqsLen: 50000},
{expectedSeqsLen: 100000, processedSeqsLen: 100000},
}
for _, test := range tests {
b.Run(fmt.Sprintf("expectedSeqsLen=%d,processedSeqsLen=%d", test.expectedSeqsLen, test.processedSeqsLen), func(b *testing.B) {
expectedSeqs := make([]string, 0, test.expectedSeqsLen)
for i := 0; i < test.expectedSeqsLen; i++ {
expectedSeqs = append(expectedSeqs, strconv.Itoa(i))
}
processedSeqs := make(map[string]struct{}, test.processedSeqsLen)
for i := 0; i < test.processedSeqsLen; i++ {
processedSeqs[strconv.Itoa(i)] = struct{}{}
}
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
c := &Checkpointer{expectedSeqs: expectedSeqs, processedSeqs: processedSeqs}
_ = c._updateCheckpointLists()
}
})
}
}
The ISGR checkpointer stores lists of sequence numbers that are expected and processed in order to cacluate a safe-to-checkpoint sequence number.
These sequences are stored in a raw string format, which then have to be parsed to compare in order to sort sequences. This is done each time the checkpointer runs as part of a sort on these slices. Even if the slice is already ordered, sequences need to be parsed, which is costly both in CPU and allocation.
Although it's unexpected that these lists grow much beyond ~2x the batch size of revisions in each changes message (the replicator shouldn't be progressing further if it's still got pending operations/changes batches to complete) - those lists been observed to grow far beyond this estimation, the cause of which is yet to be determined.
Proposed fix
Pre-parse the sequences when we're adding to those expectedSeqs and processedSeqs lists so that this is only done once, and not every time we have to checkpoint.
Benchmark/repro
func BenchmarkCheckpointerUpdateCheckpointLists(b *testing.B) { tests := []struct { expectedSeqsLen int processedSeqsLen int }{ {expectedSeqsLen: 1, processedSeqsLen: 1}, {expectedSeqsLen: 100, processedSeqsLen: 100}, {expectedSeqsLen: 500, processedSeqsLen: 500}, {expectedSeqsLen: 1000, processedSeqsLen: 1000}, {expectedSeqsLen: 10000, processedSeqsLen: 10000}, {expectedSeqsLen: 50000, processedSeqsLen: 50000}, {expectedSeqsLen: 100000, processedSeqsLen: 100000}, } for _, test := range tests { b.Run(fmt.Sprintf("expectedSeqsLen=%d,processedSeqsLen=%d", test.expectedSeqsLen, test.processedSeqsLen), func(b *testing.B) { expectedSeqs := make([]string, 0, test.expectedSeqsLen) for i := 0; i < test.expectedSeqsLen; i++ { expectedSeqs = append(expectedSeqs, strconv.Itoa(i)) } processedSeqs := make(map[string]struct{}, test.processedSeqsLen) for i := 0; i < test.processedSeqsLen; i++ { processedSeqs[strconv.Itoa(i)] = struct{}{} } b.ReportAllocs() b.ResetTimer() for i := 0; i < b.N; i++ { c := &Checkpointer{expectedSeqs: expectedSeqs, processedSeqs: processedSeqs} _ = c._updateCheckpointLists() } }) } }
> go test -run=- -bench='^BenchmarkCheckpointerUpdateCheckpointLists$' -v ./db goos: darwin goarch: amd64 pkg: github.com/couchbase/sync_gateway/db cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz BenchmarkCheckpointerUpdateCheckpointLists BenchmarkCheckpointerUpdateCheckpointLists/expectedSeqsLen=1,processedSeqsLen=1 BenchmarkCheckpointerUpdateCheckpointLists/expectedSeqsLen=1,processedSeqsLen=1-12 13432764 87.93 ns/op 48 B/op 2 allocs/op BenchmarkCheckpointerUpdateCheckpointLists/expectedSeqsLen=100,processedSeqsLen=100 BenchmarkCheckpointerUpdateCheckpointLists/expectedSeqsLen=100,processedSeqsLen=100-12 121392 10186 ns/op 3632 B/op 225 allocs/op BenchmarkCheckpointerUpdateCheckpointLists/expectedSeqsLen=500,processedSeqsLen=500 BenchmarkCheckpointerUpdateCheckpointLists/expectedSeqsLen=500,processedSeqsLen=500-12 26188 45871 ns/op 16432 B/op 1025 allocs/op BenchmarkCheckpointerUpdateCheckpointLists/expectedSeqsLen=1000,processedSeqsLen=1000 BenchmarkCheckpointerUpdateCheckpointLists/expectedSeqsLen=1000,processedSeqsLen=1000-12 13202 92000 ns/op 32433 B/op 2025 allocs/op BenchmarkCheckpointerUpdateCheckpointLists/expectedSeqsLen=10000,processedSeqsLen=10000 BenchmarkCheckpointerUpdateCheckpointLists/expectedSeqsLen=10000,processedSeqsLen=10000-12 1270 932168 ns/op 320558 B/op 20032 allocs/op BenchmarkCheckpointerUpdateCheckpointLists/expectedSeqsLen=50000,processedSeqsLen=50000 BenchmarkCheckpointerUpdateCheckpointLists/expectedSeqsLen=50000,processedSeqsLen=50000-12 237 4965655 ns/op 1603810 B/op 100236 allocs/op BenchmarkCheckpointerUpdateCheckpointLists/expectedSeqsLen=100000,processedSeqsLen=100000 BenchmarkCheckpointerUpdateCheckpointLists/expectedSeqsLen=100000,processedSeqsLen=100000-12 118 9804731 ns/op 3214010 B/op 200872 allocs/op PASS ok github.com/couchbase/sync_gateway/db 11.645s