I am trying to repro in the daily sanity. With a standalone pillow fight command I see a 5-10% regression.
... <cut> ...
After all this experimentation I think (and I need to verify this) that a 1 node cluster is more like to uncover the regression than a 2 node cluster. Based on knowledge of the fix does this make sense? Of the above are there settings I should (or should not) change to help expose this?
Your results approximately match mine - the figures I quoted in my local test (~23.5s -> 24.7s, or ~5%) were on a single Ubuntu 12.04 machine (24 logical CPU Sandybridge Xeon), running two nodes (via cluster_run). Interestingly I saw much more significant difference running the same pillowfight test on my OS X laptop (Haswell, 8 logical CPU) - where performance dropped by over 50%.
The underlying cause of the perf regression was lock contention on a per-bucket mutex, so having lots of different connections trying to concurrently access the same bucket would be expected to show the issue. I'd probably expect you'd see it sooner with smaller documents, lots of connections at the same time and mostly performing reads.