Whilst doing performance work recently, I discovered that unrolling to single assoc calls were significantly faster than using multiple keys (~10% for my particular application). Zachary Tellman then pointed out that clojure.core doesn't unroll assoc at all, even for the case of relatively low numbers of keys.
We already unroll other performance critical functions that call things via
update https://github.com/clojure/clojure/blob/master/src/clj/clojure/core.clj#L5914, but
assoc (which is, I think in the critical path for quite a bunch of applications and libraries), would likely benefit from this.
I have not yet developed patches for this, but I did some standalone benchmarking work:
| |1 |2 |3 |4 |
| :-- | :-- | :-- | :-- | :-- |
| empty array map (not unrolled) | 23ns | 93ns | 156ns | 224ns |
| empty array map (unrolled assoc) | N/A | 51ns | 80ns | 110ns |
| | | | | |
| 20 element persistent hashmap (not unrolled) | 190ns | 313ns | 551ns | 651ns |
| 20 element persistent hashmap (unrolled assoc) | N/A | 250ns | 433ns | 524ns |
| | | | | |
| record (not unrolled) | 12ns | 72ns | 105ns | 182ns |
| record (unrolled assoc) | N/A | 21ns | 28ns | 41ns |
Each measurement was made in a separate JVM, to avoid JIT path dependence.
Benchmarks were ran on a commodity server (8 cpus, 32gb ram), with ubuntu 12.04 and a recent release of Java 8. Attached are
java -version output.
Relatively standard JVM production flags were enabled, and care was taken to disable leiningen's startup time optimizations (which disable many of the JIT optimizations).
Benchmarks can be run by cloning the repository, and running
There's one outstanding question for this patch: How far should we unroll these calls?
update (which is unrolled in the 1.7 alphas) is unrolled to 3 arguments. Adding more unrolling isn't difficult, but it does impact the readability of assoc.