Welcome! Please see the About page for a little more info on how this works.

+4 votes
in Clojure by

Problem: Currently select-keys uses conj to add entries. If the map is editable, conj! could be used instead to improve select-keys performance.

Additionally keyseq is traversed as a seq but could be traversed via reduce instead, which might be faster.

Approach 1: Use a transient map and conj!, keeping loop/recur
Approach 2: Reimplement select-keys to use reduce instead of loop/recur
Approach 3: Combine approach one and two

|selected key size | loop/recur | transient | reduce | transient + reduce |
| :-- | :-- | :-- | :-- | :-- | :-- |
|1 | 243 ns | 256 ns | 161 ns | 188 ns |
|7 | 1.1 ms | - | 885 ns | 454 ns |

From these numbers, approach 3 was chosen.

Note: In order to implement select-keys in terms of reduce, select-keys needed to be moved until after the definition of reduce. This forced a (declare select-keys) since it's used before the definition of reduce

Patch: (link: https://dev.clojure.org/jira/secure/attachment/17392/0001-CLJ-1789-Use-transients-and-reduce-with-select-keys.patch text: 0001-CLJ-1789-Use-transients-and-reduce-with-select-keys.patch)

3 Answers

0 votes
by
_Comment made by: slipset_

Standard Clojure select-keys:

 (bench (clojure.core/select-keys {:a "b" :c "d"} [:a]))
Evaluation count : 246382440 in 60 samples of 4106374 calls.
             Execution time mean : 243.245536 ns
    Execution time std-deviation : 2.714803 ns
   Execution time lower quantile : 238.473675 ns ( 2.5%)
   Execution time upper quantile : 248.544255 ns (97.5%)
                   Overhead used : 1.845047 ns


With transients:

(bench (select-keys {:a "b" :c "d"} [:a]))
Evaluation count : 232727220 in 60 samples of 3878787 calls.
             Execution time mean : 256.937568 ns
    Execution time std-deviation : 10.025123 ns
   Execution time lower quantile : 249.951872 ns ( 2.5%)
   Execution time upper quantile : 276.251590 ns (97.5%)
                   Overhead used : 1.845047 ns

Found 5 outliers in 60 samples (8.3333 %)
    low-severe     3 (5.0000 %)
    low-mild     2 (3.3333 %)
 Variance from outliers : 25.4503 % Variance is moderately inflated by outliers


With reduce

(bench (select-keys {:a "b" :c "d"} [:a]))
Evaluation count : 364807860 in 60 samples of 6080131 calls.
             Execution time mean : 161.582833 ns
    Execution time std-deviation : 2.212659 ns
   Execution time lower quantile : 158.027524 ns ( 2.5%)
   Execution time upper quantile : 167.673682 ns (97.5%)
                   Overhead used : 1.845047 ns

Found 3 outliers in 60 samples (5.0000 %)
    low-severe     3 (5.0000 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers

Reduce + transient

(bench (select-keys {:a "b" :c "d"} [:a]))
Evaluation count : 318075720 in 60 samples of 5301262 calls.
             Execution time mean : 188.656164 ns
    Execution time std-deviation : 3.024952 ns
   Execution time lower quantile : 183.867285 ns ( 2.5%)
   Execution time upper quantile : 195.466784 ns (97.5%)
                   Overhead used : 1.845047 ns

Found 4 outliers in 60 samples (6.6667 %)
    low-severe     4 (6.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers


On bigger map/selection

(bench (clojure.core/select-keys {:a "b" :c "d" :b "b" :d "d" :e "e" :f "f" :g "g"} [:a :c :b :d :e :f :g]))
Evaluation count : 56147160 in 60 samples of 935786 calls.
             Execution time mean : 1.104653 µs
    Execution time std-deviation : 36.366516 ns
   Execution time lower quantile : 1.048257 µs ( 2.5%)
   Execution time upper quantile : 1.142031 µs (97.5%)
                   Overhead used : 1.845047 ns

Found 5 outliers in 60 samples (8.3333 %)
    low-severe     4 (6.6667 %)
    low-mild     1 (1.6667 %)
 Variance from outliers : 19.0389 % Variance is moderately inflated by outliers

reduce

(bench (select-keys {:a "b" :c "d" :b "b" :d "d" :e "e" :f "f" :g "g"} [:a :c :b :d :e :f :g]))
Evaluation count : 67723500 in 60 samples of 1128725 calls.
             Execution time mean : 885.840664 ns
    Execution time std-deviation : 11.503115 ns
   Execution time lower quantile : 864.403495 ns ( 2.5%)
   Execution time upper quantile : 905.721942 ns (97.5%)
                   Overhead used : 1.845047 ns

Transient + reduce

(bench (select-keys {:a "b" :c "d" :b "b" :d "d" :e "e" :f "f" :g "g"} [:a :c :b :d :e :f :g]))
Evaluation count : 134119380 in 60 samples of 2235323 calls.
             Execution time mean : 454.587795 ns
    Execution time std-deviation : 15.681611 ns
   Execution time lower quantile : 439.822498 ns ( 2.5%)
   Execution time upper quantile : 485.797378 ns (97.5%)
                   Overhead used : 1.845047 ns

Found 3 outliers in 60 samples (5.0000 %)
    low-severe     3 (5.0000 %)
 Variance from outliers : 20.6393 % Variance is moderately inflated by outliers


The attached patch is using both transients and reduce
0 votes
by

Comment made by: alexmiller

The proposed approach seems good to me. The description needs to reflect what was considered and chosen better.

0 votes
by
Reference: https://clojure.atlassian.net/browse/CLJ-1789 (reported by alexmiller)
...