Welcome! Please see the About page for a little more info on how this works.

0 votes
in Clojure by
*Description*:
It is sometimes useful to compare a collection of values against one value, clojure internally defines a predicate for this exact purpose which has some nice performance improvements over just a partial applied =.

Prior discussion with Rich: https://groups.google.com/forum/#!topic/clojure-dev/0c-VNhEKVkI

*Example usage*:

;;before:
(map (partial = 3) coll)
;;after:
(map (=to 3) coll)


*Benchmarks*:

||test||(partial = 'foo)||#(= 'foo %)||(=to 'foo)||
|small homogeneous coll|217ns|165ns|39ns|
|small eterogeneous coll,|192ns|167ns|41ns|
|big homogeneous coll|66us|52us|8us|
|big eterogeneous coll|82us|66us|27us|

*Full benchmarks output*:

(use 'criterium.core)

(defn benchmark-f [f]
  (let [colls [['foo 'foo 'foo]
               [1 :foo 'foo]
               (doall (repeat 1e3 'foo))
               (doall (take 1e3 (cycle [1 :foo 'foo])))]]
    (doseq [c colls]
      (quick-bench (run! f c)))))

(benchmark-f (partial = 'foo))
ARNING: Final GC required 1.405293826432628 % of runtime
WARNING: Final GC required 10.202923149112559 % of runtime
Evaluation count : 3116130 in 6 samples of 519355 calls.
Execution time mean : 217.723199 ns
Execution time std-deviation : 29.425291 ns
Execution time lower quantile : 189.944710 ns ( 2.5%)
Execution time upper quantile : 261.717351 ns (97.5%)
Overhead used : 1.863362 ns
WARNING: Final GC required 4.2579397621583315 % of runtime
Evaluation count : 3138636 in 6 samples of 523106 calls.
Execution time mean : 198.985418 ns
Execution time std-deviation : 12.691848 ns
Execution time lower quantile : 182.441245 ns ( 2.5%)
Execution time upper quantile : 207.839280 ns (97.5%)
Overhead used : 1.863362 ns
WARNING: Final GC required 6.631646134523004 % of runtime
Evaluation count : 10038 in 6 samples of 1673 calls.
Execution time mean : 66.977712 µs
Execution time std-deviation : 10.411821 µs
Execution time lower quantile : 59.620690 µs ( 2.5%)
Execution time upper quantile : 84.483254 µs (97.5%)
Overhead used : 1.863362 ns

Found 1 outliers in 6 samples (16.6667 %)
low-severe  1 (16.6667 %)
Variance from outliers : 47.3059 % Variance is moderately inflated by outliers
WARNING: Final GC required 5.272721959665122 % of runtime
Evaluation count : 7908 in 6 samples of 1318 calls.
Execution time mean : 82.588512 µs
Execution time std-deviation : 5.215537 µs
Execution time lower quantile : 75.977936 µs ( 2.5%)
Execution time upper quantile : 86.849982 µs (97.5%)
Overhead used : 1.863362 ns


(benchmark-f #(= 'foo %))
WARNING: Final GC required 1.284421364203217 % of runtime
WARNING: Final GC required 10.04376144830405 % of runtime
Evaluation count : 3643032 in 6 samples of 607172 calls.
             Execution time mean : 165.393131 ns
    Execution time std-deviation : 1.041355 ns
   Execution time lower quantile : 164.277060 ns ( 2.5%)
   Execution time upper quantile : 166.849951 ns (97.5%)
                   Overhead used : 1.605524 ns
WARNING: Final GC required 6.258680973295933 % of runtime
Evaluation count : 3584574 in 6 samples of 597429 calls.
             Execution time mean : 167.659014 ns
    Execution time std-deviation : 3.821817 ns
   Execution time lower quantile : 164.175156 ns ( 2.5%)
   Execution time upper quantile : 173.210781 ns (97.5%)
                   Overhead used : 1.605524 ns

Found 1 outliers in 6 samples (16.6667 %)
    low-severe     1 (16.6667 %)
 Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
WARNING: Final GC required 6.914389197005716 % of runtime
Evaluation count : 11196 in 6 samples of 1866 calls.
             Execution time mean : 52.593759 µs
    Execution time std-deviation : 834.220092 ns



(benchmark-f (=to 'foo))
WARNING: Final GC required 7.40391654943877 % of runtime
Evaluation count : 15169068 in 6 samples of 2528178 calls.
Execution time mean : 39.937424 ns
Execution time std-deviation : 2.782661 ns
Execution time lower quantile : 37.393937 ns ( 2.5%)
Execution time upper quantile : 42.780432 ns (97.5%)
Overhead used : 1.863362 ns
WARNING: Final GC required 5.986859953402835 % of runtime
Evaluation count : 15199992 in 6 samples of 2533332 calls.
Execution time mean : 41.229082 ns
Execution time std-deviation : 2.815533 ns
Execution time lower quantile : 37.371527 ns ( 2.5%)
Execution time upper quantile : 43.208673 ns (97.5%)
Overhead used : 1.863362 ns
WARNING: Final GC required 5.039484046472016 % of runtime
Evaluation count : 69462 in 6 samples of 11577 calls.
Execution time mean : 8.976972 µs
Execution time std-deviation : 587.089991 ns
Execution time lower quantile : 8.505317 µs ( 2.5%)
Execution time upper quantile : 9.744296 µs (97.5%)
Overhead used : 1.863362 ns
WARNING: Final GC required 5.773010947849351 % of runtime
Evaluation count : 23352 in 6 samples of 3892 calls.
Execution time mean : 27.277376 µs
Execution time std-deviation : 2.115666 µs
Execution time lower quantile : 25.719322 µs ( 2.5%)
Execution time upper quantile : 30.123547 µs (97.5%)
Overhead used : 1.863362 ns


*Patch*: 0001-CLJ-1843-add-to-for-faster-equality-check-against-kn.patch

7 Answers

0 votes
by

Comment made by: alexmiller

Did you look at (apply = 3 coll) ? Just curious.

0 votes
by

Comment made by: bronsa

The advantage of Util/equivPred over Util/equiv is that it can assume the type of the provided argument, avoiding the runtime cost of doing the multiple instance checks that Util/equiv has to do to figure out what comparator to use internally

0 votes
by

Comment made by: alexmiller

Could you quantify the difference between these approaches on 2-3 collection sizes?

0 votes
by
_Comment made by: bronsa_

With the following setup:


(use 'criterium.core)

(defn =to [x]
  (let [pred (clojure.lang.Util/equivPred x)]
    (fn [y]
      (.equiv pred x y))))

(defn benchmark-f [f]
  (let [colls [['foo 'foo 'foo]
               [1 :foo 'foo]
               (doall (repeat 1e3 'foo))
               (doall (take 1e3 (cycle [1 :foo 'foo])))]]
    (doseq [c colls]
      (quick-bench (run! f c)))))

The results for (benchmark-f (partial = 'foo) are:



WARNING: Final GC required 1.405293826432628 % of runtime
WARNING: Final GC required 10.202923149112559 % of runtime
Evaluation count : 3116130 in 6 samples of 519355 calls.
Execution time mean : 217.723199 ns
Execution time std-deviation : 29.425291 ns
Execution time lower quantile : 189.944710 ns ( 2.5%)
Execution time upper quantile : 261.717351 ns (97.5%)
Overhead used : 1.863362 ns
WARNING: Final GC required 4.2579397621583315 % of runtime
Evaluation count : 3138636 in 6 samples of 523106 calls.
Execution time mean : 198.985418 ns
Execution time std-deviation : 12.691848 ns
Execution time lower quantile : 182.441245 ns ( 2.5%)
Execution time upper quantile : 207.839280 ns (97.5%)
Overhead used : 1.863362 ns
WARNING: Final GC required 6.631646134523004 % of runtime
Evaluation count : 10038 in 6 samples of 1673 calls.
Execution time mean : 66.977712 µs
Execution time std-deviation : 10.411821 µs
Execution time lower quantile : 59.620690 µs ( 2.5%)
Execution time upper quantile : 84.483254 µs (97.5%)
Overhead used : 1.863362 ns

Found 1 outliers in 6 samples (16.6667 %)
low-severe  1 (16.6667 %)
Variance from outliers : 47.3059 % Variance is moderately inflated by outliers
WARNING: Final GC required 5.272721959665122 % of runtime
Evaluation count : 7908 in 6 samples of 1318 calls.
Execution time mean : 82.588512 µs
Execution time std-deviation : 5.215537 µs
Execution time lower quantile : 75.977936 µs ( 2.5%)
Execution time upper quantile : 86.849982 µs (97.5%)
Overhead used : 1.863362 ns


The results fore (benchmark-f (=to 'foo)) are:


WARNING: Final GC required 7.40391654943877 % of runtime
Evaluation count : 15169068 in 6 samples of 2528178 calls.
Execution time mean : 39.937424 ns
Execution time std-deviation : 2.782661 ns
Execution time lower quantile : 37.393937 ns ( 2.5%)
Execution time upper quantile : 42.780432 ns (97.5%)
Overhead used : 1.863362 ns
WARNING: Final GC required 5.986859953402835 % of runtime
Evaluation count : 15199992 in 6 samples of 2533332 calls.
Execution time mean : 41.229082 ns
Execution time std-deviation : 2.815533 ns
Execution time lower quantile : 37.371527 ns ( 2.5%)
Execution time upper quantile : 43.208673 ns (97.5%)
Overhead used : 1.863362 ns
WARNING: Final GC required 5.039484046472016 % of runtime
Evaluation count : 69462 in 6 samples of 11577 calls.
Execution time mean : 8.976972 µs
Execution time std-deviation : 587.089991 ns
Execution time lower quantile : 8.505317 µs ( 2.5%)
Execution time upper quantile : 9.744296 µs (97.5%)
Overhead used : 1.863362 ns
WARNING: Final GC required 5.773010947849351 % of runtime
Evaluation count : 23352 in 6 samples of 3892 calls.
Execution time mean : 27.277376 µs
Execution time std-deviation : 2.115666 µs
Execution time lower quantile : 25.719322 µs ( 2.5%)
Execution time upper quantile : 30.123547 µs (97.5%)
Overhead used : 1.863362 ns
0 votes
by
_Comment made by: bronsa_

Using #(= 'foo %) rather than (partial = 'foo) allows for inlining of = and makes performance a bit better, but =to still wins noticeably

WARNING: Final GC required 1.284421364203217 % of runtime
WARNING: Final GC required 10.04376144830405 % of runtime
Evaluation count : 3643032 in 6 samples of 607172 calls.
             Execution time mean : 165.393131 ns
    Execution time std-deviation : 1.041355 ns
   Execution time lower quantile : 164.277060 ns ( 2.5%)
   Execution time upper quantile : 166.849951 ns (97.5%)
                   Overhead used : 1.605524 ns
WARNING: Final GC required 6.258680973295933 % of runtime
Evaluation count : 3584574 in 6 samples of 597429 calls.
             Execution time mean : 167.659014 ns
    Execution time std-deviation : 3.821817 ns
   Execution time lower quantile : 164.175156 ns ( 2.5%)
   Execution time upper quantile : 173.210781 ns (97.5%)
                   Overhead used : 1.605524 ns

Found 1 outliers in 6 samples (16.6667 %)
    low-severe     1 (16.6667 %)
 Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
WARNING: Final GC required 6.914389197005716 % of runtime
Evaluation count : 11196 in 6 samples of 1866 calls.
             Execution time mean : 52.593759 µs
    Execution time std-deviation : 834.220092 ns
   Execution time lower quantile : 51.510161 µs ( 2.5%)
   Execution time upper quantile : 53.367649 µs (97.5%)
                   Overhead used : 1.605524 ns
WARNING: Final GC required 6.179040224498723 % of runtime
Evaluation count : 9162 in 6 samples of 1527 calls.
             Execution time mean : 66.527357 µs
    Execution time std-deviation : 2.119652 µs
   Execution time lower quantile : 65.308835 µs ( 2.5%)
   Execution time upper quantile : 70.201570 µs (97.5%)
                   Overhead used : 1.605524 ns


small homogeneous coll, *(partial = 'foo)*: 217ns, *#(= 'foo %)*: 165ns, *(=to 'foo)*: 39ns
small eterogeneous coll, *(partial = 'foo)*: 192ns, *#(= 'foo %)*: 167ns, *(=to 'foo)*: 41ns
big homogeneous coll, *(partial = 'foo)*: 66us, *#(= 'foo %)*: 52us, *(=to 'foo)*: 8us
big eterogeneous coll, *(partial = 'foo*: 82us, *#(= 'foo %)*: 66us, *(=to 'foo)*: 27us
0 votes
by

Comment made by: bronsa

Apparently this was something that was already discussed a couple of years ago and Rich seemed ok with this https://groups.google.com/forum/#!topic/clojure-dev/0c-VNhEKVkI

0 votes
by
Reference: https://clojure.atlassian.net/browse/CLJ-1843 (reported by bronsa)
...