This is to follow an answer I gave on StackOverflow (which I hope is mostly correct, but I'm not experienced enough to know for sure). More details are there than in this question.
The gist is that (:k my-set)
is much slower than (my-set :k)
and (:k my-map)
. This is very counter-intuitive since when I hold some items that I want to repeatedly query for membership I hold them in a set. It makes much more sense to hold them in a map to always be performant (map performs well in both forms of invocation).
The reason I found for the latency difference was that calling set's invoke
is much faster than calling keyword's invoke
which does a bunch of delegations and instanceof
checks.
I was able to improve the performance of {:k my-set}
by implementing ILookup
using proxy
:
(def uids #{:a :b :c :d :e :f :g :h :i :j :k :l :m :n :o :p :a1 :b1 :c1 :d1 :e1 :f1 :h1 :i1 :j1 :k1 :l1 :m1 :n1 :o1 :p1})
(def uids-map (into {} (for [k uids] [k k])))
(def lookupable-set (proxy [clojure.lang.APersistentSet clojure.lang.ILookup] [uids-map]
(valAt [k] (get uids-map k))))
;; verify
(instance? clojure.lang.APersistentSet lookupable-set) ;; true
(instance? clojure.lang.ILookup lookupable-set) ;; true
(time (dotimes [i 1000000] (:o1 uids))) ;; 134.703101 msecs
(time (dotimes [i 1000000] (:o1 lookupable-set))) ;; 63.187353 msecs <-- faster
(time (dotimes [i 1000000] (:o1 uids-map))) ;; 35.802762 msecs <-- still fastest
I was wondering why clojure's set(s) do not implement ILookup
to begin with? Isn't lookup a big part of what people expect to do with sets? They already have the appropriate functions to do so easily. What would break if they just implemented ILookup? Or is there another reason not to do so?
Thanks.
Edit:
I've re-tested with (contains? uids :o1)
as well, as @alexmiller suggested in the comments and its still slower than the naive ILookup implementation:
(println "kw set")
(time (dotimes [i 1000000] (:o1 uids)))
(println "kw lookupable set")
(time (dotimes [i 1000000] (:o1 lookupable-set)))
(println "kw map")
(time (dotimes [i 1000000] (:o1 uids-map)))
(println "contains? set")
(time (dotimes [i 1000000] (contains? uids :o1)))
Which gave me::
kw set
"Elapsed time: 283.526096 msecs"
kw lookupable set
"Elapsed time: 121.766786 msecs"
kw map
"Elapsed time: 70.514017 msecs"
contains? set
"Elapsed time: 153.092212 msecs"
Map is still X2 times faster than set for lookups and implementing ILookup for sets is faster than contains?
.