Welcome! Please see the About page for a little more info on how this works.

+1 vote
in Clojure by
edited by

This is to follow an answer I gave on StackOverflow (which I hope is mostly correct, but I'm not experienced enough to know for sure). More details are there than in this question.

The gist is that (:k my-set) is much slower than (my-set :k) and (:k my-map). This is very counter-intuitive since when I hold some items that I want to repeatedly query for membership I hold them in a set. It makes much more sense to hold them in a map to always be performant (map performs well in both forms of invocation).

The reason I found for the latency difference was that calling set's invoke is much faster than calling keyword's invoke which does a bunch of delegations and instanceof checks.

I was able to improve the performance of {:k my-set} by implementing ILookup using proxy:

(def uids #{:a :b :c :d :e :f :g :h :i :j :k :l :m :n :o :p :a1 :b1 :c1 :d1 :e1 :f1 :h1 :i1 :j1 :k1 :l1 :m1 :n1 :o1 :p1})
(def uids-map (into {} (for [k uids] [k k])))
(def lookupable-set (proxy [clojure.lang.APersistentSet clojure.lang.ILookup] [uids-map]
                      (valAt [k] (get uids-map k))))

;; verify
(instance? clojure.lang.APersistentSet lookupable-set) ;; true
(instance? clojure.lang.ILookup lookupable-set) ;; true

(time (dotimes [i 1000000] (:o1 uids))) ;; 134.703101 msecs
(time (dotimes [i 1000000] (:o1 lookupable-set))) ;; 63.187353 msecs  <-- faster
(time (dotimes [i 1000000] (:o1 uids-map))) ;; 35.802762 msecs <-- still fastest

I was wondering why clojure's set(s) do not implement ILookup to begin with? Isn't lookup a big part of what people expect to do with sets? They already have the appropriate functions to do so easily. What would break if they just implemented ILookup? Or is there another reason not to do so?

Thanks.

Edit:

I've re-tested with (contains? uids :o1) as well, as @alexmiller suggested in the comments and its still slower than the naive ILookup implementation:

(println "kw set")
(time (dotimes [i 1000000] (:o1 uids)))

(println "kw lookupable set")
(time (dotimes [i 1000000] (:o1 lookupable-set)))

(println "kw map")
(time (dotimes [i 1000000] (:o1 uids-map)))

(println "contains? set")
(time (dotimes [i 1000000] (contains? uids :o1)))

Which gave me::

kw set
"Elapsed time: 283.526096 msecs"

kw lookupable set
"Elapsed time: 121.766786 msecs"

kw map
"Elapsed time: 70.514017 msecs"

contains? set
"Elapsed time: 153.092212 msecs"

Map is still X2 times faster than set for lookups and implementing ILookup for sets is faster than contains?.

1 Answer

+2 votes
by
selected by
 
Best answer

Starting from the base question re ILookup, ILookup is an abstraction about looking up values for keys in an associative data structure. In Clojure, sets are not associative data structures (even though they are implemented around maps and in some cases present as maps of k->k ie get). Thus, it does not make sense to extend ILookup to sets based on my understanding of the abstraction. Similarly, keyword invocation is optimized for associative lookup. The preferred function to check whether a set contains a value is contains?.

by
"ILookup is an abstraction about looking up values for keys in an associative data structure" - If we treat sets as functions is it not enough to think of them as implementing an associative transformation? Functions are eventually mappings (at least pure functions, like sets). Also, there is an "associative" interface which is separate (and implements) ILookup so maybe it makes sense to make sets as performant as maps simply because they're the go-to structure for lookups in a closed list of values AFAICT. Thank you for the answer. I've also edited my answer to show `contains?` still doesn't make sets as performant as maps (or even "ILookup"-able sets) which is disappointing for many prevalent use cases IMO.
...