I found that structural equality between persistent collections makes very few assumptions about the collections being compared, which leads to inefficient implementations, especially for vectors and maps.
The core idea is to dispatch to methods that iterate directly over the underlying arrays.
These implementations aren't the prettiest or most idiomatic, but they're efficient. If this were adopted, the Java implementation would look different anyway.
I tried these alternative implementations and found dramatic speedups:
Vector
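```clojure
(let [die (clojure.lang.Reduced. false)]   ; short-circuits reduce with false
  (defn vec-eq
    [^clojure.lang.PersistentVector v ^Iterable y]
    ;; sizes must match first: the reduce below only walks v's elements
    (and (== (count v) (count y))
         (let [iy (.iterator y)]
           ;; reduce walks v's leaf arrays directly; PersistentVector.reduce
           ;; unwraps `die` on the first mismatch and returns false
           (.reduce v (fn [_ x] (if (= x (.next iy)) true die)) true)))))
```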
This works well when comparing a vector with another vector, and a vector with a list.
The current implementation loops from 0 to count and calls nth for every element. Each nth call goes through arrayFor() to locate the element's leaf array, while both reduce and an iterator fetch each backing array only once.
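For comparison, here is a hypothetical Clojure approximation of the index-based loop described above, which can serve as a baseline next to vec-eq when benchmarking. `nth-eq` is an illustrative name; this sketches the access pattern only and is not the actual Java code.

```clojure
;; Hypothetical baseline (not Clojure's Java implementation): compare two
;; indexed collections by looping from 0 to count and calling nth on both.
(defn nth-eq
  [v y]
  (let [n (count v)]
    (and (== n (count y))
         (loop [i 0]
           (cond
             (== i n) true
             ;; on a PersistentVector, every nth call locates the leaf
             ;; array again before reading element i
             (= (nth v i) (nth y i)) (recur (inc i))
             :else false)))))
```

Usage: `(nth-eq [1 2 3] [1 2 3])` returns true, and `(nth-eq [1 2] [1 2 3])` returns false because the counts differ.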
Map
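```clojure
(let [o   (Object.)                       ; sentinel meaning "key not in m2"
      die (clojure.lang.Reduced. false)   ; short-circuits kvreduce with false
      eq  (fn [m2]
            (fn [_ k v]
              (let [v' (.valAt ^clojure.lang.IPersistentMap m2 k o)]
                (if (identical? o v')
                  die                     ; key missing from m2
                  (if (= v v') true die)))))]
  (defn map-eq
    [m1 m2]
    ;; equal counts plus every entry of m1 present in m2 means equal maps
    (and (== (count m1) (count m2))
         (.kvreduce ^clojure.lang.IKVReduce m1 (eq m2) true))))
```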
Here, too, the implementation iterates directly over the underlying array structure.
The current implementation converts the map to a seq and iterates over it, looking up each key in the other map via the java.util.Map interface.
This implementation avoids converting the map to a sequence and does not allocate map entries.
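A hypothetical baseline approximating the seq-based approach described above: walk m1 as a seq of entries and look each key up in m2 through java.util.Map. The name `seq-map-eq` and the use of containsKey plus get for the lookup are assumptions of this sketch, not the actual Java code.

```clojure
;; Hypothetical baseline: seq over m1's entries, lookups via java.util.Map.
(defn seq-map-eq
  [m1 ^java.util.Map m2]
  (and (== (count m1) (.size m2))
       (loop [s (seq m1)]
         (if s
           ;; each step of the seq yields a map entry object
           (let [e (first s)
                 k (key e)]
             (if (and (.containsKey m2 k)
                      (= (val e) (.get m2 k)))
               (recur (next s))
               false))
           true))))
```

Benchmarking this next to map-eq isolates the cost of the seq walk and entry objects from the cost of the lookups themselves.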
Sequences
When the receiver is a list, both the receiver and the object compared against it are converted to seqs.
It could be more efficient to compare lists with other collections via iterators:
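```clojure
(defn iter-eq
  [^Iterable x ^Iterable y]
  (let [ix (.iterator x)
        iy (.iterator y)]
    (loop []
      (if (.hasNext ix)
        ;; y must still have an element, and it must be equal
        (if (and (.hasNext iy) (= (.next ix) (.next iy)))
          (recur)
          false)
        ;; x is exhausted: equal only if y is exhausted too
        (not (.hasNext iy))))))
```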
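A minimal sketch, assuming helpers like the ones above, of how type-based dispatch could tie them together. `fast-eq` and the starred helper names are hypothetical. The sketch checks sizes before walking elements, restricts the map path to APersistentMap instances so that maps compared with records keep clojure.core/= semantics, and falls back to = for everything else.

```clojure
(def ^:private die (clojure.lang.Reduced. false))
(def ^:private not-found (Object.))

(defn- vec-eq* [^clojure.lang.PersistentVector v ^Iterable y]
  (let [iy (.iterator y)]
    (.reduce v (fn [_ x] (if (= x (.next iy)) true die)) true)))

(defn- map-eq* [^clojure.lang.IKVReduce m1 ^clojure.lang.ILookup m2]
  (.kvreduce m1
             (fn [_ k v]
               (let [v' (.valAt m2 k not-found)]
                 (if (and (not (identical? not-found v')) (= v v')) true die)))
             true))

(defn- iter-eq* [^Iterable x ^Iterable y]
  (let [ix (.iterator x) iy (.iterator y)]
    (loop []
      (if (.hasNext ix)
        (if (and (.hasNext iy) (= (.next ix) (.next iy))) (recur) false)
        (not (.hasNext iy))))))

(defn fast-eq
  "Hypothetical dispatching equality; falls back to clojure.core/=."
  [x y]
  (cond
    (identical? x y) true
    ;; vector vs vector or list: O(1) count check, then walk v's arrays
    (and (instance? clojure.lang.PersistentVector x)
         (or (vector? y) (list? y)))
    (and (== (count x) (count y)) (vec-eq* x y))
    ;; plain persistent maps only (records are not APersistentMap)
    (and (instance? clojure.lang.APersistentMap x)
         (instance? clojure.lang.APersistentMap y)
         (instance? clojure.lang.IKVReduce x))
    (and (== (count x) (count y)) (map-eq* x y))
    ;; other sequential collections: walk both iterators in lockstep
    (and (sequential? x) (sequential? y)
         (instance? Iterable x) (instance? Iterable y))
    (iter-eq* x y)
    :else (= x y)))
```

Usage: `(fast-eq [1 2 3] '(1 2 3))` takes the vector path, `(fast-eq '(1 2) (range 1 3))` takes the iterator path, and sets go straight to =.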
Benchmarking
Measured with criterium, vec-eq beats = in both cases (vector vs vector and vector vs list). The advantage shrinks as size increases, but at n=64 vec-eq is still twice as fast as =.
map-eq is also 2-3x faster than = for bigger maps and up to 10x faster for smaller maps.
```clojure
(require '[criterium.core :as cc])

;; vector vs vector
(doseq [n [1 2 4 8 16 32 64]
        :let [v1 (vec (range n))
              v2 (vec (range n))]]
  (println 'iter-eq n (iter-eq v1 v2))
  (cc/quick-bench (iter-eq v1 v2))
  (println 'vec-eq n (vec-eq v1 v2))
  (cc/quick-bench (vec-eq v1 v2))
  (println '= n (= v1 v2))
  (cc/quick-bench (= v1 v2)))

;; vector vs list
;; note: with a single argument, list* returns (seq (range n)), which is a
;; seq rather than a PersistentList
(doseq [n [1 2 4 8 16 32 64]
        :let [v1 (vec (range n))
              v2 (list* (range n))]]
  (println 'iter-eq n (iter-eq v1 v2))
  (cc/quick-bench (iter-eq v1 v2))
  (println 'vec-eq n (vec-eq v1 v2))
  (cc/quick-bench (vec-eq v1 v2))
  (println '= n (= v1 v2))
  (cc/quick-bench (= v1 v2)))

;; map vs map
(doseq [n [1 2 4 8 16 32 64]
        :let [m1 (zipmap (range n) (range n))
              m2 (zipmap (range n) (range n))]]
  (cc/quick-bench (map-eq m1 m2))
  (cc/quick-bench (= m1 m2)))
```
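Speed only matters if the results agree with =, so a small correctness harness could be run alongside the benchmarks. This is a sketch; `sample-pairs` and `mismatches` are hypothetical helper names. It compares a candidate equality function against clojure.core/= on pairs that are equal, differ in size, or differ only at the last index.

```clojure
(defn sample-pairs
  "Input pairs of size n: equal, different sizes, and differing at the end."
  [n]
  (let [v (vec (range n))
        m (zipmap (range n) (range n))]
    [[v (vec (range n))]                       ; equal vectors
     [v (apply list (range n))]                ; vector vs list
     [v (conj v n)]                            ; different sizes
     [v (if (pos? n) (assoc v (dec n) -1) v)]  ; differ at the last index
     [m (zipmap (range n) (range n))]          ; equal maps
     [m (assoc m n n)]]))                      ; extra key in the second map

(defn mismatches
  "Returns the pairs on which (f a b) disagrees with (= a b)."
  [f pairs]
  (remove (fn [[a b]] (= (boolean (f a b)) (= a b))) pairs))
```

Filter the pairs to the shapes a helper supports before checking it; vec-eq, for instance, expects a vector as its first argument, so the map pairs should be dropped for it.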
Addendum:
I also checked the following cases, where the two collections differ only in their last element:
```clojure
(doseq [n [10000 100000]
        :let [v1 (vec (range n))
              v2 (assoc v1 (dec (count v1)) 7)]]
  (cc/quick-bench (vec-eq v1 v2))
  (cc/quick-bench (iter-eq v1 v2))
  (cc/quick-bench (= v1 v2)))

(doseq [n [100000]
        :let [m1 (zipmap (range n) (range n))
              m2 (assoc m1 (key (last m1)) 7)]]
  (cc/quick-bench (map-eq m1 m2))
  (cc/quick-bench (= m1 m2)))
```
The optimized implementations still win by large margins.