Welcome! Please see the About page for a little more info on how this works.

+7 votes
in Collections by
edited by

I found that structural equality between persistent collections makes very few assumptions which lead to inefficient implementations, especially for vectors and maps.

The thrust of the implementation is dispatching via methods which directly iterate over the underlying arrays.

These implementations aren't the prettiest or most idiomatic but they're efficient. If this gets implemented it would look different in Java anyway.

I tried these alternative implementations and found dramatic speed ups:

Vector

(let [die (clojure.lang.Reduced. false)]
  (defn vec-eq
    [^PersistentVector v ^Iterable y]
    (let [iy (.iterator y)]
      (.reduce v (fn [_ x] (if (= x (.next iy)) true die)) true))))

This works well when comparing vectors and for vector x list
Current implementation goes through a loop from 0 to count and calls nth for every element. nth calls arrayFor() every time, while both reduce and an iterator get the backing array once per array.

Map

(let [o (Object.)
      die (clojure.lang.Reduced. false)
      eq (fn [m2] (fn [b k v]
                   (let [v' (.valAt ^IPersistentMap m2 k o)]
                     (if (.equals o v')
                       die
                       (if (= v v') true die)))))]
  (defn map-eq
    [m1 m2]
    (.kvreduce ^IKVReduce m1 (eq m2) true)))

Here, too, the implementation iterates directly over the underlying array structure.
Current implementation casts the array to seq then iterates over it while getting entries from the other map via the Map interface.
This implementation avoids casting the map to a sequence and does not allocate entries.

Sequences

When the receiver is a list the object compared against it and the receiver will be cast to a seq.

It could be more efficient to compare it with other collections via an iterator

(defn iter-eq
  [^Iterable x ^Iterable y]
  (let [ix (.iterator x)
        iy (.iterator y)]
    (loop []
      (if (.hasNext ix)
        (if (= (.next ix) (.next iy))
          (recur)
          false)
        true))))

Benchmarking

With criterium, vec-eq wins both cases. There are diminishing returns with size increase but still at n=64 vec-eq is twice as fast as =.
map-eq is also 2-3x faster for bigger maps and up to 10x faster for smaller maps

(doseq [n [1 2 4 8 16 32 64]
        :let [v1 (vec (range n))
              v2 (vec (range n))]]
  (println 'iter-eq n (iter-eq v1 v2))
  (cc/quick-bench (iter-eq v1 v2))
  (println 'vec-eq n (vec-eq v1 v2))
  (cc/quick-bench (vec-eq v1 v2))
  (println '= n (= v1 v2))
  (cc/quick-bench (= v1 v2)))


(doseq [n [1 2 4 8 16 32 64]
        :let [v1 (vec (range n))
              v2 (list* (range n))]]
  (println 'iter-eq n (iter-eq v1 v2))
  (cc/quick-bench (iter-eq v1 v2))
  (println 'vec-eq n (vec-eq v1 v2))
  (cc/quick-bench (vec-eq v1 v2))
  (println '= n (= v1 v2))
  (cc/quick-bench (= v1 v2)))
(doseq [n [1 2 4 8 16 32 64]
        :let [m1 (zipmap (range n) (range n))
              m2 (zipmap (range n) (range n))]]
  (cc/quick-bench (map-eq m1 m2))
  (cc/quick-bench (= m1 m2)))

Addendum:
Also checked the following cases:

(doseq [n [10000 100000]
        :let [v1 (vec (range n))
              v2 (assoc v1 (dec (count v1)) 7)]]
  (cc/quick-bench (vec-eq v1 v2))
  (cc/quick-bench (iter-eq v1 v2))
  (cc/quick-bench (= v1 v2)))

(doseq [n [100000]
        :let [m1 (zipmap (range n) (range n))
              m2 (assoc m1 (key (last m1)) 7)]]
  (cc/quick-bench (map-eq m1 m2))
  (cc/quick-bench (= m1 m2)))

Optimized implementations still win by huge margins

2 Answers

0 votes
by

I didn't verify your results but your benchmarks are rather limited in scope and only tests basically tiny collection sizes. How do things look if you get to 1.000, 10.000, 100.000, etc items?

I'd suspect that things will look drastically different if you compare things that actually take advantage of "structural sharing". For example create one vector and update the last element and then compare? That should be worst case for your implementation but quite fine for the current. Same for maps.

That being said the optimized "reduce" implementations are rather new so they might be more efficient than older stuff in some places. Just make sure to verify more scenarios before coming to a conclusion.

by
Sadly, if you look at the implementation of equiv you'll see it makes no use of structural sharing to short circuit, which is another possible optimization.
Like I mentioned in the original post, vector equality, for example, is implemented by calling `nth()` at most N times.
Checked now for 1e4 elements, vec-eq is about 20% faster, while iter-eq is up to 60% faster.
Similar results for 1e5 elements
For 1e5 map elements map-eq takes only 40% of the time `=` takes, where `m2 <- (assoc m1 (key (last m1)) 7)`
by
Doh, I sort of blindly assumed structural sharing would work but I guess that would need to reach too far into the specifics of each implementation and fail with special types that just implement the interfaces. I checked the CLJS implementation and that already uses an iterator based version.
0 votes
by

Microoptimization PRs are hard to consider unless when not connected or motivated by an application need. The threshold for consideration of such an enhancement is high.

btw, those impls are trivially broken

user=> (vec-eq [1 2 3] [1 2 3 4])
true
user=> (map-eq {1 2} {1 2 3 4})
true
by
I take it as read that those considering this issue are familiar with how equiv() is implemented. I elided count checks just as I have elided instanceof checks. These are merely proofs of concepts that there is lots of performance lying on the floor and it should be considered.
Application need? Using collections as keys or set members has BAD performance.
by
I'm saying the following with the tone of someone who is welcoming to optimizations:

In the past, it's been more productive to motivate optimizations starting from the application demand or problem statement side, not from the impl side. Perhaps we find a 2x improvement in equality, but it affects 0.1% of the total runtime of a real world application. In that case even a 10x improvement isn't worth investment. (Reviewing is a huge commitment; Fogus, Alex, and Rich spend a ton of time bringing rigor to tickets)

Is it true that using collections as keys or set members automatically makes an application slow? In that reality, optimization would be compelling, but I suspect it's not the case. Around 1.6, the hash impl changed because of performance problems suffered in a real world application. My general advice as someone who has committed performance improvements to Clojure over the years is: stay connected to the problem/application, and to start from that framing.

Don't "elide" correctness -- benchmarks would be invalid. Use generative tests to guide compatibility/correctness checks.

That being said, I'm not saying that there are not some potential improvements. Moving to reduce-based or iterator-based paths has been historically very helpful. But be open-minded to the possibility that it may not matter to an application.
by
This makes sense.
As far as applications which could benefit from this improvement, I'd say rule engines and core.logic are at the top of the list.
Examples:
odoyle https://github.com/oakes/odoyle-rules/blob/master/src/odoyle/rules.cljc#L377
Clara:
- activation-group-fn: https://github.com/cerner/clara-rules/blob/main/src/main/clojure/clara/rules/engine.cljc#L2113
Used to produce activation group as a key here: https://github.com/cerner/clara-rules/blob/main/src/main/clojure/clara/rules/memory.cljc
core.logic: indexing relations where lvars can be collections https://github.com/clojure/core.logic/blob/master/src/main/clojure/clojure/core/logic/pldb.clj

Out of these I only profiled odoyle. While it still has room for optimization, it spends about ~10% in pcequiv()

The reason I elided a "full" solution is mainly time. I think this is indicative enough of big possible improvement. Then I present it to the core team expecting one of three possible responses:
1. Nice findings, but don't bother for now.
2. Go ahead, send a full patch with comprehensive benchmarks
3. We'll do it ourselves

I don't mind any of these responses, but like you said, reviewing is a huge commitment and working on it won't be a trivial task for me either. I don't want to put in significant effort on a patch that might just sit in the backlog for a long time because the core team has its hands full and the issue has low priority.

I'll gladly work on a full patch with performance test matrix if there's interest
by
I think there would be interest in equality optimizations. I also think it's challenging to retain the current genericity (a synonym for "makes very few assumptions") and also taking into account that we're implementing this over types we control, and closed types we don't (Java stuff), and open types we don't (external Clojure colls). It's common for genericity to fight with concrete optimizations like this and "challenging" doesn't mean don't do it of course. :) I don't think you can really engage with the problem though without getting into the real impls where you can see where the selection of strategy affects the perf too, particularly on small colls.

As Ghadi said, it's helpful to know how impactful such a change could be for real stuff to judge its priority. When I research stuff like this, I usually hack up Clojure to collect distributions on call sites and then run stuff to see how frequently = is called and on what types/sizes in distribution. Seems like you've done a little of that, more would be useful.

You've suggested some impl rewrites and these seem like intuitively good options if you know things, but I suspect there are a variety of choices depending on what assumptions you can make (specific concrete type, reducible, iterable, seqable, etc). We would usually try to start enumerating some of those.
...