Persistent collections can implement equiv() more efficiently

Question

Persistent collections can implement equiv() more efficiently

asked Sep 30, 2021 in Collections by Ben Sless
retagged Jul 5, 2023 by alexmiller

I found that structural equality between persistent collections makes very few assumptions which lead to inefficient implementations, especially for vectors and maps.

The thrust of the implementation is dispatching via methods which directly iterate over the underlying arrays.

These implementations aren't the prettiest or most idiomatic but they're efficient. If this gets implemented it would look different in Java anyway.

I tried these alternative implementations and found dramatic speed ups:

Vector

(let [die (clojure.lang.Reduced. false)]
  (defn vec-eq
    [^PersistentVector v ^Iterable y]
    (let [iy (.iterator y)]
      (.reduce v (fn [_ x] (if (= x (.next iy)) true die)) true))))

This works well when comparing vectors and for vector x list
Current implementation goes through a loop from 0 to count and calls nth for every element. nth calls arrayFor() every time, while both reduce and an iterator get the backing array once per array.

Map

(let [o (Object.)
      die (clojure.lang.Reduced. false)
      eq (fn [m2] (fn [b k v]
                   (let [v' (.valAt ^IPersistentMap m2 k o)]
                     (if (.equals o v')
                       die
                       (if (= v v') true die)))))]
  (defn map-eq
    [m1 m2]
    (.kvreduce ^IKVReduce m1 (eq m2) true)))

Here, too, the implementation iterates directly over the underlying array structure.
Current implementation casts the array to seq then iterates over it while getting entries from the other map via the Map interface.
This implementation avoids casting the map to a sequence and does not allocate entries.

Sequences

When the receiver is a list the object compared against it and the receiver will be cast to a seq.

It could be more efficient to compare it with other collections via an iterator

(defn iter-eq
  [^Iterable x ^Iterable y]
  (let [ix (.iterator x)
        iy (.iterator y)]
    (loop []
      (if (.hasNext ix)
        (if (= (.next ix) (.next iy))
          (recur)
          false)
        true))))

Benchmarking

With criterium, vec-eq wins both cases. There are diminishing returns with size increase but still at n=64 vec-eq is twice as fast as =.
map-eq is also 2-3x faster for bigger maps and up to 10x faster for smaller maps

(doseq [n [1 2 4 8 16 32 64]
        :let [v1 (vec (range n))
              v2 (vec (range n))]]
  (println 'iter-eq n (iter-eq v1 v2))
  (cc/quick-bench (iter-eq v1 v2))
  (println 'vec-eq n (vec-eq v1 v2))
  (cc/quick-bench (vec-eq v1 v2))
  (println '= n (= v1 v2))
  (cc/quick-bench (= v1 v2)))


(doseq [n [1 2 4 8 16 32 64]
        :let [v1 (vec (range n))
              v2 (list* (range n))]]
  (println 'iter-eq n (iter-eq v1 v2))
  (cc/quick-bench (iter-eq v1 v2))
  (println 'vec-eq n (vec-eq v1 v2))
  (cc/quick-bench (vec-eq v1 v2))
  (println '= n (= v1 v2))
  (cc/quick-bench (= v1 v2)))

(doseq [n [1 2 4 8 16 32 64]
        :let [m1 (zipmap (range n) (range n))
              m2 (zipmap (range n) (range n))]]
  (cc/quick-bench (map-eq m1 m2))
  (cc/quick-bench (= m1 m2)))

Addendum:
Also checked the following cases:

(doseq [n [10000 100000]
        :let [v1 (vec (range n))
              v2 (assoc v1 (dec (count v1)) 7)]]
  (cc/quick-bench (vec-eq v1 v2))
  (cc/quick-bench (iter-eq v1 v2))
  (cc/quick-bench (= v1 v2)))

(doseq [n [100000]
        :let [m1 (zipmap (range n) (range n))
              m2 (assoc m1 (key (last m1)) 7)]]
  (cc/quick-bench (map-eq m1 m2))
  (cc/quick-bench (= m1 m2)))

Optimized implementations still win by huge margins

commented Jul 5, 2023 by alexmiller

3 Answers

Ben Sless · Answer 1 · 2022-01-31T13:35:46+0000

small update - wrote a bit of Java, used test.check to generate maps, here are some results:

 | size | seed |   time before (us) |     time after (us) | improvement |
 |------+------+--------------------+---------------------+-------------|
 |   10 |    0 | 0.7821998686829845 | 0.36678822554200413 |   2.1325654 |
 |   44 |    1 |  4.330622612178792 |   2.103437417654809 |   2.0588312 |
 |   31 |    2 | 3.0628944543188688 |  1.3886572837898974 |   2.2056518 |
 |   21 |    3 |  2.028679128233322 |  0.9572009284455004 |   2.1193869 |
 |   39 |    4 | 3.9265516612189715 |  1.8362321591272501 |   2.1383743 |
 |   18 |    5 | 1.6854334183962798 |  0.8202897942521229 |   2.0546805 |
 |   55 |    6 |  4.908545983501916 |   2.279236807427374 |   2.1535919 |
 |   45 |    7 |  4.464427896621236 |  2.1081167721518987 |   2.1177327 |
 |    6 |    8 | 0.3864066521455632 |  0.1928088585042629 |   2.0040918 |
 |   26 |    9 | 2.7114264338699283 |  1.3179156998000194 |   2.0573595 |
 |   86 |   10 |  8.879776767221973 |   4.380430951657479 |   2.0271468 |
 |   16 |   11 |  1.448846888824073 |  0.6990313285286198 |   2.0726494 |
 |   86 |   12 |  8.340080118652248 |   3.922289043010332 |   2.1263298 |
 |   82 |   13 |  8.249968350056667 |   4.000736723253899 |   2.0621123 |
 |   90 |   14 |  9.004991020408164 |   4.293898687932677 |   2.0971596 |
 |   18 |   15 | 1.8062551014332244 |  0.8815394179030271 |   2.0489783 |
 |   65 |   16 |  6.491169509571479 |   3.130686928716269 |   2.0734010 |
 |    1 |   17 | 0.1196704726877019 | 0.07041214138259107 |   1.6995716 |
 |   12 |   18 | 1.1530046459080272 |  0.6082699042686944 |   1.8955477 |
 |   79 |   19 |  7.466010735312539 |  3.3860477035184937 |   2.2049337 |

Implementations are a specialization of equiv:

private boolean associativeEquiv(Associative m) {
    for(int i=0;i < array.length;i+=2)
        {
            Object k = array[i];
            IMapEntry e = m.entryAt(k);
            if (e == null)
                return false;
            if(!Util.equiv(array[i+1], e.val()))
                return false;
        }
    return true;
}

private static Object SENTINEL = new Object();

private boolean mapEquiv(Map m) {
    for(int i=0;i < array.length;i+=2)
        {
            Object k = array[i];
            Object v = m.getOrDefault(k, SENTINEL);
            if (SENTINEL == v)
                return false;
            if(!Util.equiv(array[i+1], v))
                return false;
        }

    return true;
}

@Override
public boolean equiv(Object obj){
    if(!(obj instanceof Map))
        return false;
    if(obj instanceof IPersistentMap && !(obj instanceof MapEquivalence))
        return false;

    Map m = (Map) obj;

    if(m.size() != size())
        return false;

    if (m instanceof Associative)
        return associativeEquiv((Associative) m);
    return mapEquiv(m);
}

Thomas Heller · Answer 2 · 2021-10-01T06:16:25+0000

I didn't verify your results but your benchmarks are rather limited in scope and only tests basically tiny collection sizes. How do things look if you get to 1.000, 10.000, 100.000, etc items?

I'd suspect that things will look drastically different if you compare things that actually take advantage of "structural sharing". For example create one vector and update the last element and then compare? That should be worst case for your implementation but quite fine for the current. Same for maps.

That being said the optimized "reduce" implementations are rather new so they might be more efficient than older stuff in some places. Just make sure to verify more scenarios before coming to a conclusion.

Ghadi Shayban · Answer 3 · 2021-10-01T16:09:24+0000

answered Oct 1, 2021 by Ghadi Shayban

Microoptimization PRs are hard to consider unless when not connected or motivated by an application need. The threshold for consideration of such an enhancement is high.

btw, those impls are trivially broken

user=> (vec-eq [1 2 3] [1 2 3 4])
true
user=> (map-eq {1 2} {1 2 3 4})
true

commented Oct 1, 2021 by Ben Sless

commented Oct 1, 2021 by Ghadi Shayban

I'm saying the following with the tone of someone who is welcoming to optimizations:

In the past, it's been more productive to motivate optimizations starting from the application demand or problem statement side, not from the impl side. Perhaps we find a 2x improvement in equality, but it affects 0.1% of the total runtime of a real world application. In that case even a 10x improvement isn't worth investment. (Reviewing is a huge commitment; Fogus, Alex, and Rich spend a ton of time bringing rigor to tickets)

Is it true that using collections as keys or set members automatically makes an application slow? In that reality, optimization would be compelling, but I suspect it's not the case. Around 1.6, the hash impl changed because of performance problems suffered in a real world application. My general advice as someone who has committed performance improvements to Clojure over the years is: stay connected to the problem/application, and to start from that framing.

Don't "elide" correctness -- benchmarks would be invalid. Use generative tests to guide compatibility/correctness checks.

That being said, I'm not saying that there are not some potential improvements. Moving to reduce-based or iterator-based paths has been historically very helpful. But be open-minded to the possibility that it may not matter to an application.

commented Oct 3, 2021 by Ben Sless

This makes sense.
As far as applications which could benefit from this improvement, I'd say rule engines and core.logic are at the top of the list.
Examples:
odoyle https://github.com/oakes/odoyle-rules/blob/master/src/odoyle/rules.cljc#L377
Clara:
- activation-group-fn: https://github.com/cerner/clara-rules/blob/main/src/main/clojure/clara/rules/engine.cljc#L2113
Used to produce activation group as a key here: https://github.com/cerner/clara-rules/blob/main/src/main/clojure/clara/rules/memory.cljc
core.logic: indexing relations where lvars can be collections https://github.com/clojure/core.logic/blob/master/src/main/clojure/clojure/core/logic/pldb.clj

Out of these I only profiled odoyle. While it still has room for optimization, it spends about ~10% in pcequiv()

The reason I elided a "full" solution is mainly time. I think this is indicative enough of big possible improvement. Then I present it to the core team expecting one of three possible responses:
1. Nice findings, but don't bother for now.
2. Go ahead, send a full patch with comprehensive benchmarks
3. We'll do it ourselves

I don't mind any of these responses, but like you said, reviewing is a huge commitment and working on it won't be a trivial task for me either. I don't want to put in significant effort on a patch that might just sit in the backlog for a long time because the core team has its hands full and the issue has low priority.

I'll gladly work on a full patch with performance test matrix if there's interest

commented Oct 4, 2021 by alexmiller

I think there would be interest in equality optimizations. I also think it's challenging to retain the current genericity (a synonym for "makes very few assumptions") and also taking into account that we're implementing this over types we control, and closed types we don't (Java stuff), and open types we don't (external Clojure colls). It's common for genericity to fight with concrete optimizations like this and "challenging" doesn't mean don't do it of course. :) I don't think you can really engage with the problem though without getting into the real impls where you can see where the selection of strategy affects the perf too, particularly on small colls.

As Ghadi said, it's helpful to know how impactful such a change could be for real stuff to judge its priority. When I research stuff like this, I usually hack up Clojure to collect distributions on call sites and then run stuff to see how frequently = is called and on what types/sizes in distribution. Seems like you've done a little of that, more would be useful.

You've suggested some impl rewrites and these seem like intuitively good options if you know things, but I suspect there are a variety of choices depending on what assumptions you can make (specific concrete type, reducible, iterable, seqable, etc). We would usually try to start enumerating some of those.

commented Nov 29, 2021 by alexmiller

Persistent collections can implement equiv() more efficiently

Vector

Map

Sequences

Benchmarking

Please log in or register to add a comment.

Please log in or register to answer this question.

3 Answers

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Categories

Persistent collections can implement equiv() more efficiently

Vector

Map

Sequences

Benchmarking

Please log in or register to add a comment.

Please log in or register to answer this question.

3 Answers

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Related questions

Categories