Welcome! Please see the About page for a little more info on how this works.

0 votes
in Clojure by

When looking for execution speed in an application, it's tempting to reach for a java.util.ArrayList both when constructing maps and vectors. However there are some pitfalls in doing so. It is not obvious that for eg vectors, one should use LazilyPersistentVector/createOwning over eg PersistentVector/adopt, the former working correctly, whereas the only works for vectors smaller than 32.

Likewise for maps, where one could choose to create a PersistentArrayMap if there are less than 9 entries in the map, but should use a PersistentHashMap if there are 9 or more entries.

I don't want to prescribe solutions here, but a couple of fns in clojure.core that took arrays as params and returned the appropriate data structure:
(array->vector arr) ;; does what LazilyPersistentVector/createOwning does (array->map arr) ;; returns a PAM or PHM depending on size

1 Answer

0 votes
by
edited by

Trying to get to the root problem here - is it about construction performance or is it specifically about array -> persistent vector? For maps, I assume it's array of kv -> persistent map?

There is an interface for array->vector already (into [] arr), so maybe it's really about making that faster (which actually intersects with something else I'm working on currently). Probably the best equivalent for maps is (apply hash-map arr) but that picks an impl as you say.

by
It's mainly about making things faster. Playing around with data.json and taking som inspiration from @cnuernber's recent work on dtype-next, I was checking out using java.util.ArrayList rather than transient clojure data structures.

When storing stuff for the maps, I stored the data in [k1, v1, k2, v2...] format
by
Looking at ArrayList vs PV w/transients, I don't see much difference:

% java -version
openjdk version "17.0.1" 2021-10-19
OpenJDK Runtime Environment Temurin-17.0.1+12 (build 17.0.1+12)
OpenJDK 64-Bit Server VM Temurin-17.0.1+12 (build 17.0.1+12, mixed mode, sharing)
% clj -Sdeps '{:deps {org.clojure/clojure {:mvn/version "1.11.1"}}}'
Clojure 1.11.1

(defn make-al [^long c]
  (loop [i 0
         out (java.util.ArrayList.)]
    (if (< i c)
      (do
        (.add ^java.util.ArrayList out i)
        (recur (inc i) out))
      out)))

(dotimes [_ 20] (time (make-al 10000000)))
   
;; drop 10
"Elapsed time: 118.936307 msecs"
"Elapsed time: 378.941495 msecs"
"Elapsed time: 451.090741 msecs"
"Elapsed time: 231.632739 msecs"
"Elapsed time: 533.601017 msecs"
"Elapsed time: 178.208623 msecs"
"Elapsed time: 566.648574 msecs"
"Elapsed time: 223.109572 msecs"
"Elapsed time: 404.011696 msecs"
"Elapsed time: 374.543087 msecs"

(defn make-pv [^long c]
  (loop [i 0
         out (transient [])]
    (if (< i c)
      (recur (inc i) (conj! out i))
      (persistent! out))))

(dotimes [_ 20] (time (make-pv 10000000)))

;; drop 10
"Elapsed time: 238.714966 msecs"
"Elapsed time: 139.776764 msecs"
"Elapsed time: 287.93457 msecs"
"Elapsed time: 212.816761 msecs"
"Elapsed time: 337.137074 msecs"
"Elapsed time: 386.315848 msecs"
"Elapsed time: 208.427246 msecs"
"Elapsed time: 291.750425 msecs"
"Elapsed time: 321.466389 msecs"
"Elapsed time: 251.875222 msecs"

But looking at HashMap vs PersistentMap w/transients, I do see a significant difference (10x slower):

(defn make-hm [^long c]
  (loop [i 0
         out (java.util.HashMap.)]
    (if (< i c)
      (do
        (.put ^java.util.HashMap out i i)
        (recur (inc i) out))
      out)))

(dotimes [_ 20] (time (make-hm 10000000)))

;; drop 10
"Elapsed time: 949.351029 msecs"
"Elapsed time: 719.626631 msecs"
"Elapsed time: 664.274403 msecs"
"Elapsed time: 718.46743 msecs"
"Elapsed time: 880.380957 msecs"
"Elapsed time: 735.682114 msecs"
"Elapsed time: 611.627886 msecs"
"Elapsed time: 775.128433 msecs"
"Elapsed time: 537.130159 msecs"
"Elapsed time: 775.980626 msecs"


(defn make-pm [^long c]
  (loop [i 0
         out (transient {})]
    (if (< i c)
      (recur (inc i) (assoc! out i i))
      (persistent! out))))

(dotimes [_ 20] (time (make-pm 10000000)))

"Elapsed time: 5129.976085 msecs"
"Elapsed time: 5626.573669 msecs"
"Elapsed time: 5337.985317 msecs"
"Elapsed time: 5151.215195 msecs"
"Elapsed time: 5175.246837 msecs"
"Elapsed time: 5636.8717 msecs"
"Elapsed time: 5136.285851 msecs"
"Elapsed time: 5270.226782 msecs"
"Elapsed time: 5018.800694 msecs"
"Elapsed time: 5394.596752 msecs"

Based on that, I think it's interesting to consider how to make the latter faster..
by
The list example seems a bit more interesting:
```
clojure.data.json-perf-test> (quick-bench (make-pv 20))
Evaluation count : 2706714 in 6 samples of 451119 calls.
             Execution time mean : 218.595632 ns
    Execution time std-deviation : 1.125857 ns
   Execution time lower quantile : 216.403707 ns ( 2.5%)
   Execution time upper quantile : 219.447999 ns (97.5%)
                   Overhead used : 2.124369 ns

Found 1 outliers in 6 samples (16.6667 %)
    low-severe     1 (16.6667 %)
 Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
;; => nil
clojure.data.json-perf-test> (quick-bench (make-al 20))
Evaluation count : 5192388 in 6 samples of 865398 calls.
             Execution time mean : 116.527862 ns
    Execution time std-deviation : 5.242512 ns
   Execution time lower quantile : 113.078599 ns ( 2.5%)
   Execution time upper quantile : 125.437194 ns (97.5%)
                   Overhead used : 2.124369 ns

Found 1 outliers in 6 samples (16.6667 %)
    low-severe     1 (16.6667 %)
 Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
;; => nil
clojure.data.json-perf-test> (quick-bench (make-al-2 20))
Evaluation count : 8299938 in 6 samples of 1383323 calls.
             Execution time mean : 73.889678 ns
    Execution time std-deviation : 4.373601 ns
   Execution time lower quantile : 70.739550 ns ( 2.5%)
   Execution time upper quantile : 80.056859 ns (97.5%)
                   Overhead used : 2.124369 ns
;; => nil
clojure.data.json-perf-test> (quick-bench (make-pv 10000000))
Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 167.649450 ms
    Execution time std-deviation : 50.505810 ms
   Execution time lower quantile : 105.608206 ms ( 2.5%)
   Execution time upper quantile : 228.341248 ms (97.5%)
                   Overhead used : 2.124369 ns
;; => nil
clojure.data.json-perf-test> (quick-bench (make-al 10000000))
Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 149.405623 ms
    Execution time std-deviation : 146.660162 ms
   Execution time lower quantile : 64.432956 ms ( 2.5%)
   Execution time upper quantile : 332.819503 ms (97.5%)
                   Overhead used : 2.124369 ns
;; => nil
clojure.data.json-perf-test> (quick-bench (make-al-2 10000000))
Evaluation count : 12 in 6 samples of 2 calls.
             Execution time mean : 145.689554 ms
    Execution time std-deviation : 71.749785 ms
   Execution time lower quantile : 46.855186 ms ( 2.5%)
   Execution time upper quantile : 223.408978 ms (97.5%)
                   Overhead used : 2.124369 ns
;; => nil

```
With `make-al-2` defined as:
```
(defn make-al-2 [^long c]
  (let [out (java.util.ArrayList.)]
    (loop [i 0]
      (when (< i c)
        (.add ^java.util.ArrayList out i)
        (recur (inc i))))
    out))
```

So for smaller sizes, moving the the array-list out of the loop seems important?
by
edited by
Decompiled `make-al-2`
```
    public static Object invokeStatic(final long c) {
        final Object out = new ArrayList();
        for (long i = 0L; i < c; i = Numbers.inc(i)) {
            final Boolean b = ((ArrayList)out).add(Numbers.num(i)) ? Boolean.TRUE : Boolean.FALSE;
        }
        return out;
    }
```
decompiled `make-al`
```
    public static Object invokeStatic(final long c) {
        long i = 0L;
        Object out = new ArrayList();
        while (i < c) {
            final Boolean b = ((ArrayList)out).add(Numbers.num(i)) ? Boolean.TRUE : Boolean.FALSE;
            final long inc = Numbers.inc(i);
            out = out;
            i = inc;
        }
        return out;
    }
```
by
I guess the is orthogonal to the question of transients being faster or slower than ArrayLists, but it seems like working with ArrayLists lets the Clojure compiler generate faster code (which it seems like only matters for smaller collection sizes?)
by
Hopefully, lastly, when adding a constant to the list rather than `i`, things change even more:
```
clojure.data.json-perf-test> (quick-bench (make-pv 10000000))
Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 112.911790 ms
    Execution time std-deviation : 16.622923 ms
   Execution time lower quantile : 100.571915 ms ( 2.5%)
   Execution time upper quantile : 132.719706 ms (97.5%)
                   Overhead used : 2.124369 ns
;; => nil
clojure.data.json-perf-test> (quick-bench (make-al-2 10000000))
Evaluation count : 18 in 6 samples of 3 calls.
             Execution time mean : 48.052989 ms
    Execution time std-deviation : 7.951804 ms
   Execution time lower quantile : 39.031470 ms ( 2.5%)
   Execution time upper quantile : 58.394364 ms (97.5%)
                   Overhead used : 2.124369 ns
;; => nil
clojure.data.json-perf-test>
```

```
(defn make-pv [^long c]
   (let [])
   (loop [i 0
          out (transient [])]
     (if (< i c)
       (recur (inc i) (conj! out "3"))
       (persistent! out))))
```
```
 (defn make-al-2 [^long c]
   (let [out (java.util.ArrayList.)]
     (loop [i 0]
       (when (< i c)
         (.add ^java.util.ArrayList out "3")
         (recur (inc i))))
     out))
```
by
For my various use cases it would be helpful to have fast construction pathways from object[] data.  For persistent vector, createOwning is perfect although it isn't optimal I do no think.

If you have the array of objects up front there should be extremely efficient ways to create either a vector or map - more efficient than creating a transient and looping through them.  For instance in the map case if you right off the bat create hash codes of all the keys you should be able to reorder them based on hash codes and then directly create the HAMT tree without much of an iteration loop simply by taking subsections of the hashcode-index array.

For the persistent vector case it should boil down to taking length 32 subsections of the input array and directly creating the hamt from those without doing the element-by-element iteration.  This pathway allows clients to rapidly construct these datastructures from other sources without involving a manual iteration loop.

IF the above is implemented well then I could get persistent datastructures to outperform or match mutable datastructures in the json parse case as I have to iterate item by item in the hashtable case as opposed to using a bulk creation method.
...