Welcome! Please see the About page for a little more info on how this works.

0 votes
in Clojure by
mapv for multiple collections currently does:


(into [] (map f c1 c2 c3))


I propose to change these to:


(let [it1 (clojure.lang.RT/iter c1)
      it2 (clojure.lang.RT/iter c2)]
     (loop [out (transient [])]
       (if (and (.hasNext it1) (.hasNext it2))
         (recur (conj! out (f (.next it1) (.next it2))))
         (persistent! out))))


This is 5x faster.

For variadic arity we can check if colls is {{counted?}}:

- If yes: Use {{MultiIterator}}
- If no: Forward to {{map}} like the current implementation does.

Any thoughts on this or if this could brake something?

6 Answers

0 votes
by
_Comment made by: alexmiller_

Iterators should be treated as dangerous and unsafe by default - they are mutable, stateful, (potentially) non-thread-safe objects. Thus, their use should be carefully scrutinized and in particular you don't want them to a) leak out of a localized context (so they should be eagerly walked and released) or b) be used in a way that triggers the execution of impure functions such that they would be reexecuted later. Sequences cache their realization (and are thread-safe) which makes them slower, but much safer in avoiding this problem.

In this case, you're eagerly producing a concrete collection (not a lazy seq) so the first concern is addressed. However, the second is a little harder to reason through. I think it might be ok, but would have to think about that more.

Note that into uses transduce which uses CollReduce, which will do an iterReduce on Iterable colls, so you could probably get the same effect by simply using the transducer form:


(into [] (map f) c1)


Except that this form only takes a single coll. Multi coll support for transducers exists (see TransformerIterator/createMulti) but is not well-surfaced in the current apis. Seems like leveraging it directly would be vastly preferable to the suggestion above though:


(defn mapv2 [f & cs]
  (let [iters (map #(.iterator %) cs)
        t (clojure.lang.TransformerIterator/createMulti (map f) iters)]
    (loop [out (transient [])]
      (if (.hasNext t)
        (recur (conj! out (.next t)))
        (persistent! out)))))


Note that this assumes all cs are Iterable, which is not a valid assumption. For example: (mapv identity "abc"). So this would still need some work on fallbacks and perf testing.
0 votes
by

Comment made by: aralo

Alex: I actually did use exactly this very implementation but found it to be quite a bit slower than manually keeping track of the iterators. My guess is it's because of {{xf.applyTo}} inside the {{step}} function of {{TransformIterator}}. We already know {{source.size()}} when creating it in {{createMulti}}. So this implementation could probably be made smarter by directly invoking the right arity.
Though, I haven't benchmarked anything. So just a guess for now.

0 votes
by

Comment made by: alexmiller

Seems like a reasonable guess and something that could be optimized in TransformIterator.

0 votes
by

Comment made by: aralo

I actually thought about this a bit more. I agree that the proper step here would be to properly leverage transducers and provide better support for multiple collections with transducers.

So I'd suggest changing the game plan to:

  1. Make {{into}} and {{transduce}} work with multiple collections
  2. Throw out {{MultiIterator}}. We can get much better performance doing this inline in {{TransformIterator}}, then we avoid the unnecessary step of "build up sequence in MultiIterator" followed by "spread out this just created sequence in apply".

Though, this would make the TransformIterator much more involved.

After this, we could give many transducers the option of working over multiple sequences (keep, filter, map-indexed, take, drop etc etc) and efficiently using them with {{into}} and {{transduce}}.

If you agree, I can change the title to reflect the new scope.

0 votes
by

Comment made by: alexmiller

You are moving from "enhancing performance of an existing function" into doing api design which has a whole host of larger considerations. I think it's unlikely that we will expand either into or transduce into multi-coll territory and I don't think there are very many transducers that make sense as natively multi-coll. So, I do not think you should expand the scope of this ticket, as I would then decline it.

0 votes
by
Reference: https://clojure.atlassian.net/browse/CLJ-2280 (reported by aralo)
...