I was surprised to find out that split-with is literally implemented as [(take-while pred coll) (drop-while pred coll)], meaning that it does a redundant amount of work applying the predicate twice for every element that satisfies it.
The following code simulates an expensive predicate, it takes 800+ ms and prints 0 1 2 3 0 1 2 3
:
(time
(mapv doall
(let [xs (range 10)
pred #(do (println %) (Thread/sleep 100) (< % 3))]
(split-with pred xs))))
Is there any reason for it not to be implemented more efficiently to avoid these redundant evaluations?
Here is my initial attempt at an implementation - note that realizing a single element of drops
causes takes
to be fully realized but not the other way round.
(defn split-with*
"Like `split-with` but evaluates `pred` at most once on each element"
[pred coll]
(let [takes (take-while pred coll)
drops (lazy-seq (drop (count takes) coll))]
[takes drops]))
Additional context:
This would be a breaking change for current code that involves side effects in the predicate and relies on it running twice.
But this is arguably a bad idea - e.g the following side-effectful code violates the reasonable assumption that split-with
actually splits a collection into two parts.
Due to laziness, the result also changes depending on which sequence is realized first.
(let [xs (range 10)
counter (atom 1)
pred (fn [x] (< x (swap! counter + 1/2)))]
(split-with pred xs))
;; => [(0 1 2) (7 8 9)]