Welcome! Please see the About page for a little more info on how this works.

+3 votes
in Clojure by

I was surprised to find out that split-with is literally implemented as [(take-while pred coll) (drop-while pred coll)], meaning that it does a redundant amount of work applying the predicate twice for every element that satisfies it.

The following code simulates an expensive predicate, it takes 800+ ms and prints 0 1 2 3 0 1 2 3:

(time
  (mapv doall
    (let [xs   (range 10)
          pred #(do (println %) (Thread/sleep 100) (< % 3))]
      (split-with pred xs))))

Is there any reason for it not to be implemented more efficiently to avoid these redundant evaluations?

Here is my initial attempt at an implementation - note that realizing a single element of drops causes takes to be fully realized but not the other way round.

(defn split-with*
  "Like `split-with` but evaluates `pred` at most once on each element"
  [pred coll]
  (let [takes (take-while pred coll)
        drops (lazy-seq (drop (count takes) coll))]
    [takes drops]))

Additional context:

This would be a breaking change for current code that involves side effects in the predicate and relies on it running twice.
But this is arguably a bad idea - e.g the following side-effectful code violates the reasonable assumption that split-with actually splits a collection into two parts.
Due to laziness, the result also changes depending on which sequence is realized first.

(let [xs      (range 10)
      counter (atom 1)
      pred    (fn [x] (< x (swap! counter + 1/2)))]
  (split-with pred xs))
;; => [(0 1 2) (7 8 9)]

1 Answer

+2 votes
by
...