Welcome! Please see the About page for a little more info on how this works.

+2 votes
in Sequences by

I realized that f in partition-by is called twice for the same element when a split occurs:

(partition-by
 #(do (println %)
   (rand))
 [1 2 3])

1
2
2
3
3
((1) (2) (3))

See how every time we split, so at element 2 and 3, how they get printed twice.

Is this a bug? Should it be doing this?

I noticed that because I had a case where I needed to count the occurrence of an element to know when to split and this was messing up my count.

Edit: It seems the transducer version doesn't have this behavior

(into []
 (partition-by
  #(do (println %)
    (rand)))
 [1 2 3])

1
2
3
[[1] [2] [3]]

1 Answer

+1 vote
by
edited by
([f coll]
 (lazy-seq
  (when-let [s (seq coll)]
    (let [fst (first s)
          fv (f fst)
          run (cons fst (take-while #(= fv (f %)) (next s)))]
      (cons run (partition-by f (lazy-seq (drop (count run) s))))))))

Looks like f gets called 1x when binding fv. Since the take-while portion doesn't yield any similar keys, the value of run is '(1). However, f "was" invoked on the first value of the rest of s, so 2 gets printed 1 time. When partition-by pseudo recurses during the lazy-seq construction, the first arg of the new partition is now 2. fv is determined by invoking f on 2, which prints two, etc. The process repeats.

The transducer variant never passes the head of the next partition recursively, so it only ever calls f once per element of the original input.

Note: we can derive a 1-time partition function variant without state like so:

user> (defn partition-by-once [f coll]
        (->> coll
             (map (fn [x] [(f x) x]))
             (partition-by first)
             (map #(map second %))))
#'user/partition-by-once
user> (partition-by-once #(do (println %)
                               (rand)) [1 2 3])
1
2
3
((1) (2) (3))

We can do the volatile trick that the transducer uses to get an implementation too:

    (defn partition-by-once-statefully
      [f coll]
      (let [pv   (volatile! ::none)
            test (fn [v] (vreset! pv (f v)))
            aux  (fn aux
                   ([coll]
                    (lazy-seq
                     (when-let [s (seq coll)]
                       (let [fst (first coll)
                             fv  @pv
                             run (cons fst (take-while #(= fv (test %)) (next s)))]
                         (cons run (aux (lazy-seq (drop (count run) s))))))))
                  ([fst remaining]
                   (if-let [s (seq remaining)]
                     (let [fv  (test fst)
                           run (cons fst (take-while #(= fv (test %)) s))]
                       (cons run (aux (lazy-seq (drop (dec (count run)) s)))))
                     `((~fst)))))]
        (when-let [s (seq coll)]
          (aux  (first coll) (next coll)))))

    user> (doall (partition-by-once-vol (fn [v] (println v) (odd? v)) [1 2 3 5 7 8 10 11]))
    1
    2
    3
    5
    7
    8
    10
    11
    ((1) (2) (3 5 7) (8 10) (11))
...