+8 votes

It would be nice if group-by let users control the aggregated collection type and transform values before aggregation. This is a common scenario when grouping a collection of maps by one key and aggregating, perhaps even numerically, by another.

group-by could be generalized like so:

(defn group-by
  "Returns a map of the elements of coll keyed by the result of
  f on each element. The value at each key will be a vector of the
  corresponding elements, in the order they appeared in coll."
  {:added "1.2"
   :static true}
  ([kf coll]
   (group-by kf [] coll))
  ([kf init coll]
   (group-by kf identity init coll))
  ([kf vf init coll]
   (group-by kf vf conj init coll))
  ([kf vf rf init coll]
   (persistent!
    (reduce
     (fn [ret x]
       (let [k (kf x)]
         (assoc! ret k (rf (get ret k init) (vf x)))))
     (transient {}) coll))))
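
To make the extra arities concrete, here is a small sketch of how they might be called. The function is defined under the hypothetical name group-by* so it does not shadow clojure.core/group-by, and the orders data is made up for illustration.

```clojure
;; Sketch only: same shape as the proposal above, under a hypothetical
;; name so that clojure.core/group-by stays available for comparison.
(defn group-by*
  ([kf coll] (group-by* kf [] coll))
  ([kf init coll] (group-by* kf identity init coll))
  ([kf vf init coll] (group-by* kf vf conj init coll))
  ([kf vf rf init coll]
   (persistent!
    (reduce
     (fn [ret x]
       (let [k (kf x)]
         ;; start each group from init, fold in the transformed value
         (assoc! ret k (rf (get ret k init) (vf x)))))
     (transient {}) coll))))

;; Made-up sample data.
(def orders
  [{:customer "ann" :amount 10}
   {:customer "bob" :amount 5}
   {:customer "ann" :amount 7}])

;; Two arguments: behaves like the existing group-by.
(group-by* :customer orders)

;; kf, vf, init: keep only the amounts, collected into vectors.
(group-by* :customer :amount [] orders)
;; => {"ann" [10 7], "bob" [5]}

;; kf, vf, rf, init: sum the amounts per customer.
(group-by* :customer :amount + 0 orders)
;; => {"ann" 17, "bob" 5}
```

The last call is the "aggregate numerically by another key" case from the question: init is 0 and rf is + instead of [] and conj.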

1 Answer

+1 vote

I really like how group-by can be decomposed using xforms:

(require '[net.cgrand.xforms :as x])

(defn my-group-by [kfn coll]
  (x/into {}
          (x/by-key kfn (x/into []))
          coll))

  1. Grouping: x/by-key with kfn is responsible for separating the stream of values by key.
  2. Internal aggregation: x/by-key takes a transducer to aggregate the values within each group; (x/into []) matches the core implementation.
  3. External aggregation: x/by-key returns a transducer, so the caller can decide how to do the outer aggregation; (x/into {} ,,, coll) matches the core implementation.
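
As a rough sketch of points 2 and 3, swapping the inner transducer changes the per-group aggregation without touching the grouping step. This assumes the xforms library (namespace net.cgrand.xforms) is on the classpath; the orders data is made up for illustration.

```clojure
(require '[net.cgrand.xforms :as x])

;; Made-up sample data.
(def orders
  [{:customer "ann" :amount 10}
   {:customer "bob" :amount 5}
   {:customer "ann" :amount 7}])

;; Inner aggregation: sum instead of collecting into a vector.
;; The three-argument form of x/by-key takes a key fn, a value fn,
;; and the transducer applied within each group.
(def totals
  (x/into {} (x/by-key :customer :amount (x/reduce +)) orders))
;; => {"ann" 17, "bob" 5}

;; Inner aggregation: collect the distinct amounts into a set.
(def amounts
  (x/into {} (x/by-key :customer :amount (x/into #{})) orders))
;; => {"ann" #{7 10}, "bob" #{5}}
```

The outer (x/into {} ,,, orders) is the external aggregation; since x/by-key only returns a transducer, it could equally be composed with other transducers before the final collection is built.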

I'd love to see improvements to core (including group-by) go in the direction of supporting transducible processes.

This isn't wrong, but adopting xforms is a relatively big change, whereas generalizing the existing implementation only requires turning each hard-coded value into a parameter.