
+6 votes
in Clojure by
recategorized by

TL;DR

update-vals and update-keys are mainly meant to operate on maps of homogeneous data that are keyed on some identifier.
https://clojure.atlassian.net/browse/CLJ-2651
https://clojure.atlassian.net/browse/CLJ-1959

But there is no function in core that produces such a map from a sequence of homogeneous data. I guess there are a lot of ways to achieve this, for example:

(into {} (map (juxt f identity)) coll) ; credit Sean Corfield
(persistent! (reduce #(assoc! %1 (f %2) %2) (transient {}) coll)) ; from Medley
(update-vals (group-by f coll) first)

But I have no idea which is most performant. So, I wonder if there is a place for a key-by function in core that produces such a map with good performance?
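One rough way to compare the three is to time them on generated data. The sketch below is only that: the helper names (`key-by-into`, `key-by-transient`, `key-by-group`, `elapsed-ms`) and the data shape are made up for illustration, and wall-clock timing with `System/nanoTime` is noisy on the JVM, so a proper benchmarking library such as Criterium would give more trustworthy numbers. It assumes Clojure 1.11+ for `update-vals`.

```clojure
;; Hypothetical test data: 10,000 maps with unique :id values.
(def coll (mapv (fn [i] {:id i :name (str "user-" i)}) (range 10000)))

(defn key-by-into [f coll]
  (into {} (map (juxt f identity)) coll))

(defn key-by-transient [f coll]
  (persistent! (reduce #(assoc! %1 (f %2) %2) (transient {}) coll)))

(defn key-by-group [f coll]
  ;; update-vals takes the map first and the function second.
  (update-vals (group-by f coll) first))

(defn elapsed-ms
  "Calls thunk n times and returns the mean wall-clock time per call in ms.
  Not a rigorous benchmark: no warm-up control, no statistics."
  [n thunk]
  (let [start (System/nanoTime)]
    (dotimes [_ n] (thunk))
    (/ (- (System/nanoTime) start) 1e6 n)))

(doseq [[label g] [["into + juxt" key-by-into]
                   ["transient reduce" key-by-transient]
                   ["group-by + update-vals" key-by-group]]]
  (println label (elapsed-ms 10 #(g :id coll)) "ms"))
```

With unique keys all three return equal maps, so any difference is purely in cost: the `group-by` variant builds an intermediate vector per key and then walks the map a second time.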


Clojure 1.11 introduced update-keys and update-vals, and I (and I presume others) started using them instead of our handwritten map-vals and the likes.

However, I am still uncertain about the best way to create a map that update-keys and update-vals operate on. Typically, I will have a sequence of homogeneous maps retrieved from a database, like

[{:id 1 :name "brandon"}
 {:id 2 :name "brenda"}
 {:id 3 :name "kelly"}]

and a (key-by :id coll) or (index-by :id coll) function turns it into

{1 {:id 1 :name "brandon"}
 2 {:id 2 :name "brenda"}
 3 {:id 3 :name "kelly"}}

But how is this best achieved? update-vals allows for

(defn key-by 
  [f coll]
  (update-vals (group-by f coll) first))

But these collections are often large and one wants good performance. In Medley, index-by is written like this:

(defn index-by
  [f coll]
  (persistent! (reduce #(assoc! %1 (f %2) %2) (transient {}) coll)))

Which looks faster and is something that I could never have come up with. So I can't help but wonder if key-by (that name sounds better to me) became a missing piece in Clojure core when update-keys and update-vals were introduced?
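Apart from speed, the two definitions differ when keys collide. A small sketch, using made-up rows in which two entries share an `:id`: the `group-by` version keeps the first item per key, while the transient version keeps the last. Note that `update-vals` takes the map as its first argument.

```clojure
;; Hypothetical data with a duplicate :id, to show the difference.
(def rows [{:id 1 :name "brandon"}
           {:id 1 :name "brenda"}
           {:id 2 :name "kelly"}])

(defn key-by
  "group-by collects items per key in order; first picks the earliest one."
  [f coll]
  (update-vals (group-by f coll) first))

(defn index-by
  "Each assoc! overwrites the previous value, so the latest item wins."
  [f coll]
  (persistent! (reduce #(assoc! %1 (f %2) %2) (transient {}) coll)))

(key-by :id rows)
;; => {1 {:id 1, :name "brandon"}, 2 {:id 2, :name "kelly"}}

(index-by :id rows)
;; => {1 {:id 1, :name "brenda"}, 2 {:id 2, :name "kelly"}}
```

If the key is a true unique identifier, as with database primary keys, the two agree.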

by
There's a generalization of the requested function that I haven't seen anyone mention, and would cover both `group-by` and `index-by`:

```
(defn group-with [kf vf coll] ...)
;; Where kf is itm->k and vf is v->itm->v'

(def index-by #(group-with %1 (fn [_ itm] itm) %2))

(def group-by #(group-with %1 (fnil conj []) %2))
```

`group-with` would also cover a broader set of use cases that currently require using `reduce`.
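For what it's worth, here is one way the elided body could look. This is only a guess at the intent, not the commenter's implementation: `group-with` does not exist in core, and the starred names are hypothetical so they don't shadow `clojure.core/group-by`.

```clojure
(defn group-with
  "Sketch: kf maps an item to its key; vf takes the value accumulated so far
  for that key (nil if none) and the item, and returns the new value."
  [kf vf coll]
  (persistent!
    (reduce (fn [acc itm]
              (let [k (kf itm)]
                ;; get works on transient maps, so the current value can be read back
                (assoc! acc k (vf (get acc k) itm))))
            (transient {})
            coll)))

;; index-by: ignore the accumulated value and keep the latest item
(defn index-by* [f coll]
  (group-with f (fn [_ itm] itm) coll))

;; group-by: conj each item onto a vector, starting from [] when nil
(defn group-by* [f coll]
  (group-with f (fnil conj []) coll))

;; A use that would otherwise need a hand-written reduce: counting per key
(group-with even? (fn [n _] (inc (or n 0))) [1 2 3 4 5])
;; => {false 3, true 2}
```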

EDIT: sorry, I'm new to ask clojure, I meant to post this as an answer.

2 Answers

0 votes
by
selected by
 
Best answer

Created jira request at https://clojure.atlassian.net/browse/CLJ-2738

I agree with Sean's comment elsewhere on this page: I don't think this has anything to do with update-keys or update-vals, or with homogeneous value sets.

by
Perhaps it’s only in my mind that update-vals and update-keys suggest that index-by should exist. Nevertheless, I’m very happy to see this in JIRA, thank you!
by
To kill two birds with one stone, `index-by` could be implemented with the more generic `group-by` proposed here:
https://ask.clojure.org/index.php/12319/can-group-by-be-generalized
0 votes
by

Your key-by is (into {} (map (juxt :id identity)) coll) (which uses transduce under the hood with transient/persistent!).

This seems orthogonal to what you would use update-keys and update-vals on (we use both of those at work in some places and we also have the equivalent of key-by in other places).
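A small sketch of the two steps kept separate, using the sample rows from the question (the var names `rows` and `by-id` are just for illustration, and `update-vals` needs Clojure 1.11+):

```clojure
(def rows [{:id 1 :name "brandon"}
           {:id 2 :name "brenda"}
           {:id 3 :name "kelly"}])

;; Step 1: index the rows by :id. (juxt :id identity) turns each row into
;; a [key value] pair, and into conj's those pairs onto the empty map.
(def by-id (into {} (map (juxt :id identity)) rows))
;; => {1 {:id 1, :name "brandon"}, 2 {:id 2, :name "brenda"}, 3 {:id 3, :name "kelly"}}

;; Step 2 (independent of how the map was built): transform every value.
(update-vals by-id :name)
;; => {1 "brandon", 2 "brenda", 3 "kelly"}
```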

by
Yes, I have also used `(into {} (map (juxt :id identity)) coll)` for this purpose. But when 1.11 was released, I saw that I could use `(update-vals (group-by :id coll) first)` and thought it was a bit neater.

Re: orthogonality I think my question wasn't quite clear. To my understanding, `key-by` produces the type of map on which `update-vals` and `update-keys` are supposed to operate? My question was simply if it is crazy to think that a `key-by` function that produces that type of map also belongs in core, since `update-vals` and `update-keys` are in core. It just seems symmetrical to have all three functions :-)
by
update-vals and update-keys can operate on any hash map -- they are generic.
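For instance (a minimal illustration with made-up maps, Clojure 1.11+), neither function cares whether the values share a type or whether the keys are identifiers:

```clojure
;; Values of mixed types: str is simply applied to each one.
(update-vals {:a 1 :b "two" :c :three} str)
;; => {:a "1", :b "two", :c ":three"}

;; String keys converted to keywords; values are left untouched.
(update-keys {"a" 1 "b" "two"} keyword)
;; => {:a 1, :b "two"}
```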
by
Yes, of course. But the expected use case for at least `update-vals` seems to be maps of homogeneous types:

https://clojure.atlassian.net/browse/CLJ-2651
"The maps requiring application of update-vals almost always have values of homogeneous types."

And this is of course not a request, just a question about whether it is an oversight not to have a function that turns a collection of homogeneous types into an indexed map of homogeneous types. :-)
...