Welcome! Please see the About page for a little more info on how this works.

+19 votes
in Clojure by
closed by

Many people have been writing a function to map values in HashMap:

Proposal: Add map-keys and map-values which: maps keys in HashMap, and map values in HashMap. They return HashMap as a result.

Workaround: Using function reduce-kv or ordinary map and into is a common solution, but they are confusing and types change, which makes it tricky and tedious.

Discussions: https://groups.google.com/forum/#!topic/clojure-dev/kkPYIl5qj0o

closed with the note: Clojure 1.11.0-alpha2 adds support for `update-keys` and `update-vals` - see CLJ-1959 and CLJ-2651 for more details

13 Answers

0 votes
by

Comment made by: delihiros

code and test for map-keys and map-vals

0 votes
by

Comment made by: bronsa

I propose those functions being called update-vals and update-keys rather than map-vals and map-keys

0 votes
by

Comment made by: alexmiller

It's not worth bike-shedding names on this - Rich will have his own opinion regardless.

On the patch:
- remove the :static metadata, that's not used anymore
- needs docstrings, which should be written in the style of other Clojure docstrings. map is probably a good place to draw from.
- rather than declare into, defer the definition of these till whatever it needs has been defined. There is no reason to add more declares for this.

There are other potential implementations - these should be implemented and compared for performance across a range of input sizes. In addition to the current approach, I would investigate:

  • reduce-kv with construction into a transient map. This allows the map to reduce itself (no seq caching needed) and avoid creating entries only to tear them apart again.
  • transducers with (into {} (map ...) m)

Also should consider
- whether to build a k/v vector and convert to a map, or build a map directly (the former may be faster, not sure)
- if building the map, how to construct the map entries (vector vs creating a mapentry object directly)
- in map-keys, is there any open question when map generates new overlapping keys?
- are there places in existing core code where map-keys/map-vals could be used (I am pretty certain there are)

0 votes
by

Comment made by: delihiros

Thanks for comments:)

I propose those functions being called update-vals and update-keys rather than map-vals and map-keys
Maybe. But I name it map-** just for now, we can choose it later :)

about potential implementations:
I have tried several implementations, and seems to be the current implementation is the fastest.
You can see it here: https://github.com/delihiros/performance

about considerings:
> whether to build a k/v vector and convert to a map, or build a map directly (the former may be faster, not sure)
> are there places in existing core code where map-keys/map-vals could be used (I am pretty certain there are)
> if building the map, how to construct the map entries (vector vs creating a mapentry object directly)
I'll check which them as soon as possible. I haven't done it yet.

in map-keys, is there any open question when map generates new overlapping keys?
I believe it should be overwritten by latter applied key and value.

0 votes
by

Comment made by: nathanmarz

I've done quite a bit of investigation into this through building Specter. Here are some benchmarks of numerous ways of doing map-vals, including using Specter.

Code: https://github.com/nathanmarz/specter/blob/4778500e0370fb211f47ebf4d69ca64366117b6c/scripts/benchmarks.clj#L87
Results: https://gist.github.com/nathanmarz/bf571c9ed86bfad09816e17b9b6e59e3

A few comments:
- Implementations that build and tear apart MapEntry's perform much worse.
- Transients should be used for large maps but not for small ones.
- This benchmark shows that the property of maintaining the type of the map in the output can be achieved without sacrificing performance (the test cases using Specter or "empty" have this property).

0 votes
by

Comment made by: delihiros

I've modified the implementation. It should be faster than before.

0 votes
by

Comment made by: steveminer@gmail.com

Implementations that call reduce-kv are not lazy so the documentation should be clarified in the proposed patch (map-mapper-v3.patch). Also, it's probably better to say "map" (as the noun) rather than to specify a particular concrete type "hash map".

0 votes
by

Comment made by: bronsa

map->map operations can't be lazy either way. Even if one implementation used lazy operations to iterate over the original map, the into {} would realize it later.

0 votes
by

Comment made by: nathanmarz

-1 to this. Clojure aims to be a small core, pushing additional functionality into libraries. The problem space of compound transformations, of which this functionality is a small piece, is already thoroughly solved by Specter. Specter's MAP-VALS and MAP-KEYS navigators additionally support removal of key/value pairs during transformation by transforming to special NONE value. This expands the utility greatly.

Also worth noting is a fast implementation requires a totally different approach depending on the type of the map. reduce-kv with transients is optimal for hash maps, but for array maps using lower level facilities provide ~60% speed boost. See Specter's implementation here https://github.com/nathanmarz/specter/blob/1.0.4/src/clj/com/rpl/specter/navs.cljc#L243

0 votes
by

Comment made by: gshayban

Nathan you're making a strawman re: compound transformations. This isn't a request for a function with filtering knobs or conditional behavior (which Clojure has historically opposed). There are other valid approaches than Specter's.

Re fast implementation: Not every function has to strive for the most performant implementation, esp at the cost of branching and general complexity. A cost model for performance has to take into account complexity.

This ticket is a request for a convenience that is repeated in many codebases.

Do we want to preserve metadata? Many map operations do.
Do we want to assume IEditableCollection?

0 votes
by

Comment made by: nathanmarz

Performance is incredibly important for general data structure manipulation functions like this. Manipulating data structures is one of the most basic things you do in a language, and it's done all the time in performance sensitive code. Having to abandon the "nice" functions for ugly specialized code in performance sensitive code is a failure of the language.

map-vals/map-keys are part of a rich problem space of which myself and the users of Specter have learned a lot the past few years. Clojure barely touches this problem space, especially when it comes to nested or recursive data structures. Adding functions to Clojure that are significantly inferior in both semantics and performance to what already exists in an external library seems pretty silly to me. Just my two cents.

0 votes
by

Comment made by: delihiros

I agree with Nathan Martz that the performance is very important, but I still have a strong opinion that this function should be somehow imported to the core part of the language.
People use this transformation pretty often and if there is a fast implementation in the core it will be a great benefit to all of us.

0 votes
by
Reference: https://clojure.atlassian.net/browse/CLJ-1959 (reported by delihiros)
...