Welcome! Please see the About page for a little more info on how this works.

+1 vote
in Clojure by
closed by

'zipmap constructs a map without transients, where transients could improve performance.

Approach: Use a transient map internally, along with iterators for the keys and values. A persistent map is returned as before. The definition is also moved so that it resides below that of #'transient.

Performance:

(def xs (range 16384))
(def ys (range 16))

| expression | 1.7.0-beta3 | +patch | |
| :-- | :-- | :-- | :-- | :-- |
| (zipmap xs xs) | 4.50 ms | 2.12 ms | large map
| (zipmap ys ys) | 2.75 us | 2.07 us | small map

Patch: CLJ-1005-zipmap-iterators.patch

Screened by: Alex Miller

closed with the note: Fixed in 1.10.2-rc1

13 Answers

0 votes
by

Comment made by: aaron

Why is the old implementation left and commented out? If we are going to move to a new implementation, the old one should be removed.

0 votes
by

Comment made by: michalmarczyk

As mentioned in the ticket description, the previously attached patch follows the pattern of into whose non-transient-enabled definition is left in core.clj with a #_ in front -- I wasn't sure if that's something desirable in all cases.

Here's a new patch with the old impl removed.

0 votes
by

Comment made by: jafingerhut

Thanks for the updated patch, Michal. Sorry to raise such a minor issue, but would you mind using a different name for the updated patch? I know JIRA can handle multiple attached files with the same name, but my prescreening code isn't quite that talented yet, and it can lead to confusion when discussing patches.

0 votes
by

Comment made by: michalmarczyk

Thanks for the heads-up, Andy! I've reattached the new patch under a new name.

0 votes
by

Comment made by: jafingerhut

Presumptuously changing Approval from Incomplete back to None after the Michal's updated patch was added, addressing the reason the ticket was marked incomplete.

0 votes
by

Comment made by: aaron

The patch looks good and applies cleanly. Are there additional tests that we should run to verify that this is providing the improvement we think it is. Also, is there a discussion somewhere that started this ticket? There isn't a lot of context here.

0 votes
by

Comment made by: michalmarczyk

Hi Aaron,

Thanks for looking into this!

From what I've been able to observe, this change hugely improves {{zipmap}} times for large maps. For small maps, there is a small improvement. Here are two basic Criterium benchmarks ({{transient-zipmap}} defined at the REPL as in the patch):

`
;;; large map
user=> (def xs (range 16384))

'user/xs

user=> (last xs)
16383
user=> (c/bench (zipmap xs xs))
Evaluation count : 13920 in 60 samples of 232 calls.

         Execution time mean : 4.329635 ms
Execution time std-deviation : 77.791989 us

Execution time lower quantile : 4.215050 ms ( 2.5%)
Execution time upper quantile : 4.494120 ms (97.5%)
nil
user=> (c/bench (transient-zipmap xs xs))
Evaluation count : 21180 in 60 samples of 353 calls.

         Execution time mean : 2.818339 ms
Execution time std-deviation : 110.751493 us

Execution time lower quantile : 2.618971 ms ( 2.5%)
Execution time upper quantile : 3.025812 ms (97.5%)

Found 2 outliers in 60 samples (3.3333 %)

low-severe	 2 (3.3333 %)

Variance from outliers : 25.4675 % Variance is moderately inflated by outliers
nil

;;; small map
user=> (def ys (range 16))

'user/ys

user=> (last ys)
15
user=> (c/bench (zipmap ys ys))
Evaluation count : 16639020 in 60 samples of 277317 calls.

         Execution time mean : 3.803683 us
Execution time std-deviation : 88.431220 ns

Execution time lower quantile : 3.638146 us ( 2.5%)
Execution time upper quantile : 3.935160 us (97.5%)
nil
user=> (c/bench (transient-zipmap ys ys))
Evaluation count : 18536880 in 60 samples of 308948 calls.

         Execution time mean : 3.412992 us
Execution time std-deviation : 81.338284 ns

Execution time lower quantile : 3.303888 us ( 2.5%)
Execution time upper quantile : 3.545549 us (97.5%)
nil
`

Clearly the semantics are preserved provided transients satisfy their contract.

I think I might not have started a ggroup thread for this, sorry.

0 votes
by

Comment made by: jafingerhut

Patch 0001-Use-transient-map-in-zipmap.2.patch dated Aug 15 2012 does not apply cleanly to latest master after some commits were made to Clojure on Sep 3 2014.

I have not checked whether this patch is straightforward to update. See the section "Updating stale patches" at http://dev.clojure.org/display/community/Developing Patches for suggestions on how to update patches.

0 votes
by
_Comment made by: michalmarczyk_

Thanks, Andy. It was straightforward to update – an automatic rebase. Here's the updated patch.
0 votes
by

Comment made by: gshayban

New patch using clojure.lang.RT/iter, criterium shows >30% more perf in the best case. Less alloc probably but I didn't measure. CLJ-1499 (better iterators) is related

0 votes
by

Comment made by: justinspedding

4 years later, this zipmap implementation in the core namespace is still not using a transient map internally. Is there a reason why this was never applied?

0 votes
by
_Comment made by: alexmiller_

Multiple approaches proposed here, no consensus approach determined yet. Needs some focus time to narrow that down, and hasn’t been a priority yet.
0 votes
by
Reference: https://clojure.atlassian.net/browse/CLJ-1005 (reported by michalmarczyk)
...