Welcome! Please see the About page for a little more info on how this works.

–1 vote
in Collections by

As discussed on the mailing list (link: 1), this patch has two unrolled variants of vectors and maps, with special inner classes for each cardinality. Currently both grow to six elements before spilling over into the general versions of the data structures, which is based on rough testing but can be easily changed. At Rich's request, I haven't included any integration into the rest of the code, and there are top-level static create() methods for each.

The sole reason for this patch is performance, both in terms of creating data structures and performing operations on them. This can be seen as a more verbose version of the trick currently played with PersistentArrayMap spilling over into PersistentHashMap. Based on the benchmarks, which can be run by cloning cambrian-collections (link: 2) and running 'lein test :benchmark', this should supplant PersistentArrayMap. Performance is at least on par with PAM, and often much faster. Especially noteworthy is the creation time, which is 5x faster for maps of all sizes (lein test :only cambrian-collections.map-test/benchmark-construction), and on par for 3-vectors, but 20x faster for 5-vectors. There are similar benefits for hash and equality calculations, as well as calls to reduce().

This is a big patch (over 5k lines), and will be kind of a pain to review. My assumption of correctness is based on the use of collection-check, and the fact that the underlying approach is very simple. I'm happy to provide a high-level description of the approach taken, though, if that will help the review process.

I'm hoping to get this into 1.7, so please let me know if there's anything I can do to help accomplish that.

(link: 1) https://groups.google.com/forum/#!topic/clojure-dev/pDhYoELjrcs
(link: 2) https://github.com/ztellman/cambrian-collections

Patch: unrolled-vector-2.patch

Screener Notes: The approach is clear and understandable. Given the volume of generated code, I believe that best way to improve confidence in this code is to get people using it asap, and add collection-test (link: 3) to the Clojure test suite. I would also like to get the generator (link: 4) included in the Clojure repo. We don't need to necessarily automate running it, but would be nice to have it nearby if we want to tweak something later.

(link: 3) https://github.com/ztellman/collection-check/blob/master/src/collection_check.clj
(link: 4) https://github.com/ztellman/cambrian-collections/blob/master/generate/cambrian_collections/vector.clj

43 Answers

0 votes
by
_Comment made by: hypirion_

It should probably be noted that {{core.rrb-vector}} will break for small vectors by this patch, as it peeks into the underlying structure. This will also break other libraries which peeks into the vector implementation internals, although I'm not aware of any other – certainly not any other contrib library.

Also, two comments on {{unrolled-vector.patch}}:

{{private *transient* boolean edit = true;}}
in the Transient class should probably be
{{private *volatile* boolean edit = true;}}
as transient means something entirely different in Java.

{{conj}} in the {{Transient}} implementation _could_ invalidate itself without any problems ({{edit = false;}}) if it is converted into a TransientVector (i.e. spills over) – unless it has a notable overhead. The invalidation can prevent some subtle bugs related to erroneous transient usage.
0 votes
by

Comment made by: alexmiller

Jean - understanding the scope of the impact will certainly be part of the integration process for this patch. I appreciate the heads-up. While we try to minimize breakage for things like this, it may be unavoidable for libraries that rely on implementation internals.

0 votes
by
_Comment made by: michalmarczyk_

I'll add support for unrolled vectors to {{core.rrb-vector}} the moment they land on master. :-) (Probably with some conditional compilation so as not to break compatibility with earlier versions of Clojure – we'll see when the time comes.)
0 votes
by

Comment made by: michalmarczyk

I should say that it'd be possible to add generic support for any "vector lookalikes" by pouring them into regular vectors in linear time. At first glance it seems to me that that'd be out of line with the basic promise of the library, but I'll give it some more thought before the changes actually land.

0 votes
by

Comment made by: ztellman

Somewhat predictably, the day after I cut the previous patch, someone found an issue (link: 1). In short, my use of the ArrayChunk wrapper applied the offset twice.

This was not caught by collection-check, which has been updated to catch this particular failure. It was, however, uncovered by Michael Blume's attempts to merge the change into Clojure, which tripped a bunch of alarms in Clojure's test suite. My own attempt to do the same to "prove" that it worked was before I added in the chunked seq functionality, hence this issue persisting until now.

As always, there may be more issues lurking. I hope we can get as many eyeballs on the code between now and 1.8 as possible.

(link: 1) https://github.com/ztellman/cambrian-collections/commit/2e70bbd14640b312db77590d8224e6ed0f535b43
(link: 2) https://github.com/MichaelBlume/clojure/tree/test-vector

0 votes
by

Comment made by: ztellman

As a companion to the performance analysis in the unrolled map issue, I've run the benchmarks and posted the results at https://gist.github.com/ztellman/10e8959501fb666dc35e. Some notable results:

0 votes
by

Comment made by: alexmiller

Stu: I do not think this patch should be marked "screened" until the actual integration and build work (if the generator is integrated) has been completed.

0 votes
by

Comment made by: alexmiller

FYI, we have "reset" all big features for 1.8 for the moment (except the socket repl work). We may still include it - that determination will be made later.

0 votes
by

Comment made by: ztellman

Okay, any idea when the determination will be made? I was excited that we seemed to be finally moving forward on this.

0 votes
by

Comment made by: alexmiller

No, but it is now on my work list.

0 votes
by

Comment made by: richhickey

I wonder if all of the overriding of APersistentVector yields important benefits - e.g. iterator, hashcode etc.

0 votes
by

Comment made by: ztellman

In the case of hashcode, definitely: https://gist.github.com/ztellman/10e8959501fb666dc35e#file-gistfile1-txt-L1013-L1076. This was actually one of the original things I wanted to speed up.

In the case of the iterator, probably not. I'd be fine removing that.

0 votes
by

Comment made by: ztellman

So am I to infer from https://github.com/clojure/clojure/commit/36d665793b43f62cfd22354aced4c6892088abd6 that this issue is defunct? If so, I think there's a lot of improvements being left on the table for no particular reason.

0 votes
by

Comment made by: richhickey

Yes, that commit covers this functionality. It takes a different approach from the patch in building up from a small core, and maximizing improvements to the bases rather than having a lot of redundant definitions per class. That also allowed for immediate integration without as much concern for correctness, as there is little new code. It also emphasizes the use case for tuples, e.g. small vectors used as values that won't be changed, thus de-emphasizing the 'mutable' functions. I disagree that many necessary improvements are being left out. The patch 'optimized' many things that don't matter. Further, there are not big improvements to the pervasive inlining. In addition, the commit includes the integration work at a fraction of the size of the patch. In all, it would have taken much more back and forth to get the patch to conform with this approach than to just get it all done, but I appreciate the inspiration and instigation - thanks!

0 votes
by

Comment made by: richhickey

That said, this commit need not be the last word - it can serve as a baseline for further optimization. But I'd rather that be driven by need. Clojure could become 10x as large optimizing things that don't matter.

...