Welcome! Please see the About page for a little more info on how this works.

0 votes
in Collections by

c.c/hash always use hashCode for java collections, which is incompatible when comparing with Clojure collections, which use Murmur3.

user=> (== (hash (java.util.ArrayList. [1 2 3])) (hash [1 2 3])) false user=> (= (java.util.ArrayList. [1 2 3]) [1 2 3]) true

One way to fix it is to add a special case in Util/hasheq for java.util.Collections, as it is now for Strings.

Link to a discussion of this topic in the Clojure group: https://groups.google.com/forum/#!topic/clojure/dQhdwZsyIEw

43 Answers

0 votes
by
_Comment made by: michalmarczyk_

For completeness, branching on {{Map}}, {{Set}} etc. directly in {{hasheq}}, as with Jozef's original patch, results in the following timings in the microbenchmark introduced in my previous comment:

|xor|315.866626 ns|
|juhm|18.520133 µs|
0 votes
by
_Comment made by: michalmarczyk_

New patch (0006) that leaves out the {{Map.Entry}} check; instead, two methods are introduced in the {{Murmur3}} class to handle j.u.maps.

Java map entries aren't really integrated into Clojure -- you can't use them like vectors, can't call {{seq}} on them etc. -- so I don't think they need to match Clojure map entries in hasheq as long as j.u.maps do.

Timings:

|xor|233.341689 ns|
|juhm|9.104637 µs|
0 votes
by
_Comment made by: michalmarczyk_

Checking for Map/Iterable in-line doesn't seem to affect xor benchmark results very much, but makes juhm hashing quicker. This is rather surprising to me. In any case, here's a new patch (0007) and the timings:

|xor|233.062337 ns|
|juhm|8.629149 µs|
0 votes
by

Comment made by: alexmiller

What are equivalent timings without the patch?

0 votes
by
_Comment made by: michalmarczyk_

They're listed in the table in the comment introducing the benchmark -- 148.128748 ns for xor, 1.701640 µs for juhm.
0 votes
by

Comment made by: alexmiller

What if we override hasheq for different types instead of using instanceof?

0 votes
by

Comment made by: michalmarczyk

Overloaded methods are resolved statically, so there's no avoiding testing for type in the {{Object}} overload.

A more specific overload could be used to speed up hashing for its parameter type given a type hint or for literals, since the compiler would emit calls to that overload given appropriate compile-time information. There wouldn't be any speed-up in "implicit" hashing during hash map / set ops, however.

0 votes
by

Comment made by: desk@danielcompton.net

This hit me when upgrading Factual/skuld from 1.5.1 to 1.6. clojure.data.fressian serialises c.l.PersistentHashSet sets into java.util.HashSet. This breaks equality checking in https://github.com/Factual/skuld/blob/b720feb142e6d274e85be208dc1d6d8634801719/test/skuld/net_test.clj#L8-L29 as we are comparing a set of maps where the original set contains a PersistentSet and the serialised and deserialised one contains a HashSet.

0 votes
by

Comment made by: desk@danielcompton.net

This has come up again for me, details are in http://dev.clojure.org/jira/browse/DFRS-7

0 votes
by

Comment made by: mpenet

This bit me again today as well (wasted a lot of time before figuring this one out). Any chance we'll have a patch for 1.7?

0 votes
by

Comment made by: alexmiller

afaik, we are still in search of an approach with tolerable performance impacts before this can be considered.

0 votes
by

Comment made by: alexmiller

While we are not opposed to addressing this ticket, we are not interested in addressing it at the expense of performance in comparisons that we care about more, and none of the current proposals meets that bar. For now, I am moving this to Backlog but would pull it out if a solution presents itself.

0 votes
by
Reference: https://clojure.atlassian.net/browse/CLJ-1372 (reported by wagjo)
...