Welcome! Please see the About page for a little more info on how this works.

0 votes
in Collections by

c.c/hash always use hashCode for java collections, which is incompatible when comparing with Clojure collections, which use Murmur3.

user=> (== (hash (java.util.ArrayList. [1 2 3])) (hash [1 2 3])) false user=> (= (java.util.ArrayList. [1 2 3]) [1 2 3]) true

One way to fix it is to add a special case in Util/hasheq for java.util.Collections, as it is now for Strings.

Link to a discussion of this topic in the Clojure group: https://groups.google.com/forum/#!topic/clojure/dQhdwZsyIEw

43 Answers

0 votes
by

Comment made by: wagjo

Same problem for maps, so hasheq should have a special case for java.util.Map too.

0 votes
by

Comment made by: wagjo

Added patch with fix for j.u. Map, Set and List.

0 votes
by

Comment made by: jafingerhut

Add patch clj-1372-2.diff that is identical to Jozef Wagner's clj-1372.diff, except it also adds some new tests that fail without his changes, and pass with them.

0 votes
by

Comment made by: alexmiller

I think the contract on equiv/hasheq is more narrowly scoped than this and only applies if both collections are IPersistentCollection. In other words, I don't think this is wanted or required.

Note that the Java .equals/.hashCode contract is maintained here - these collections will compare as .equals() and do have the same .hashCode().

0 votes
by

Comment made by: wagjo

Without the patch the following statement is not valid: "If two objects are equal with c.c/=, than their hash returned by c.c/hash is the same number". We can say that this is valid only iff both objects are 'clojure' objects, but this goes against clojures interop principles (interop is easy, fast, no surprises).

0 votes
by

Comment made by: wagjo

Manifestation of this bug

user=> (assoc (hash-map [1 2 3] :foo) (java.util.ArrayList. [1 2 3]) :bar) {[1 2 3] :bar, [1 2 3] :foo} user=> (get (hash-map [1 2 3] :foo) (java.util.ArrayList. [1 2 3])) nil

0 votes
by

Comment made by: alexmiller

I agree that would be a nice thing to say without qualification.

There is a real cost to adding more branches in hasheq - adding those collection checks affects every hasheq. Running a full Clojure build, I see the following set of classes with >100 occurences where this happens (note that exactly 0 of these are the Java collections - this case doesn't exist in the Clojure build itself):

clojure.lang.Var 107001502 java.lang.Class 2651389 java.lang.Character 2076322 java.util.UUID 435235 java.util.Date 430956 clojure.lang.Compiler$LocalBinding 116830 java.lang.Boolean 112361 java.util.regex.Pattern 325

We'd be adding 4 more instanceof checks in the path of every one of those hasheqs. This would also likely blow any JVM inlining.

Rich says "all bets should be off for hasheq/equiv of non-values" where Java collections obviously fall into the class of "non-values".

0 votes
by

Comment made by: jafingerhut

Would a doc patch be considered? Say one that modified the doc of clojure.core/hash to include a phrase indicating that it is only promised to be consistent with clojure.core/= for immutable values? It could even perhaps mention that Floats are out, too: see CLJ-1036

0 votes
by

Comment made by: alexmiller

I think it would be preferred to do any detailed docs about hash at http://clojure.org/data_structures rather than in the docstring. Although the docstring on hash probably could use an update and a pointer to the web site after the latest changes.

0 votes
by

Comment made by: wagjo

Neverthless it is a breaking change from 1.5, and it should be mentioned in changelog. What still bugs me is that c.c/= is supported in such cases but the c.c/hash is not. If supporting c.c/hash is expensive, isn't it better to drop support for c.c/= in such cases? It will eliminate surprises such as:

user=> (apply distinct? (hash-set [1 2 3] (java.util.Collections/unmodifiableList [1 2 3]))) false

0 votes
by

Comment made by: alexmiller

I'm not sure it's a "breaking" change if something not considered to be guaranteed changes. :) But I take your point.

I don't think it's feasible to drop = support for Clojure and Java collections - that seems important and useful. And if it were free to do so, I would like to be able to say without qualification that if equiv=true, then hasheq is the same.

It's unclear to me that the examples listed on this ticket are actually real problems people are likely to encounter. The main users of hasheq are hash map and hash set. So to manifest, you would need to be putting a mixture of Clojure and Java collections into one of those, in particular a mixture of collections that compare as equal.

Still thinking about it.

0 votes
by

Comment made by: wagjo

Sorry for spamming but there may be another option, to not fallback into hashCode in hasheq, but to instead throw in cases where hasheq is requested for non-values. This will lead to a cleaner separation of hash types. Of course it will prevent putting non-values into hash-set.

0 votes
by

Comment made by: alexmiller

There is no simple check for "valueness" though?

0 votes
by

Comment made by: jafingerhut

An idea, for what it might be worth: Add one test for instance of java.util.Collection in Util.hasheq method instead of 3 separate tests for Set, List, and Map. It doesn't cover Map.Entry.

0 votes
by

Comment made by: alexmiller

Map doesn't extend Collection either.

...