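c.c/hash always uses hashCode for Java collections, which is inconsistent with Clojure collections, whose hashes are computed with Murmur3. As a result, a Java collection and a Clojure collection can be equal under = and still hash differently.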

user=> (== (hash (java.util.ArrayList. [1 2 3])) (hash [1 2 3]))
false
user=> (= (java.util.ArrayList. [1 2 3]) [1 2 3])
true

One way to fix it is to add a special case in Util/hasheq for java.util collections, as is already done for Strings.

Link to a discussion of this topic in the Clojure group: https://groups.google.com/forum/#!topic/clojure/dQhdwZsyIEw
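To see how far the mismatch extends, the comparison above can be repeated for other java.util collection types. The following is a minimal sketch; `eq-vs-hash` is a hypothetical helper name, not part of Clojure. It only reports what `=` and `hash` return, so the `:same-hash?` values depend on the Clojure version in use.

```clojure
;; Hypothetical helper (not part of clojure.core): reports whether two
;; values are equal under = and whether c.c/hash agrees for them.
(defn eq-vs-hash [a b]
  {:equal?     (= a b)
   :same-hash? (== (hash a) (hash b))})

(def checks
  {:list (eq-vs-hash (java.util.ArrayList. [1 2 3]) [1 2 3])
   :set  (eq-vs-hash (java.util.HashSet. [1 2 3]) #{1 2 3})
   :map  (eq-vs-hash (java.util.HashMap. {:a 1}) {:a 1})})

;; Any entry with :equal? true and :same-hash? false breaks the rule
;; that values equal under = must also hash alike.
(doseq [[k v] checks]
  (println k v))
```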

43 Answers

Comment made by: stu

I think this needs more consideration and should not hold up 1.6.

Comment made by: jafingerhut

Both patches clj-1372.diff and clj-1372-2.diff fail to apply cleanly as of latest Clojure master on Mar 20 2014. They did apply cleanly before the Mar 19 2014 commit, I believe, and the only issue appears to be a changed line of diff context. Given the discussion about whether such a change is desired, it sounds like more thought is needed before deciding what change should be made, if any.

Comment made by: mikera

This is a pretty bad defect. It absolutely needs to be fixed. It's not really about whether using a mix of Clojure and Java collections is a likely use case or not (it probably isn't...), it's about providing consistent guarantees that people can rely upon.

For example, now I'm really unsure about whether some of the library functions I have that use sets or maps are broken or not. I'd be particularly worried about anything that implements object caches / memoisation / interning based on hashed values. Such code may now have some really nasty subtle defects.

Since they are library functions, I can't guarantee what kind of objects are passed in so the code has to work with all possible inputs (either that or I need to write a clear docstring and throw an exception if the input is not supported).
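The kind of subtle defect described here can be sketched with a plain hash-set lookup. This is an illustrative example, not code from any library; whether the lookup succeeds depends on whether the two hashes happen to agree in the Clojure version being run.

```clojure
(def seen #{[1 2 3]})
(def candidate (java.util.ArrayList. [1 2 3]))

(def lookup-report
  {;; linear scan that relies on = only
   :equal-to-member? (boolean (some #(= % candidate) seen))
   ;; hash-based lookup, which consults the hash before comparing
   :found-by-lookup? (contains? seen candidate)})

;; If the two flags disagree, a cache or memoisation table keyed on
;; such values would miss entries that are equal to stored keys.
(println lookup-report)
```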

_Comment made by: michalmarczyk_

This patch (0001-CLJ-1372-consistent-hasheq-for-java.util.-List-Map-M.patch) makes hasheq consistent with = for java.util.{List,Map,Map.Entry,Set}. Additionally it extends the special treatment of String (return hasheq of hashCode) to all types not otherwise handled (see below for a comment on this).

It is also available here:

https://github.com/michalmarczyk/clojure/tree/alien-hasheq-2

An earlier version is available here:

https://github.com/michalmarczyk/clojure/tree/alien-hasheq

If I understand correctly, what needs to be benchmarked is primarily the "dispatch time" for clojure.lang.Util/hasheq given a Clojure type. So, I ran a Criterium benchmark repeatedly hashing the same persistent hash map, on the theory that this will measure just the dispatch time on IHashEq instances. I then ran a separate benchmark hashing a PHM, a string and a long and adding up the results with unchecked-add. Hopefully this is a good start; I've no doubt additional benchmarks would be useful.

The results are somewhat surprising to me: hasheq on PHM is actually slightly faster in this benchmark on my build than on 1.6.0; the "add three hasheqs" benchmark is slightly faster on 1.6.0.


;;; 1.6.0

;;; NB. j.u.HM benchmark irrelevant
user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] #_(assert (= (hash phm) (hash juhm))) (c/bench (clojure.lang.Util/hasheq phm)) (c/bench (clojure.lang.Util/hasheq juhm)))
WARNING: Final GC required 1.24405836928592 % of runtime
Evaluation count : 5549560980 in 60 samples of 92492683 calls.
             Execution time mean : 9.229881 ns
    Execution time std-deviation : 0.156716 ns
   Execution time lower quantile : 8.985994 ns ( 2.5%)
   Execution time upper quantile : 9.574039 ns (97.5%)
                   Overhead used : 1.741068 ns

Found 2 outliers in 60 samples (3.3333 %)
    low-severe     2 (3.3333 %)
 Variance from outliers : 6.2652 % Variance is slightly inflated by outliers
Evaluation count : 35647680 in 60 samples of 594128 calls.
             Execution time mean : 1.695145 µs
    Execution time std-deviation : 20.186554 ns
   Execution time lower quantile : 1.670049 µs ( 2.5%)
   Execution time upper quantile : 1.740329 µs (97.5%)
                   Overhead used : 1.741068 ns

Found 2 outliers in 60 samples (3.3333 %)
    low-severe     2 (3.3333 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil

user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] #_(assert (= (hash phm) (hash juhm))) (c/bench (unchecked-add (clojure.lang.Util/hasheq phm) (unchecked-add (clojure.lang.Util/hasheq "foo") (clojure.lang.Util/hasheq 123)))))
WARNING: Final GC required 1.028614538339401 % of runtime
Evaluation count : 1029948300 in 60 samples of 17165805 calls.
             Execution time mean : 56.797488 ns
    Execution time std-deviation : 0.732221 ns
   Execution time lower quantile : 55.856731 ns ( 2.5%)
   Execution time upper quantile : 58.469940 ns (97.5%)
                   Overhead used : 1.836671 ns

Found 1 outliers in 60 samples (1.6667 %)
    low-severe     1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil

;;; patch applied

user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] (assert (= (hash phm) (hash juhm))) (c/bench (clojure.lang.Util/hasheq phm)) (c/bench (clojure.lang.Util/hasheq juhm)))
Evaluation count : 5537698680 in 60 samples of 92294978 calls.
             Execution time mean : 8.973200 ns
    Execution time std-deviation : 0.157079 ns
   Execution time lower quantile : 8.733544 ns ( 2.5%)
   Execution time upper quantile : 9.289350 ns (97.5%)
                   Overhead used : 1.744772 ns
Evaluation count : 2481600 in 60 samples of 41360 calls.
             Execution time mean : 24.287800 µs
    Execution time std-deviation : 288.124326 ns
   Execution time lower quantile : 23.856445 µs ( 2.5%)
   Execution time upper quantile : 24.774097 µs (97.5%)
                   Overhead used : 1.744772 ns
nil

user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] #_(assert (= (hash phm) (hash juhm))) (c/bench (unchecked-add (clojure.lang.Util/hasheq phm) (unchecked-add (clojure.lang.Util/hasheq "foo") (clojure.lang.Util/hasheq 123)))))
WARNING: Final GC required 1.298136122909759 % of runtime
Evaluation count : 954751500 in 60 samples of 15912525 calls.
             Execution time mean : 61.681794 ns
    Execution time std-deviation : 0.712110 ns
   Execution time lower quantile : 60.622003 ns ( 2.5%)
   Execution time upper quantile : 62.904801 ns (97.5%)
                   Overhead used : 1.744772 ns

Found 1 outliers in 60 samples (1.6667 %)
    low-severe     1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil


As a side note, the earlier version of the patch available on the other branch doesn't have a separate branch for String. This made hasheq faster for objects implementing IHashEq, but slowed down the "three hashes" benchmark roughly by a factor of 2.

Comment made by: alexmiller

Just for clarity, please refer to patches attached here by name so as time goes on we don't have to correlate attachment time with comment time.

I'm not particularly worried about the cost of things that implement IHashEq as they should be unaffected other than potential inlining issues. I am curious about the cost of hasheq for objects that fall through to the end of the cases and pay the cost for all of the checks. The list farther up in the comments is a good place to start - things like Class, Character, and Var (which could possibly be addressed in Var).
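A rough Clojure-level model can help in picking benchmark inputs for the fall-through case. The branch order below is an assumption pieced together from this thread (IHashEq first, a special case for String, hashCode as the default); the Number branch is likewise an assumption, and `hasheq-path` is a hypothetical name, not part of Clojure.

```clojure
;; Hypothetical classifier: which branch of hasheq a value is assumed
;; to take. It does not call or inspect clojure.lang.Util itself.
(defn hasheq-path [o]
  (cond
    (nil? o)                           :nil
    (instance? clojure.lang.IHashEq o) :ihasheq
    (number? o)                        :number
    (string? o)                        :string
    :else                              :fall-through))

;; Values mentioned in the comments above: a map, a long, a string,
;; a Class, a Character and a Var.
(doseq [x [{1 2} 123 "foo" String \a #'hash]]
  (println (pr-str x) (hasheq-path x)))
```

Values classified as `:fall-through` are the ones that would pay for every added `instanceof` check before reaching the default `hashCode` return.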

Comment made by: michalmarczyk

Good point, I've edited the above comment to include the patch name.

Thanks for the benchmarking suggestions -- I'll post some new results in ~6 minutes.

Comment made by: michalmarczyk

First, for completeness, here's a new patch (0001-CLJ-1372-consistent-hasheq-for-java.util.-List-Map-M-alternative.patch) which doesn't do the extra murmuring for types not otherwise handled. It's slower for the single PHM case; see below for details. Also, here's the branch on GitHub:

https://github.com/michalmarczyk/clojure/tree/alien-hasheq-3

As for the new results, the perf hit is quite large, I'm afraid:

```
;;; with patch (murmur hashCode for default version)
user=> (let [class-instance java.lang.String character-instance \a var-instance #'hash] (c/bench (clojure.lang.Util/hasheq class-instance)) (c/bench (clojure.lang.Util/hasheq character-instance)) (c/bench (clojure.lang.Util/hasheq var-instance)))
WARNING: Final GC required 1.409118084170768 % of runtime
Evaluation count : 655363680 in 60 samples of 10922728 calls.
             Execution time mean : 96.459888 ns
    Execution time std-deviation : 1.019817 ns
   Execution time lower quantile : 95.079086 ns ( 2.5%)
   Execution time upper quantile : 98.684168 ns (97.5%)
                   Overhead used : 1.708347 ns
Evaluation count : 675919140 in 60 samples of 11265319 calls.
             Execution time mean : 88.965959 ns
    Execution time std-deviation : 0.825226 ns
   Execution time lower quantile : 87.817159 ns ( 2.5%)
   Execution time upper quantile : 90.755688 ns (97.5%)
                   Overhead used : 1.708347 ns
Evaluation count : 574987680 in 60 samples of 9583128 calls.
             Execution time mean : 103.881498 ns
    Execution time std-deviation : 1.103615 ns
   Execution time lower quantile : 102.257474 ns ( 2.5%)
   Execution time upper quantile : 106.071144 ns (97.5%)
                   Overhead used : 1.708347 ns

Found 1 outliers in 60 samples (1.6667 %)
    low-severe     1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil

;;; 1.6.0
user=> (let [class-instance java.lang.String character-instance \a var-instance #'hash] (c/bench (clojure.lang.Util/hasheq class-instance)) (c/bench (clojure.lang.Util/hasheq character-instance)) (c/bench (clojure.lang.Util/hasheq var-instance)))
WARNING: Final GC required 1.3353133083866688 % of runtime
Evaluation count : 1829305260 in 60 samples of 30488421 calls.
             Execution time mean : 34.205701 ns
    Execution time std-deviation : 0.379106 ns
   Execution time lower quantile : 33.680636 ns ( 2.5%)
   Execution time upper quantile : 34.990138 ns (97.5%)
                   Overhead used : 1.718257 ns

Found 2 outliers in 60 samples (3.3333 %)
    low-severe     1 (1.6667 %)
    low-mild     1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
Evaluation count : 1858100340 in 60 samples of 30968339 calls.
             Execution time mean : 30.401309 ns
    Execution time std-deviation : 0.213878 ns
   Execution time lower quantile : 30.095976 ns ( 2.5%)
   Execution time upper quantile : 30.871497 ns (97.5%)
                   Overhead used : 1.718257 ns
Evaluation count : 1592932200 in 60 samples of 26548870 calls.
             Execution time mean : 36.292934 ns
    Execution time std-deviation : 0.333512 ns
   Execution time lower quantile : 35.795063 ns ( 2.5%)
   Execution time upper quantile : 36.918183 ns (97.5%)
                   Overhead used : 1.718257 ns

Found 1 outliers in 60 samples (1.6667 %)
    low-severe     1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil
```

One PHM and Class/Character/Var results with the new patch (no extra murmur step in the default case):

```
user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] #_(assert (= (hash phm) (hash juhm))) (c/bench (unchecked-add (clojure.lang.Util/hasheq phm) (unchecked-add (clojure.lang.Util/hasheq "foo") (clojure.lang.Util/hasheq 123)))))
WARNING: Final GC required 1.258952964663877 % of runtime
Evaluation count : 1007768460 in 60 samples of 16796141 calls.
             Execution time mean : 58.195608 ns
    Execution time std-deviation : 0.482804 ns
   Execution time lower quantile : 57.655857 ns ( 2.5%)
   Execution time upper quantile : 59.154655 ns (97.5%)
                   Overhead used : 1.567532 ns

Found 1 outliers in 60 samples (1.6667 %)
    low-severe     1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil
user=> (let [class-instance java.lang.String character-instance \a var-instance #'hash] (c/bench (clojure.lang.Util/hasheq class-instance)) (c/bench (clojure.lang.Util/hasheq character-instance)) (c/bench (clojure.lang.Util/hasheq var-instance)))
Evaluation count : 647944080 in 60 samples of 10799068 calls.
             Execution time mean : 91.275863 ns
    Execution time std-deviation : 0.659943 ns
   Execution time lower quantile : 90.330980 ns ( 2.5%)
   Execution time upper quantile : 92.711120 ns (97.5%)
                   Overhead used : 1.567532 ns
Evaluation count : 699506160 in 60 samples of 11658436 calls.
             Execution time mean : 84.564131 ns
    Execution time std-deviation : 0.517071 ns
   Execution time lower quantile : 83.765607 ns ( 2.5%)
   Execution time upper quantile : 85.569206 ns (97.5%)
                   Overhead used : 1.567532 ns

Found 1 outliers in 60 samples (1.6667 %)
    low-severe     1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
Evaluation count : 594919980 in 60 samples of 9915333 calls.
             Execution time mean : 100.336792 ns
    Execution time std-deviation : 0.811312 ns
   Execution time lower quantile : 99.313490 ns ( 2.5%)
   Execution time upper quantile : 102.167675 ns (97.5%)
                   Overhead used : 1.567532 ns

Found 3 outliers in 60 samples (5.0000 %)
    low-severe     3 (5.0000 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil
```

Comment made by: michalmarczyk

Here's a new patch (0001-CLJ-1372-consistent-hasheq-for-java.util.-List-Map-M-substring.patch) that takes the outrageous approach of replacing the Iterable/Map/Entry test with a .startsWith("java.util.") on the class name. (I experimented with .getClass().getPackage(), but the performance of that was terrible.) The branch is here:

https://github.com/michalmarczyk/clojure/tree/alien-hasheq-4

Hash perf on the "fall-through" cases with this patch seems to be very good:

```
user=> (let [class-instance java.lang.String character-instance \a var-instance #'hash] (c/bench (clojure.lang.Util/hasheq class-instance)) (c/bench (clojure.lang.Util/hasheq character-instance)) (c/bench (clojure.lang.Util/hasheq var-instance)))
WARNING: Final GC required 1.31690036780011 % of runtime
Evaluation count : 1661453640 in 60 samples of 27690894 calls.
             Execution time mean : 35.099750 ns
    Execution time std-deviation : 0.422800 ns
   Execution time lower quantile : 34.454839 ns ( 2.5%)
   Execution time upper quantile : 35.953584 ns (97.5%)
                   Overhead used : 1.556642 ns
Evaluation count : 1630167600 in 60 samples of 27169460 calls.
             Execution time mean : 35.487409 ns
    Execution time std-deviation : 0.309872 ns
   Execution time lower quantile : 35.083030 ns ( 2.5%)
   Execution time upper quantile : 36.190015 ns (97.5%)
                   Overhead used : 1.556642 ns

Found 4 outliers in 60 samples (6.6667 %)
    low-severe     3 (5.0000 %)
    low-mild     1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
Evaluation count : 1440434700 in 60 samples of 24007245 calls.
             Execution time mean : 40.894457 ns
    Execution time std-deviation : 0.529510 ns
   Execution time lower quantile : 40.055991 ns ( 2.5%)
   Execution time upper quantile : 41.990985 ns (97.5%)
                   Overhead used : 1.556642 ns
nil
```

_Comment made by: michalmarczyk_

The new patch (...-substring.patch) returns hashCode for java.util.** classes other than List, Map, Map.Entry and Set, of course, so no behaviour change there.

Here are the benchmarks for repeated PHM lookups (slightly slower than 1.6.0 apparently, though within 1 ns) and the "add three hasheqs" benchmark (66 ns with patch vs. 57 ns without):


user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] (assert (= (hash phm) (hash juhm))) (c/bench (clojure.lang.Util/hasheq phm)) (c/bench (clojure.lang.Util/hasheq juhm)))
Evaluation count : 5183841240 in 60 samples of 86397354 calls.
             Execution time mean : 10.076893 ns
    Execution time std-deviation : 0.182592 ns
   Execution time lower quantile : 9.838456 ns ( 2.5%)
   Execution time upper quantile : 10.481086 ns (97.5%)
                   Overhead used : 1.565749 ns
Evaluation count : 3090420 in 60 samples of 51507 calls.
             Execution time mean : 19.596627 µs
    Execution time std-deviation : 224.380257 ns
   Execution time lower quantile : 19.288347 µs ( 2.5%)
   Execution time upper quantile : 20.085620 µs (97.5%)
                   Overhead used : 1.565749 ns
nil

user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] #_(assert (= (hash phm) (hash juhm))) (c/bench (unchecked-add (clojure.lang.Util/hasheq phm) (unchecked-add (clojure.lang.Util/hasheq "foo") (clojure.lang.Util/hasheq 123)))))
WARNING: Final GC required 1.418253438197936 % of runtime
Evaluation count : 879210900 in 60 samples of 14653515 calls.
             Execution time mean : 66.939309 ns
    Execution time std-deviation : 0.747984 ns
   Execution time lower quantile : 65.667310 ns ( 2.5%)
   Execution time upper quantile : 68.155046 ns (97.5%)
                   Overhead used : 1.724002 ns
nil


It is important to note that I have obtained the no-patch result for the "three hasheqs" benchmarks on a fresh JVM when benchmarking 1.6.0, so that's also how I repeated the benchmark with the patch applied. Hashing many different types changes the results noticeably -- presumably HotSpot backs off from some optimizations after seeing several different types passed in to hasheq?

Comment made by: michalmarczyk

Here's a new patch (0005-CLJ-1372-consistent-hasheq-for-java.util.-List-Map-M.patch) that introduces a new isAlien static method that checks for instanceof Map/Map.Entry/Iterable and uses this method to test for "alien collection".

Initial benchmarking results are promising:

```
;;; "fall-through" benchmark
user=> (let [class-instance java.lang.String character-instance \a var-instance #'hash] (c/bench (clojure.lang.Util/hasheq class-instance)) (c/bench (clojure.lang.Util/hasheq character-instance)) (c/bench (clojure.lang.Util/hasheq var-instance)))
WARNING: Final GC required 1.258979068087473 % of runtime
Evaluation count : 1598432100 in 60 samples of 26640535 calls.
             Execution time mean : 36.358882 ns
    Execution time std-deviation : 0.566925 ns
   Execution time lower quantile : 35.718889 ns ( 2.5%)
   Execution time upper quantile : 37.414722 ns (97.5%)
                   Overhead used : 1.823120 ns

Found 1 outliers in 60 samples (1.6667 %)
    low-severe     1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
Evaluation count : 1626362460 in 60 samples of 27106041 calls.
             Execution time mean : 35.426993 ns
    Execution time std-deviation : 0.294517 ns
   Execution time lower quantile : 35.047064 ns ( 2.5%)
   Execution time upper quantile : 36.058667 ns (97.5%)
                   Overhead used : 1.823120 ns

Found 1 outliers in 60 samples (1.6667 %)
    low-severe     1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
Evaluation count : 1461423180 in 60 samples of 24357053 calls.
             Execution time mean : 39.541873 ns
    Execution time std-deviation : 0.423707 ns
   Execution time lower quantile : 38.943560 ns ( 2.5%)
   Execution time upper quantile : 40.499433 ns (97.5%)
                   Overhead used : 1.823120 ns

Found 2 outliers in 60 samples (3.3333 %)
    low-severe     2 (3.3333 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil

;;; "three hasheqs" benchmark
user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] #_(assert (= (hash phm) (hash juhm))) (c/bench (unchecked-add (clojure.lang.Util/hasheq phm) (unchecked-add (clojure.lang.Util/hasheq "foo") (clojure.lang.Util/hasheq 123)))))
WARNING: Final GC required 1.5536755331464491 % of runtime
Evaluation count : 820376460 in 60 samples of 13672941 calls.
             Execution time mean : 71.999365 ns
    Execution time std-deviation : 0.746588 ns
   Execution time lower quantile : 70.869739 ns ( 2.5%)
   Execution time upper quantile : 73.565908 ns (97.5%)
                   Overhead used : 1.738155 ns

Found 2 outliers in 60 samples (3.3333 %)
    low-severe     2 (3.3333 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil
```

_Comment made by: michalmarczyk_

Ah, I left out one pair of benchmarks from the above: the repeated PHM hasheq lookup and the hasheq of a java.util.HashMap instance. Here they are for completeness (no surprises, though):


user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] (assert (= (hash phm) (hash juhm))) (c/bench (clojure.lang.Util/hasheq phm)) (c/bench (clojure.lang.Util/hasheq juhm)))
WARNING: Final GC required 1.260853406580491 % of runtime
Evaluation count : 5369135760 in 60 samples of 89485596 calls.
             Execution time mean : 10.380464 ns
    Execution time std-deviation : 3.407284 ns
   Execution time lower quantile : 9.510624 ns ( 2.5%)
   Execution time upper quantile : 11.461485 ns (97.5%)
                   Overhead used : 1.566301 ns

Found 5 outliers in 60 samples (8.3333 %)
    low-severe     3 (5.0000 %)
    low-mild     2 (3.3333 %)
 Variance from outliers : 96.4408 % Variance is severely inflated by outliers
Evaluation count : 3078180 in 60 samples of 51303 calls.
             Execution time mean : 19.717981 µs
    Execution time std-deviation : 209.896848 ns
   Execution time lower quantile : 19.401811 µs ( 2.5%)
   Execution time upper quantile : 20.180163 µs (97.5%)
                   Overhead used : 1.566301 ns

Found 2 outliers in 60 samples (3.3333 %)
    low-severe     2 (3.3333 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil

Comment made by: alexmiller

Please don't submit any patches that change hashcode for anything other than making Java collections match Clojure collections - any other change is out of scope of this ticket.

In general, I would prefer just the execution time mean report for the moment rather than everything - the full criterium output makes these comments much harder to read and compare.

Comment made by: alexmiller

Could I get a summary of approaches, and a timing of 1.6.0 vs each patch for a consistent set of tests - say time of hash for Long, PHM, juHM, Class, and the "three hasheqs" test?

_Comment made by: richhickey_

"Hashing many different types changes the results noticeably – presumably HotSpot backs off from some optimizations after seeing several different types passed in to hasheq?"

Right - if your benchmarks do not treat this site as megamorphic you will get all sorts of distorted results.
_Comment made by: michalmarczyk_

Ok, I have what I think is an improved microbenchmark for this: xor of hasheqs for a long, a double, a string, a class, a character and a PHM (single instance, so it'll be a hash lookup). The results are not very encouraging.

Single form including the {{require}} to make it convenient to run; also bundled is a {{j.u.HashMap}} (128 entries) hasheq benchmark:


(do
  (require '[criterium.core :as c])
  (let [l    41235125123
        d    123.456
        s    "asdf;lkjh"
        k    BigInteger
        c    \S
        phm  (apply hash-map (interleave (range 128) (range 128)))
        juhm (java.util.HashMap. phm)
        f    (fn f []
               (-> (clojure.lang.Util/hasheq l)
                   (bit-xor (clojure.lang.Util/hasheq d))
                   (bit-xor (clojure.lang.Util/hasheq s))
                   (bit-xor (clojure.lang.Util/hasheq k))
                   (bit-xor (clojure.lang.Util/hasheq c))
                   (bit-xor (clojure.lang.Util/hasheq phm))))]
    (c/bench (f))
    (c/bench (hash juhm))))


Mean execution time as reported by Criterium:

||version||xor (ns)||j.u.HM (µs)||
|unpatched 1.6.0|148.128748|1.701640|
|0005 patch|272.039667|21.201178|
|original patch|268.670316|21.169436|
|-alternative patch|271.747043|20.755397|

The substring patch is broken (see below), so I skipped it. The patch I'm describing as the "original" one is attached as 0001-CLJ-1372-consistent-hasheq-for-java.util.-List-Map-M.patch.
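To put the table in relative terms, the mean times can be divided by the unpatched 1.6.0 baseline. The sketch below only restates the numbers from the table; `mean-times` and `slowdowns` are hypothetical names. It works out to a slowdown of roughly 1.8x on the xor benchmark and roughly 12x on the j.u.HashMap benchmark for each of the three patches.

```clojure
;; Mean execution times copied from the table above:
;; [version, xor benchmark (ns), j.u.HashMap benchmark (µs)]
(def mean-times
  [["unpatched 1.6.0"    148.128748  1.701640]
   ["0005 patch"         272.039667 21.201178]
   ["original patch"     268.670316 21.169436]
   ["-alternative patch" 271.747043 20.755397]])

;; Ratio of each patched version to the first (baseline) row.
(defn slowdowns [rows]
  (let [[_ base-xor base-juhm] (first rows)]
    (for [[version xor juhm] (rest rows)]
      {:version    version
       :xor-ratio  (/ xor base-xor)
       :juhm-ratio (/ juhm base-juhm)})))

(doseq [row (slowdowns mean-times)]
  (println row))
```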

Decisions common to all the patches:

1. One extra {{if}} statement in {{hasheq}} just above the default return with a three-way {{instanceof}} check.

2. The types tested for are {{java.lang.Iterable}}, {{j.u.Map.Entry}} and {{j.u.Map}}.

3. {{Murmur3.hashOrdered}} takes {{Iterable}}, so that's why it's on the list. {{Map}} does not extend {{Iterable}}, so it's listed separately. {{Map.Entry}} is on the list, because ultimately the way to hash maps is to iterate over and hash their entries.

4. The actual hashing of the "alien" / host types is done by a separate static method -- {{clojure.lang.Util.doalienhasheq}} -- on the theory that this will permit {{hasheq}} to be inlined more aggressively and limit the worst of the perf hit to alien collections.

5. {{doalienhasheq}} checks for {{Map}}, {{Map.Entry}}, {{Set}} and {{List}}; entries are converted to lists for hashing, maps are hashed through entry sets and lists and sets are passed directly to {{Murmur3}}.

6. There is also a default case for other {{Iterable}} types -- we must return {{hashCode}} or the result of composing some other function with {{hashCode}} for these, since we use {{equals}} to test them for equivalence.
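The shape of these decisions can be modelled at the Clojure level. This is a minimal sketch, not the code from any of the attached patches: `alien?` and `alien-hasheq` are hypothetical names, and the hashing is delegated to `clojure.core/hash-ordered-coll` and `clojure.core/hash-unordered-coll` (available since Clojure 1.6) rather than to a patched `Util.hasheq`. Unlike a change inside `hasheq`, it does not recurse into nested Java collections, whose elements are still hashed by the unpatched `hasheq`.

```clojure
(import '(java.util List Map Map$Entry Set))

;; Model of the three-way type check from points 1 and 2.
(defn alien? [o]
  (or (instance? Iterable o)
      (instance? Map$Entry o)
      (instance? Map o)))

;; Model of the separate helper from points 4 to 6.
(defn alien-hasheq [o]
  (cond
    ;; entries are hashed as two-element ordered collections
    (instance? Map$Entry o) (hash-ordered-coll [(key o) (val o)])
    ;; maps are hashed through their entry sets
    (instance? Map o)       (hash-unordered-coll
                              (map (fn [e] [(key e) (val e)])
                                   (.entrySet ^Map o)))
    ;; sets and lists go straight to the Murmur3-based functions
    (instance? Set o)       (hash-unordered-coll o)
    (instance? List o)      (hash-ordered-coll o)
    ;; any other type is compared with equals, so hashCode is kept
    :else                   (.hashCode ^Object o)))

(println (= (alien-hasheq (java.util.ArrayList. [1 2 3])) (hash [1 2 3])))
(println (= (alien-hasheq (java.util.HashMap. {:a 1 :b 2})) (hash {:a 1 :b 2})))
```

Keeping the type check in its own small function mirrors the idea in point 4: the common path stays short, and only values that pass the check pay for the slower hashing.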

The 0005 patch has {{hasheq}} call a separate private static method to perform the three-way type check, whereas the others put the check directly in the actual {{if}} test. The -alternative patch and the 0005 patch return {{hashCode}} in the default case, whereas the original patch composes {{Murmur3.hashInt}} with {{hashCode}}.

The substring patch only works for {{java.util.**}} classes and so doesn't solve the problem (it wouldn't correctly hash Guava collections, for example).
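The limitation can be reproduced without Guava. In this sketch, `java-util-class?` is a hypothetical model of the class-name prefix test, and a `proxy` of `java.util.AbstractList` stands in for a third-party list, on the assumption that the generated proxy class lives outside the `java.util` package.

```clojure
;; Hypothetical model of the prefix test used by the substring patch.
(defn java-util-class? [o]
  (.startsWith (.getName (class o)) "java.util."))

;; A java.util.List implemented outside java.util, standing in for a
;; third-party collection such as a Guava list.
(def foreign-list
  (proxy [java.util.AbstractList] []
    (get [i] (nth [1 2 3] i))
    (size [] 3)))

(println (.getName (class foreign-list)))
(println {:is-list?      (instance? java.util.List foreign-list)
          :passes-check? (java-util-class? foreign-list)})
```

A value that is a `java.util.List` but fails the prefix test would still fall through to `hashCode`, which is the behaviour the ticket set out to fix.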

All of the patches change {{c.l.Util.hasheq}} and add one or two new static methods to {{clojure.lang.Util}} that act as helpers for {{hasheq}}. None of them changes anything else. Murmuring hashCode was a performance experiment that appeared to have a slight positive impact on some of the "fast cases" (in fact it's still the best performer among the current three patches in the microbenchmark presented above, although the margin of victory is of course extremely tiny). Thus I think all the current patches are in fact limited in scope to changes directly relevant to the ticket; the -alternative patch and the 0005 patch certainly are.
...