Welcome! Please see the About page for a little more info on how this works.

+2 votes
in Collections by
edited by

Just ran into some interesting behavior with sorted-set. Namely, the contract to the user for a normal set is nil when the element doesn't exist. However, with sorted-set, the call will throw if the comparator doesn't gracefully handle the type. The reason makes sense (the comparator requires the ability to sanely compare elements), but the interface is somewhat confusing in the "no membership" case since we know that the element doesn't exist in the set if its type can't be compared by default.

In other words, I guess I'm proposing that a sane default behavior for a sorted-set using the default comparator should be nil rather than throw when the type cannot be compared, in order to allow nil-punning.

As an example, this case was encountered when the form was passed to a keyword function (:foobar form), which allows nil-punning on a lot of other collections. Here's a small snippet:

user> (def normal #{"0000"})

;; => #'user/normal

user> (def sorted-version (sorted-set "0000"))

;; => #'user/sorted-version

user> normal

;; => #{"0000"}

user> sorted-version

;; => #{"0000"}

user> (:foobar normal)

;; => nil

user> (:foobar sorted-version)

Execution error (ClassCastException) at user/eval246355 (form-init59552761865518221.clj:8123).
class java.lang.String cannot be cast to class clojure.lang.Keyword (java.lang.String is in module java.base of loader 'bootstrap'; clojure.lang.Keyword is in unnamed module of loader 'app')

I think my options for handling this are either try/catch or write a custom comparator, and both seem a little wonky :)

I'm probably missing something, but thought it was interesting anyway :)

1 Answer

+2 votes
selected by
Best answer

If you have a sorted set of strings, asking whether it contains a keyword is an exceptional case. What should that even mean?

If you want tolerance for this, you will need to provide a comparator that provides a sort order over a broader set of things you might pass it.

Thanks for the response! A couple follow-ups:

  - Why is checking whether a keyword is contained in a _sorted_ set of strings any more exceptional that checking whether it's in a _normal_ set of strings? (Regardless of the fact that the implementation of a sort requires comparison, mostly curious from an API perspective why it's deemed exceptional to check membership in the _sorted_ case but not the normal case).

  - For that matter, why is `(:foo "bar")` not exceptional? What is that call supposed to mean?

Essentially, it seems to me that lots of things gracefully handle `(:foo coll)`, despite them not really making sense, but they don't throw exceptions and happily return `nil`. :)
Because being sorted means you have a comparator with implicit expectations about what you are able to sort.

`(:foo "bar")` is exceptional and it's behavior should be considered undefined
Fair enough :) I guess part of the confusion is that:

`(:foo "bar")` is exceptional, but does not throw an exception
`(:foo sorted-set-coll)` is exceptional, but _does_ throw an exception

The original problem was found from some code that was running postwalk and calling a keyword-fn on each form, which works fine for most datastructures (like strings).
perhaps a better word for these cases is "undefined"
I've just had this issue come to bite me in our codebase at work. I built a sorted-map containing only numbers as keys; it works fine during normal testing but crashes on type checks such as `(protocol? my-map)` and sometimes during spec validations. Why? Because those things check for the existence of certain keywords in the map, and doing so throws an exception because it tries to compare a keyword to a number.

To me this fundamentally breaks the contract of what it means to be a map. One expects (and in fact clojure.core and clojure.spec.alpha expect), that if a value satisfies `map?` then one can call a keyword on it and either get a value or a nil.

I can't see any justification for the claim that asking if a sorted-set of strings contains a keyword is an exceptional (or even undefined) case. The answer to "what does that even mean?" is simple - it means, "is that key in this set?", the answer to which will always be no. In mathematics, I can ask if an imaginary number is in the ordered set of integers, and despite those not being comparable the answer is not undefined, it is just no.

It makes sense that trying to *insert* a keyword into a sorted-set of numbers should blow up. But it makes no sense (from an API perspective) that querying for a non-existent member should ever throw an exception.

IIRC Rich Hickey pointed out that strong typing doesn't actually catch any serious bugs, just ones you would find in very basic testing. However this unexpected behaviour of a sorted-map has caused a serious bug that was not caught during normal testing, and would not have been possible in a strongly typed language. To me that means either Rich is wrong about type systems or the implementation of sorted-map/sorted-set is faulty, and I tend towards the latter.
Note that (strongly typed) Java has the same behavior, which is to some degree what we cued on when looking at this.

user=> (def m (java.util.TreeMap.))
user=> (.put m "a" "b")
user=> (.containsKey m 100)
Execution error (ClassCastException) at java.lang.Long/compareTo (Long.java:59).
class java.lang.String cannot be cast to class java.lang.Long (java.lang.String and java.lang.Long are in module java.base of loader 'bootstrap')
If you try to do that in *actual* Java, the compiler will stop you.

It lets you run that code via Clojure's Java interop only because Clojure circumvents Java's strong typing using type casting, which is why you get a ClassCastException at runtime.