Alex Miller asked me to write this up here. Yesterday I found a bug in `clojure.core` involving `not=` when it is used to compare NaNs. It was reported on Slack, and the discussion is here:
https://clojurians.slack.com/archives/C03S1KBA2/p1733612992809069
Here is a condensed summary of the Slack discussion (from my perspective, at least). If we start the Clojure CLI, we can evaluate the following:
```clojure
Clojure 1.12.0
user=> (= ##NaN ##NaN)
false
user=> (not= ##NaN ##NaN)
false
user=> (not (= ##NaN ##NaN))
true
```
The problem is that `=`, `not`, and `not=` are supposed to be related. Specifically, for any `x` and `y`, if `(= x y)` returns a boolean, then `(not (= x y))` should return the opposite boolean. Since `not=` is defined as `(not (= x y))`, it should return the same value as `(not (= x y))` for all `x` and `y`. That does not happen when `x` and `y` are both `##NaN`.
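The contract can be written down as a property that every pair of values should satisfy. Here is a minimal Java sketch of such a check (Java because that is what the Clojure compiler emits, as the decompiled output further down shows). `ContractCheck`, `eq`, `notEq`, and `contractHolds` are hypothetical names, and `eq` models `=` on primitive doubles only; this is not how `clojure.core` implements either function.

```java
import java.util.function.BiPredicate;

// Hypothetical model of the =/not= contract; not Clojure's real code.
public class ContractCheck {
    // Stand-in for (= x y) on primitive doubles: IEEE 754 comparison, so NaN != NaN.
    static boolean eq(double x, double y) {
        return x == y;
    }

    // Stand-in for not=, defined exactly as (not (= x y)).
    static boolean notEq(double x, double y) {
        return !eq(x, y);
    }

    // Returns true if notEq(x, y) is the negation of eq(x, y) for every pair of samples.
    static boolean contractHolds(BiPredicate<Double, Double> eq,
                                 BiPredicate<Double, Double> notEq,
                                 double[] values) {
        for (double x : values) {
            for (double y : values) {
                if (notEq.test(x, y) == eq.test(x, y)) {
                    return false; // contract violated for this pair
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[] samples = {0.0, -0.0, 1.5, Double.NaN, Double.POSITIVE_INFINITY};
        System.out.println(contractHolds(ContractCheck::eq, ContractCheck::notEq, samples)); // true
    }
}
```

Because `notEq` here is literally `!eq` evaluated on the same primitive representation, the check passes for every sample, NaN included. It can only fail when the two sides take different paths, which is what the REPL session above shows happening in Clojure.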
Note that a lot of calories were burned on Slack on suggestions that doubles should never be compared for equality, that NaNs are shifty, not-quite-value objects and should be avoided, that anybody who wants to test for the presence of a NaN should use `NaN?`, which is already in `clojure.core`, and that the documentation around equality should be updated to say some of those things. Many of those statements are true or good practice. But all of them miss the broader point.
This bug has nothing specifically to do with NaNs; NaNs just happen to expose it. The real issue is the contractual relationship between `=`, `not`, and `not=`, which appears to be violated in the presence of NaNs. Specifically, `not=` is no longer referentially transparent with respect to `(not (= ...))`: substituting one for the other changes the result.
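NaNs make a convenient trigger because Java, the platform underneath Clojure, already has two notions of `double` equality that disagree about them. This minimal Java sketch (the class and method names are hypothetical) contrasts IEEE 754 `==` on primitives with `Double.equals` on boxed values. It makes no claim about which comparison `clojure.lang.Util/equiv` performs; it only shows that whether a value is primitive or boxed can change the answer to "are these equal?", which is the kind of difference that can pull `not=` and `(not (= ...))` apart.

```java
// Contrasts Java's primitive and boxed equality for NaN (hypothetical class name).
public class NaNEquality {
    // IEEE 754 comparison: NaN is unequal to everything, itself included.
    static boolean primitiveEquals(double a, double b) {
        return a == b;
    }

    // Double.equals compares bit patterns via doubleToLongBits,
    // so two NaNs are considered equal here.
    static boolean boxedEquals(Double a, Double b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(primitiveEquals(Double.NaN, Double.NaN)); // false
        System.out.println(boxedEquals(Double.NaN, Double.NaN));     // true
    }
}
```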
On Slack, @potetm decompiled the code generated for these cases and found the following. For `(not= ##NaN ##NaN)`:
```clojure
(clj-java-decompiler.core/decompile
  (not= ##NaN ##NaN))
```

```java
// Decompiling class: cjd__init
import clojure.lang.*;

public class cjd__init
{
    public static final Var __not_EQ_;
    public static final Object const__1;

    public static void load() {
        ((IFn)cjd__init.__not_EQ_.getRawRoot()).invoke(cjd__init.const__1, cjd__init.const__1);
    }

    public static void __init0() {
        __not_EQ_ = RT.var("clojure.core", "not=");
        const__1 = Double.NaN;
    }

    static {
        __init0();
        Compiler.pushNSandLoader(RT.classForName("cjd__init").getClassLoader());
        try {
            load();
            Var.popThreadBindings();
        }
        finally {
            Var.popThreadBindings();
        }
    }
}
```
And then for `(not (= ##NaN ##NaN))`:
```clojure
(clj-java-decompiler.core/decompile
  (not (= ##NaN ##NaN)))
```

```java
// Decompiling class: cjd__init
import clojure.lang.*;

public class cjd__init
{
    public static final Var __not;

    public static void load() {
        ((IFn)cjd__init.__not.getRawRoot()).invoke(Util.equiv(Double.NaN, Double.NaN) ? Boolean.TRUE : Boolean.FALSE);
    }

    public static void __init0() {
        __not = RT.var("clojure.core", "not");
    }

    static {
        __init0();
        Compiler.pushNSandLoader(RT.classForName("cjd__init").getClassLoader());
        try {
            load();
            Var.popThreadBindings();
        }
        finally {
            Var.popThreadBindings();
        }
    }
}
```
It appears that the compiler optimizes the call to `=` and does not box the `##NaN` values when compiling `(not (= ##NaN ##NaN))`, whereas the call to `not=` receives the `##NaN`s boxed as `Double`s. This then causes a subsequent call to `clojure.lang.Util/equiv` (after following the call chain `not=` -> `=` -> `clojure.lang.Util/equiv`) to return `true` improperly, since NaNs are never equal to anything, even themselves.
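The decompiled output shows one more detail: in the `not=` version, both arguments are the same boxed constant, `cjd__init.const__1`. My understanding, which is an assumption worth checking against `clojure/lang/Util.java`, is that `Util.equiv(Object, Object)` begins with a reference-identity check (`k1 == k2`) before any numeric comparison, while the `double` overload compares primitives directly. If so, one NaN object compared with itself short-circuits to `true`. This Java sketch models that hypothesis; `EquivModel` and its methods are hypothetical stand-ins, not the real `clojure.lang` classes.

```java
// Hypothetical model of the two compiled paths; not the real clojure.lang.Util.
public class EquivModel {
    // Models the primitive overload reached by the inlined (= ##NaN ##NaN):
    // an IEEE 754 comparison, so NaN is never equal to NaN.
    static boolean equivPrim(double k1, double k2) {
        return k1 == k2;
    }

    // Models an Object overload that (by assumption) checks reference identity
    // first and only then compares numerically.
    static boolean equivBoxed(Object k1, Object k2) {
        if (k1 == k2) {
            return true; // same object: short-circuits before any NaN handling
        }
        if (k1 instanceof Double && k2 instanceof Double) {
            return ((Double) k1).doubleValue() == ((Double) k2).doubleValue();
        }
        return k1 != null && k1.equals(k2);
    }

    // (not (= ##NaN ##NaN)): = is inlined, so the doubles are never boxed.
    static boolean notOfEq() {
        return !equivPrim(Double.NaN, Double.NaN);
    }

    // (not= ##NaN ##NaN): one boxed constant is passed as both arguments,
    // mirroring cjd__init.const__1 in the decompiled output.
    static boolean notEq() {
        Object constant = Double.valueOf(Double.NaN);
        return !equivBoxed(constant, constant);
    }

    public static void main(String[] args) {
        System.out.println("(not (= ##NaN ##NaN)) -> " + notOfEq()); // true
        System.out.println("(not= ##NaN ##NaN)    -> " + notEq());   // false
    }
}
```

Under this model, the discrepancy comes from object identity rather than from NaN handling itself. A prediction of the model, not a verified result, is that passing two distinct boxed NaNs, for example `(not= (Double/valueOf ##NaN) (Double/valueOf ##NaN))`, would return `true`.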
IMO, this is a bug, albeit a low-priority one. Most programmers successfully use floating-point math without ever having to deal with NaNs. While NaNs trigger the bug here, there may be other cases that trigger it too. I can't speak to that.