We're seeing what looks like a violation of the keyword-interning invariant in production. The bug is deterministic (repeats thousands of times in the same order on the same Aleph/Netty thread) and is cleared by recompiling the affected namespaces via nREPL.
(when (and (map? result) (nil? (:payload result))) ; this WHEN fires
  (let [payload-key     (->> (keys result)
                             (filter #(.contains (pr-str %) "payload"))
                             first)
        payload-via-key (when payload-key (get result payload-key))]
    (log/warn {:result-type                 (str (type result))
               :result-keys                 (pr-str (keys result))
               :payload-key-equals-literal? (= payload-key :payload)
               :payload-key-identical?      (identical? payload-key :payload)
               :payload-via-found-key-nil?  (nil? payload-via-key)}
              "diagnostic")))
Logged values when the bug fires:
{:result-type                 clojure.lang.PersistentArrayMap
 :result-keys                 (:payload :aws-xray)
 :payload-key-equals-literal? true
 :payload-key-identical?      true
 :payload-via-found-key-nil?  false}
So:
- The (:payload result) in the when returned nil.
- A few lines later, the diagnostic body proves that the first key in (keys result) whose printed form contains "payload" IS the body-site :payload literal by identity (and therefore by =, since Keyword inherits Object.equals).
- (get result that-key) returns the actual non-nil payload value.
PersistentArrayMap.indexOf uses == for Keyword keys, so the only way (:payload result) returns nil while (keys result) yields a key that is identical? to :payload at a nearby site is if the :payload literal at the WHEN site and the :payload literal at the body site are two different Keyword instances, even though they're written identically in one source-level function.
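For reference, the interning invariant this reasoning relies on can be sketched as below. This is a minimal illustration of what a healthy runtime is expected to do, not code from the affected service; the var names (from-literal, interned-once?, lookups, and so on) are made up for the example.

```clojure
(import 'clojure.lang.Keyword)

;; In a healthy runtime, every route to :payload should yield the same
;; Keyword instance, because keywords are interned in one global table.
(def from-literal :payload)
(def from-fn      (keyword "payload"))
(def from-intern  (Keyword/intern "payload"))
(def from-find    (Keyword/find "payload"))

(def interned-once?
  (and (identical? from-literal from-fn)
       (identical? from-literal from-intern)
       (identical? from-literal from-find)))

;; A small map literal like this is an array map; since its key is the
;; interned instance, every lookup form should agree.
(def m {:payload 1 :aws-xray 2})

(def lookups
  [(:payload m)      ; keyword used as a function
   (get m :payload)  ; clojure.core/get
   (m :payload)])    ; map used as a function
```

If interned-once? were ever false, or the entries of lookups disagreed, that would be the same class of symptom as the one described above, reproduced outside the request path.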
We compared the :payload literal with the payload-key and found that both have:
- identical content hashcode (-383036092)
- identical name bytes [0x70 0x61 0x79 0x6C 0x6F 0x61 0x64]
- identical codepoints
- same classloader
So it really looks like two distinct interned Keyword instances with the same name.
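A comparison of this kind could be gathered with a helper along the following lines. This is only a sketch: keyword-identity-report is a hypothetical name, and it adds System/identityHashCode plus a check against the instance currently held in the intern table, which are assumptions about what would be useful rather than the values listed above.

```clojure
(import 'clojure.lang.Keyword)

(defn keyword-identity-report
  "Hypothetical helper: describes one Keyword instance so that two
  reports can be compared field by field in a log line."
  [k]
  (let [;; (str :payload) is \":payload\"; drop the leading colon to get
        ;; the string form that Keyword/find expects.
        interned (Keyword/find (subs (str k) 1))]
    {:printed                (pr-str k)
     :content-hash           (.hashCode ^Object k)
     ;; Identity hash differs between distinct instances in practice,
     ;; even when name and content hash match.
     :identity-hash          (System/identityHashCode k)
     :interned-identity-hash (when interned
                               (System/identityHashCode interned))
     :same-as-interned?      (identical? k interned)
     :class-loader           (str (.getClassLoader (class k)))}))
```

Calling it once with the key taken from the map and once with a :payload literal at the failing site, and logging both maps, would show whether either instance differs from the one the intern table currently returns.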
Environment
- Clojure 1.12.4
- Eclipse Temurin JDK 25, Shenandoah GC, virtual threads enabled
- ARM64 (AWS Graviton, ECS Fargate)
- Aleph + Netty, transit-clj/transit-java for decoding
Questions
- Could this be a bug in the JVM or the Clojure runtime?
- Has similar behavior been seen in other cases?
- What more evidence could we capture?