+3 votes
in Clojure by

We're seeing what looks like a violation of the keyword-interning invariant in production. The bug is deterministic (repeats thousands of times in the same order on the same Aleph/Netty thread) and is cleared by recompiling the affected namespaces via nREPL.

(when (and (map? result) (nil? (:payload result)))   ; this WHEN fires
  (let [payload-key (->> (keys result)
                         (filter #(.contains (pr-str %) "payload"))
                         first)
        payload-via-key (when payload-key (get result payload-key))]
    (log/warn {:result-type                   (str (type result))
               :result-keys                   (pr-str (keys result))
               :payload-key-equals-literal?   (= payload-key :payload)
               :payload-key-identical?        (identical? payload-key :payload)
               :payload-via-found-key-nil?    (nil? payload-via-key)}
              "diagnostic")))

Logged values when the bug fires:

{:result-type                   clojure.lang.PersistentArrayMap
 :result-keys                   (:payload :aws-xray)
 :payload-key-equals-literal?   true
 :payload-key-identical?        true
 :payload-via-found-key-nil?    false}

So:

  • The (:payload result) in the when returned nil.
  • A few lines later, the diagnostic body proves that the first key in (keys result) IS the body-site :payload literal by identity (and therefore by =, since Keyword inherits Object.equals).
  • (get result that-key) returns the actual non-nil payload value.

PersistentArrayMap.indexOf uses == for Keyword keys, so the only way (:payload result) returns nil while (keys result) yields a key that is identical? to :payload at a nearby site is if the :payload literal at the WHEN site and the :payload literal at the body site are two different Keyword instances, even though they're written identically in one source-level function.
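
For reference, a minimal sketch of what the interning invariant normally guarantees. The `sample` map and the keyword built at runtime with `keyword` are illustrative assumptions, not taken from the production code.

```clojure
;; Keywords are interned: every :payload in one runtime should be the
;; same object, however it was created.
(def sample {:payload {:id 1} :aws-xray {}})   ; illustrative map only

(def interning-check
  {;; literal vs. keyword constructed at runtime
   :literal-vs-runtime (identical? :payload (keyword "payload"))
   ;; literal vs. the key object actually stored in the map
   :literal-vs-map-key (identical? :payload (key (find sample :payload)))
   ;; ordinary 1-arg keyword lookup
   :lookup             (:payload sample)})
```

In a healthy runtime both identity checks are true and the lookup returns the stored value, which is why a nil from the 1-arg lookup is so surprising here.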

We compared the :payload literal with the payload-key and found that both have:

  • identical content hashcode (-383036092)
  • identical name bytes [0x70 0x61 0x79 0x6C 0x6F 0x61 0x64]
  • identical codepoints
  • same classloader

so it really looks like two distinct interned Keyword instances with the same name.
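
A comparison like the one above could be captured with a small helper along these lines. This is a sketch only: `keyword-fingerprint` is a hypothetical name, and `hash` stands in for whichever hashcode the production logging actually records.

```clojure
(defn keyword-fingerprint
  "Hypothetical helper: collects identity and content details of keyword k."
  [k]
  (let [^String s (name k)]
    {:hash          (hash k)                       ; content-based hash
     :identity-hash (System/identityHashCode k)    ; per-object identity
     :name-bytes    (mapv #(format "0x%02X" %) (.getBytes s "UTF-8"))
     :codepoints    (vec (.toArray (.codePoints s)))
     :namespace     (namespace k)
     :classloader   (str (.getClassLoader (class k)))}))
```

Logging `(keyword-fingerprint :payload)` next to `(keyword-fingerprint payload-key)` puts the content fields and the identity hash side by side, so a difference in either is visible in one log entry.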

Environment

  • Clojure 1.12.4
  • Eclipse Temurin JDK 25, Shenandoah GC, virtual threads enabled
  • ARM64 (AWS Graviton, ECS Fargate)
  • Aleph + Netty, transit-clj/transit-java for decoding

Questions

  1. Could this be a bug in JVM / Clojure runtime?
  2. Has similar behavior been seen in other cases?
  3. What more evidence could we capture?
by
It cannot be two distinct instances of Keyword since `(identical? payload-key :payload)` returns `true`. You say that the bug is cleared when the namespace is "recompiled" (I assume you mean reloaded) via nREPL. So could it be that there are multiple files with the same namespace on your classpath, and you load the properly working one via the REPL? Or maybe you have a cached `.class` file that's somehow newer than the corresponding `.clj` file but contains wrong bytecode.
by
Thanks for your response.

On stale / duplicate `.class` files:

Our setup is source-only. We build the uberjar with uberdeps (no AOT — it just packages source on classpath into a jar), and the container launches via `clojure.main -m ...`, so every namespace is read from .clj at JVM start. Our deps.edn has no AOT step, no gen-class, no :aot key, no compile-target on the classpath.

---

On the identical? returning true:

That check is at the body site only. It tells us `payload-key == :payload` at the body-site. The `when` guard's `(:payload result)` returning nil tells us `:payload != array[0]` at the when-site. Since `(keys result)` puts payload-key at array[0], and `payload-key == :payload`, we get `when-site :payload != body-site :payload` — two distinct Keyword instances, where both happen to be visible within one source-level fn.

If not two distinct instances of Keyword, what else could cause `(identical? payload-key :payload)` to return `true` but `(:payload result)` to return `nil` in the same function?
by
The only other idea that I have is that some `:payload` literals, specifically the one at the top-level `when`, could have invisible characters in them or characters that look identical to the characters in the ASCII range but have different Unicode code points.
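
To illustrate the look-alike scenario, here is a small sketch. The Cyrillic substitution is a made-up example, not something observed in the code under discussion.

```clojure
;; Hypothetical look-alike: U+0430 (Cyrillic small a) replaces the first Latin "a".
(def lookalike (keyword "p\u0430yload"))

(def lookalike-check
  {:equal?   (= lookalike :payload)
   ;; UTF-16 code units; these equal the code points for BMP characters
   :actual   (mapv int (name lookalike))
   :expected (mapv int (name :payload))})
```

The two keywords print almost identically, yet they are not equal, and comparing the numeric character values exposes the difference.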
by
Thanks for the quick reply.
Yes, I had also suspected that. But I logged the content hashcode, name bytes and codepoints for the `:payload` literal and `payload-key`, and everything is identical.

Hashcode was `-383036092`
Name bytes were `[0x70 0x61 0x79 0x6C 0x6F 0x61 0x64]`

1 Answer

+2 votes
by

It might also be interesting to know if you are getting a null due to not-found key vs null value in the map. You could supply a not-found arg on (:payload result :NOT-FOUND) to distinguish.
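
A minimal sketch of how the not-found argument separates the two cases. Both sample maps are made up for illustration.

```clojure
(def key-missing {:aws-xray {}})               ; no :payload key at all
(def value-nil   {:payload nil :aws-xray {}})  ; key present, value is nil

(def not-found-check
  {:missing-1-arg (:payload key-missing)               ; nil
   :nil-1-arg     (:payload value-nil)                 ; nil, indistinguishable
   :missing-2-arg (:payload key-missing :NOT-FOUND)    ; :NOT-FOUND
   :nil-2-arg     (:payload value-nil :NOT-FOUND)})    ; nil, key was found
```

With one argument both maps give nil; with the not-found argument only the map that lacks the key returns `:NOT-FOUND`.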

Have you recently updated Clojure?

Is this problem reproducible such that you could try it again with different code, deps, etc?

Keyword invocation uses special call sites and the compiler code for that changed in Clojure 1.12.3, so I would be interested in whether the behavior changes if you use Clojure 1.12.2.

Or alternately, does the behavior change if you modify the code from (:payload result) to (get result :payload).
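
One way to compare these forms in a single log entry is sketched below. `lookup-variants` is a hypothetical helper name, and it assumes `result` is a map (as the `map?` guard in the question ensures).

```clojure
(defn lookup-variants
  "Hypothetical helper: looks up :payload several ways for side-by-side logging."
  [result]
  {:keyword-call           (:payload result)              ; keyword in call position
   :keyword-call-not-found (:payload result ::not-found)  ; same, with not-found
   :get                    (get result :payload)          ; plain function call
   :get-not-found          (get result :payload ::not-found)
   :map-call               (result :payload)})            ; map in call position
```

In a healthy runtime every entry holds the same value for a map that contains `:payload`, so any disagreement between entries would point at one specific lookup path.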

by
Thank you for the response, Alex — really appreciate it.

Clojure upgrade:
We jumped from Clojure 1.11.1 to 1.12.4 on 2025-12-22. The same commit also moved us from JDK 17 to JDK 25 with virtual threads. The bug investigation started around March 2026, so there's roughly a 2-month gap between the upgrade and our first reports, though it's possible the bug existed earlier and went unnoticed.

Reproducibility:
Not manually. But we see this happening about 1 to 3 times a week. We have 10 instances running and handling roughly the same amount of workload, but the bug suddenly starts on one of the instances and then doesn't stop. We haven't been able to identify any patterns. It seems quite random.

Experiments we will try:
1. (:payload result :NOT-FOUND)
2. (get result :payload) vs (:payload result)

Since the bug isn't manually reproducible and an nREPL recompile fixes it for the affected instance, each experiment needs a new release and a probabilistic waiting window of ~1 week before we can claim a result with any confidence. We will run both experiments above in a single release.

Depending on the results from the first two experiments, we will also consider downgrading Clojure to 1.12.2 and seeing if that fixes the issue.

Question:
Given that your hypothesis points at the keyword call-site compilation rather than at duplicate interned instances, would logging `System/identityHashCode` of the :payload literal and payload-key still be a useful diagnostic to add? Or does the call-site explanation make it largely irrelevant?

I'll report back as soon as we have signal — likely within a week. Thank you again.
by
Hi Alex,

The bug fired again. Here are the results:

- (:payload result :NOT-FOUND) returns the expected valid map.
- (get result :payload) also returns the valid map.

We also updated the code to capture result in an atom. Through nREPL, we then evaluated:

- (:payload @*result-atom)
- (:payload @*result-atom :NOT-FOUND)
- (get @*result-atom :payload)

All three return the actual payload at the REPL.

The atom is reset only inside `(when (and (map? result) (nil? (:payload result))) ...)`, so by definition the bytecode's 1-arg (:payload result) had to return nil for that result to be captured. Yet the same captured object, evaluated on the REPL, gives the actual payload.
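
The capture described above could look roughly like this. It is a sketch under assumptions: `check-result` is a hypothetical wrapper name, and only `*result-atom` and the guard expression come from the description.

```clojure
;; Holds the most recent result for which the guard fired.
(defonce *result-atom (atom nil))

(defn check-result
  "Hypothetical wrapper: stores result only when the 1-arg keyword lookup
  yields nil on a map, then returns result unchanged."
  [result]
  (when (and (map? result) (nil? (:payload result)))
    (reset! *result-atom result))
  result)
```

Because the atom is written nowhere else, any value found in it must have passed through the guard with `(:payload result)` evaluating to nil at that moment.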

We also verified that (map? @*result-atom) returned `true`.

We had printed the System/identityHashCode for the :payload literal and payload-key and they are the same. So the two-Keyword-instance theory is ruled out.

--

I'm wondering if this bug is at an even lower level than Clojure. Could it be a JVM bug?

I think the next step is to downgrade Clojure to 1.12.2 and wait a week or two to see whether the bug stops firing.

But if you want us to carry out any other experiments, please let us know.
...