Context
A user reported that rewrite-clj column number metadata can be incorrect for a token at the end of input.
Rewrite-clj uses clojure.tools/reader to read clojure source.
It uses clojure.tools.reader.reader-types/get-column-number
to discover the current column offset.
Affected
clojure.tools/reader v1.3.4 through v1.5.0
Clojure only (not ClojureScript)
Env
On Linux using Clojure CLI 1.12.0.1517 with JDK 23.0.2 (temurin)
Symptom
When the element at end of input is a token, after it is read, the column number is one less than it should be.
Setup
Let's setup a playground in foo.clj
(ns foo
(:require [clojure.pprint :refer [pprint]]
[clojure.tools.reader.edn :as edn]
[clojure.tools.reader.reader-types :as rt]))
(defn reader [s]
(-> s
rt/string-push-back-reader
rt/indexing-push-back-reader))
(defn dump-with-cols [s]
(println "input string:" s)
(println "colnums: " (apply str (take (inc (count s)) (cycle [1 2 3 4 5 6 7 8 9 0])))))
(defn read-by-obj [s]
(with-open [r (reader s)]
(loop [acc [{:at :bof :col-number (rt/get-column-number r)}]
obj (edn/read r false ::eof {})]
(if (not= ::eof obj)
(recur (conj acc {:after obj :col-number (rt/get-column-number r)})
(edn/read r false ::eof {}))
(conj acc {:at :eof :col-number (rt/get-column-number r)})))))
(let [s (first *command-line-args*)]
(dump-with-cols s)
(println "parse result:")
(pprint (read-by-obj s)))
Reproduction
Using clojure.tools/reader v1.5.0:
$ clojure -Sdeps '{:deps {org.clojure/tools.reader {:mvn/version "1.5.0"}}}' \
-M foo.clj ':bip :bop'
input string: :bip :bop
colnums: 1234567890
parse result:
[{:at :bof, :col-number 1}
{:after :bip, :col-number 5}
{:after :bop, :col-number 9}
{:at :eof, :col-number 9}]
Notice after :bip
we see the correct :col-number 5
, but after :bop
we see the incorrect :col-number 9
; it should be :col-number 10
.
An incorrect column number occurs for any unwrapped last token, a number:
$ clojure -Sdeps '{:deps {org.clojure/tools.reader {:mvn/version "1.5.0"}}}' \
-M foo.clj 123
input string: 123
colnums: 1234
parse result:
[{:at :bof, :col-number 1}
{:after 123, :col-number 3}
{:at :eof, :col-number 3}]
a symbol:
$ clojure -Sdeps '{:deps {org.clojure/tools.reader {:mvn/version "1.5.0"}}}' \
-M foo.clj x
input string: x
colnums: 12
parse result:
[{:at :bof, :col-number 1}
{:after x, :col-number 1}
{:at :eof, :col-number 1}]
But we get a correct result for syntax wrapped elements:
$ clojure -Sdeps '{:deps {org.clojure/tools.reader {:mvn/version "1.5.0"}}}' \
-M foo.clj '(x)'
input string: (x)
colnums: 1234
parse result:
[{:at :bof, :col-number 1}
{:after (x), :col-number 4}
{:at :eof, :col-number 4}]
Diagnosis
If I dial clojure.tools/reader back to v1.3.3, I don't see the issue:
$ clojure -Sdeps '{:deps {org.clojure/tools.reader {:mvn/version "1.3.3"}}}' \
-M foo.clj ':bip :bop'
input string: :bip :bop
colnums: 1234567890
parse result:
[{:at :bof, :col-number 1}
{:after :bip, :col-number 5}
{:after :bop, :col-number 10}
{:at :eof, :col-number 10}]
And then bump one version to v1.3.4, I see the issue:
$ clojure -Sdeps '{:deps {org.clojure/tools.reader {:mvn/version "1.3.4"}}}' \
-M foo.clj ':bip :bop'
input string: :bip :bop
colnums: 1234567890
parse result:
[{:at :bof, :col-number 1}
{:after :bip, :col-number 5}
{:after :bop, :col-number 9}
{:at :eof, :col-number 9}]
There was only one significant change between v1.3.3 and v1.3.4.
Unreading the character at eof is causing the off-by-one issue.
If I change the existing:
(loop [sb (StringBuilder.)
ch initch]
(if (or (whitespace? ch)
(macro-terminating? ch)
(nil? ch))
(do (unread rdr ch)
(str sb))
(if (not-constituent? ch)
(err/throw-bad-char rdr kind ch)
(recur (doto sb (.append ch)) (read-char rdr))))))))
To:
(loop [sb (StringBuilder.)
ch initch]
(cond
(or (whitespace? ch)
(macro-terminating? ch))
(do (unread rdr ch)
(str sb))
(nil? ch)
(str sb)
(not-constituent? ch)
(err/throw-bad-char rdr kind ch)
:else
(recur (doto sb (.append ch)) (read-char rdr)))))))
The problem is resolved for me.
Next Steps
Happy to contribute a patch if the above makes sense to you and you think it is worthwhile to fix.