Welcome! Please see the About page for a little more info on how this works.

+1 vote
ago in tools.reader by
retagged ago by

Context
A user reported that rewrite-clj column number metadata can be incorrect for a token at the end of input.

Rewrite-clj uses clojure.tools/reader to read clojure source.
It uses clojure.tools.reader.reader-types/get-column-number to discover the current column offset.

Affected
clojure.tools/reader v1.3.4 through v1.5.0
Clojure only (not ClojureScript)

Env
On Linux using Clojure CLI 1.12.0.1517 with JDK 23.0.2 (temurin)

Symptom
When the element at end of input is a token, after it is read, the column number is one less than it should be.

Setup
Let's setup a playground in foo.clj

(ns foo
  (:require [clojure.pprint :refer [pprint]]
            [clojure.tools.reader.edn :as edn]
            [clojure.tools.reader.reader-types :as rt]))

(defn reader [s]
  (-> s
      rt/string-push-back-reader
      rt/indexing-push-back-reader))

(defn dump-with-cols [s]
  (println "input string:" s)
  (println "colnums:     " (apply str (take (inc (count s)) (cycle [1 2 3 4 5 6 7 8 9 0])))))

(defn read-by-obj [s]
  (with-open [r (reader s)]
    (loop [acc [{:at :bof :col-number (rt/get-column-number r)}]
           obj (edn/read r false ::eof {})]
      (if (not= ::eof obj)
        (recur (conj acc {:after obj :col-number (rt/get-column-number r)})
               (edn/read r false ::eof {}))
        (conj acc {:at :eof :col-number (rt/get-column-number r)})))))

(let [s (first *command-line-args*)]
  (dump-with-cols s)
  (println "parse result:")
  (pprint (read-by-obj s)))

Reproduction
Using clojure.tools/reader v1.5.0:

$ clojure -Sdeps '{:deps {org.clojure/tools.reader {:mvn/version "1.5.0"}}}' \
        -M foo.clj ':bip :bop'
input string: :bip :bop
colnums:      1234567890
parse result:
[{:at :bof, :col-number 1}
 {:after :bip, :col-number 5}
 {:after :bop, :col-number 9}
 {:at :eof, :col-number 9}]

Notice after :bip we see the correct :col-number 5, but after :bop we see the incorrect :col-number 9; it should be :col-number 10.

An incorrect column number occurs for any unwrapped last token, a number:

$ clojure -Sdeps '{:deps {org.clojure/tools.reader {:mvn/version "1.5.0"}}}' \
        -M foo.clj 123
input string: 123
colnums:      1234
parse result:
[{:at :bof, :col-number 1}
 {:after 123, :col-number 3}
 {:at :eof, :col-number 3}]

a symbol:

$ clojure -Sdeps '{:deps {org.clojure/tools.reader {:mvn/version "1.5.0"}}}' \
        -M foo.clj x
input string: x
colnums:      12
parse result:
[{:at :bof, :col-number 1}
 {:after x, :col-number 1}
 {:at :eof, :col-number 1}]

But we get a correct result for syntax wrapped elements:

$ clojure -Sdeps '{:deps {org.clojure/tools.reader {:mvn/version "1.5.0"}}}' \
        -M foo.clj '(x)'
input string: (x)
colnums:      1234
parse result:
[{:at :bof, :col-number 1}
 {:after (x), :col-number 4}
 {:at :eof, :col-number 4}]

Diagnosis
If I dial clojure.tools/reader back to v1.3.3, I don't see the issue:

$ clojure -Sdeps '{:deps {org.clojure/tools.reader {:mvn/version "1.3.3"}}}' \
        -M foo.clj ':bip :bop'
input string: :bip :bop
colnums:      1234567890
parse result:
[{:at :bof, :col-number 1}
 {:after :bip, :col-number 5}
 {:after :bop, :col-number 10}
 {:at :eof, :col-number 10}]

And then bump one version to v1.3.4, I see the issue:

$ clojure -Sdeps '{:deps {org.clojure/tools.reader {:mvn/version "1.3.4"}}}' \
        -M foo.clj ':bip :bop'
input string: :bip :bop
colnums:      1234567890
parse result:
[{:at :bof, :col-number 1}
 {:after :bip, :col-number 5}
 {:after :bop, :col-number 9}
 {:at :eof, :col-number 9}]

There was only one significant change between v1.3.3 and v1.3.4.

Unreading the character at eof is causing the off-by-one issue.

If I change the existing:

  (loop [sb (StringBuilder.)
         ch initch]
    (if (or (whitespace? ch)
            (macro-terminating? ch)
            (nil? ch))
      (do (unread rdr ch)
          (str sb))
      (if (not-constituent? ch)
        (err/throw-bad-char rdr kind ch)
        (recur (doto sb (.append ch)) (read-char rdr))))))))

To:

  (loop [sb (StringBuilder.)
         ch initch]
    (cond
      (or (whitespace? ch)
          (macro-terminating? ch))
      (do (unread rdr ch)
          (str sb))

      (nil? ch)
      (str sb)

      (not-constituent? ch)
      (err/throw-bad-char rdr kind ch)

      :else
      (recur (doto sb (.append ch)) (read-char rdr)))))))

The problem is resolved for me.

Next Steps
Happy to contribute a patch if the above makes sense to you and you think it is worthwhile to fix.

1 Answer

+1 vote
ago by

Logged as https://clojure.atlassian.net/browse/TRDR-75, happy to see a patch.

...