Welcome! Please see the About page for a little more info on how this works.

+1 vote
in tools.reader by
closed by

Context
A user reported that rewrite-clj column number metadata can be incorrect for a token at the end of input.

Rewrite-clj uses clojure.tools/reader to read clojure source.
It uses clojure.tools.reader.reader-types/get-column-number to discover the current column offset.

Affected
clojure.tools/reader v1.3.4 through v1.5.0
Clojure only (not ClojureScript)

Env
On Linux using Clojure CLI 1.12.0.1517 with JDK 23.0.2 (temurin)

Symptom
When the element at end of input is a token, after it is read, the column number is one less than it should be.

Setup
Let's setup a playground in foo.clj

(ns foo
  (:require [clojure.pprint :refer [pprint]]
            [clojure.tools.reader.edn :as edn]
            [clojure.tools.reader.reader-types :as rt]))

(defn reader [s]
  (-> s
      rt/string-push-back-reader
      rt/indexing-push-back-reader))

(defn dump-with-cols [s]
  (println "input string:" s)
  (println "colnums:     " (apply str (take (inc (count s)) (cycle [1 2 3 4 5 6 7 8 9 0])))))

(defn read-by-obj [s]
  (with-open [r (reader s)]
    (loop [acc [{:at :bof :col-number (rt/get-column-number r)}]
           obj (edn/read r false ::eof {})]
      (if (not= ::eof obj)
        (recur (conj acc {:after obj :col-number (rt/get-column-number r)})
               (edn/read r false ::eof {}))
        (conj acc {:at :eof :col-number (rt/get-column-number r)})))))

(let [s (first *command-line-args*)]
  (dump-with-cols s)
  (println "parse result:")
  (pprint (read-by-obj s)))

Reproduction
Using clojure.tools/reader v1.5.0:

$ clojure -Sdeps '{:deps {org.clojure/tools.reader {:mvn/version "1.5.0"}}}' \
        -M foo.clj ':bip :bop'
input string: :bip :bop
colnums:      1234567890
parse result:
[{:at :bof, :col-number 1}
 {:after :bip, :col-number 5}
 {:after :bop, :col-number 9}
 {:at :eof, :col-number 9}]

Notice after :bip we see the correct :col-number 5, but after :bop we see the incorrect :col-number 9; it should be :col-number 10.

An incorrect column number occurs for any unwrapped last token, a number:

$ clojure -Sdeps '{:deps {org.clojure/tools.reader {:mvn/version "1.5.0"}}}' \
        -M foo.clj 123
input string: 123
colnums:      1234
parse result:
[{:at :bof, :col-number 1}
 {:after 123, :col-number 3}
 {:at :eof, :col-number 3}]

a symbol:

$ clojure -Sdeps '{:deps {org.clojure/tools.reader {:mvn/version "1.5.0"}}}' \
        -M foo.clj x
input string: x
colnums:      12
parse result:
[{:at :bof, :col-number 1}
 {:after x, :col-number 1}
 {:at :eof, :col-number 1}]

But we get a correct result for syntax wrapped elements:

$ clojure -Sdeps '{:deps {org.clojure/tools.reader {:mvn/version "1.5.0"}}}' \
        -M foo.clj '(x)'
input string: (x)
colnums:      1234
parse result:
[{:at :bof, :col-number 1}
 {:after (x), :col-number 4}
 {:at :eof, :col-number 4}]

Diagnosis
If I dial clojure.tools/reader back to v1.3.3, I don't see the issue:

$ clojure -Sdeps '{:deps {org.clojure/tools.reader {:mvn/version "1.3.3"}}}' \
        -M foo.clj ':bip :bop'
input string: :bip :bop
colnums:      1234567890
parse result:
[{:at :bof, :col-number 1}
 {:after :bip, :col-number 5}
 {:after :bop, :col-number 10}
 {:at :eof, :col-number 10}]

And then bump one version to v1.3.4, I see the issue:

$ clojure -Sdeps '{:deps {org.clojure/tools.reader {:mvn/version "1.3.4"}}}' \
        -M foo.clj ':bip :bop'
input string: :bip :bop
colnums:      1234567890
parse result:
[{:at :bof, :col-number 1}
 {:after :bip, :col-number 5}
 {:after :bop, :col-number 9}
 {:at :eof, :col-number 9}]

There was only one significant change between v1.3.3 and v1.3.4.

Unreading the character at eof is causing the off-by-one issue.

If I change the existing:

  (loop [sb (StringBuilder.)
         ch initch]
    (if (or (whitespace? ch)
            (macro-terminating? ch)
            (nil? ch))
      (do (unread rdr ch)
          (str sb))
      (if (not-constituent? ch)
        (err/throw-bad-char rdr kind ch)
        (recur (doto sb (.append ch)) (read-char rdr))))))))

To:

  (loop [sb (StringBuilder.)
         ch initch]
    (cond
      (or (whitespace? ch)
          (macro-terminating? ch))
      (do (unread rdr ch)
          (str sb))

      (nil? ch)
      (str sb)

      (not-constituent? ch)
      (err/throw-bad-char rdr kind ch)

      :else
      (recur (doto sb (.append ch)) (read-char rdr)))))))

The problem is resolved for me.

Next Steps
Happy to contribute a patch if the above makes sense to you and you think it is worthwhile to fix.

closed with the note: fixed and released

1 Answer

+1 vote
by
selected by
 
Best answer

Logged as https://clojure.atlassian.net/browse/TRDR-75, happy to see a patch.

by
edited by
Thanks! Nicola beat me to it, applied a fix, and released v1.5.1, but I see the fix did not work for numbers, please read on.

I see correct results for keywords:

    $ clojure -Sdeps '{:deps {org.clojure/tools.reader {:mvn/version "1.5.1"}}}' \
            -M foo.clj ':bip :bop'
    input string: :bip :bop
    colnums:      1234567890
    parse result:
    [{:at :bof, :col-number 1}
     {:after :bip, :col-number 5}
     {:after :bop, :col-number 10}
     {:at :eof, :col-number 10}]

And correct results for symbols:

    clojure -Sdeps '{:deps {org.clojure/tools.reader {:mvn/version "1.5.1"}}}' \
            -M foo.clj x
    input string: x
    colnums:      12
    parse result:
    [{:at :bof, :col-number 1}
     {:after x, :col-number 2}
     {:at :eof, :col-number 2}]

But, oddly, a number still returns an incorrect :col-number (should be 4 instead of 3):

    clojure -Sdeps '{:deps {org.clojure/tools.reader {:mvn/version "1.5.1"}}}' \
            -M foo.clj 123        
    input string: 123
    colnums:      1234
    parse result:
    [{:at :bof, :col-number 1}
     {:after 123, :col-number 3}
     {:at :eof, :col-number 3}]

(Apologies for unformatted code blocks, could not figure out how to get question2answer to format them monospace).
by
I just tried my proposed patch in the ask above on v1.5.0, and it also did not address fixing numbers.
by
Thanks to Nicola we have v1.5.2 and all is good!

clojure -Sdeps '{:deps {org.clojure/tools.reader {:mvn/version "1.5.2"}}}' -M foo.clj 123
Downloading: org/clojure/tools.reader/1.5.2/tools.reader-1.5.2.pom from central
Downloading: org/clojure/tools.reader/1.5.2/tools.reader-1.5.2.jar from central
input string: 123
colnums:      1234
parse result:
[{:at :bof, :col-number 1}
 {:after 123, :col-number 4}
 {:at :eof, :col-number 4}]
...