0 votes
in tools.reader by

Question
I am a maintainer on rewrite-clj.
Rewrite-clj gratefully uses clojure tools.reader to parse Clojure source.
A rewrite-clj user working on Windows recently raised an issue wondering why his Windows \r\n newlines were being converted to \n.

I basically shrugged and effectively answered, "because rewrite-clj is using tools.reader". I felt a bit guilty for being so curt, hence this question here on Ask Clojure.

On behalf of that rewrite-clj user, I ask:

  1. Is it the intent of tools.reader to normalize all newlines to \n? (I think it is)
  2. Should it be?

The big advantage of normalizing is that consumers only ever have to deal with \n, but it also means a loss of information: the original line endings cannot be recovered.

Related
A while back I raised an issue for Clojure tools reader on how it normalizes Windows newlines.

1 Answer

0 votes
by

What does it mean to "normalize all newlines"? I don't understand what this is asking.

by
Thanks for the reply! I was using the language I saw in the Clojure tools reader source code.

My understanding is that it uses "normalize newline" to describe:

- `\r\n` is converted to `\n`.
- (Also, maybe a bit oddly, `\r\f` is converted to `\n`.)

This means that the original OS-specific newline is not available from the tools.reader parsed result.
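
To make the two conversions above concrete, here is a minimal sketch in plain Clojure. It is not the tools.reader code: `normalize-newlines` is a hypothetical helper that mirrors only the two cases listed above, and the library's real handling may differ in other cases.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper (not part of tools.reader): replaces each
;; \r\n or \r\f pair with a single \n, as described in the list above.
(defn normalize-newlines [s]
  (str/replace s #"\r[\n\f]" "\n"))

(normalize-newlines "a\r\nb\r\fc")
;; => "a\nb\nc"

;; The loss of information: after normalizing, Windows and Unix
;; inputs are indistinguishable.
(= (normalize-newlines "a\r\nb") (normalize-newlines "a\nb"))
;; => true
```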
by
I guess what I'm getting at is that tools.reader (and the Clojure LispReader) is fundamentally about reading forms from a text stream. Whitespace is discarded, so it mostly does not appear in the resulting forms, other than perhaps inside string literals.
by
Oh right, yes, good point! Thanks for your patience on this.

Rewrite-clj digs deeper into tools.reader than most users might.

It makes use of the clojure.tools.reader.reader-types namespace, using fns like read-char, unread and peek-char. It takes this lower level approach to return things the higher level reader fns do not return, like whitespace, comments and #_ :skipped-stuff.

So when rewrite-clj gets that whitespace, the newlines are always \n (even if they were originally \r\n).
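
A small sketch of what that looks like at the character level, assuming org.clojure/tools.reader is on the classpath and that the reader is an indexing push-back reader. `read-all-chars` is a hypothetical helper for illustration, not rewrite-clj code.

```clojure
(require '[clojure.tools.reader.reader-types :as rt])

;; Hypothetical helper: drains a string through an indexing push-back
;; reader one char at a time and collects what read-char returns.
(defn read-all-chars [s]
  (let [rdr (rt/indexing-push-back-reader s)]
    (loop [acc []]
      (if-let [ch (rt/read-char rdr)]
        (recur (conj acc ch))
        acc))))

;; Per the behavior described in this thread, the \r\n pair should
;; come back as a single \newline char.
(read-all-chars "a\r\nb")
```

With this setup the caller never sees the `\r`, so there is nothing left to tell it the source used Windows line endings.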
...