
+1 vote
in data.csv by

When whitespace follows the closing quote (") of a quoted cell, the data.csv reader throws a confusing error.
It took me some time to realize this was a whitespace issue, since whitespace is not visible.

See an example of the error below.

=> (read-csv (java.io.StringReader. "\"a\" " ))
Exception CSV error (unexpected character: ) clojure.data.csv/read-quoted-cell (csv.clj:36)
=> (read-csv (java.io.StringReader. "\"a\"" ))
(["a"])
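
Since the offending character is invisible, a quick scan of the input can help locate it before calling read-csv. The following is only a heuristic sketch and not part of data.csv: suspicious-lines is a hypothetical helper name, it assumes the default comma separator, and the regex may report false positives when a quoted cell contains an escaped quote followed by whitespace and a comma.

```clojure
(require '[clojure.string :as str])

(defn suspicious-lines
  "Hypothetical helper: returns the 1-based numbers of lines in which a double
  quote is followed by spaces/tabs and then a comma or the end of the line."
  [s]
  (keep-indexed
    (fn [idx line]
      ;; A quote, then one or more blanks, then a separator or end of line.
      (when (re-find #"\"[ \t]+(?:,|$)" line)
        (inc idx)))
    (str/split-lines s)))
```

Called on the failing input above, (suspicious-lines "\"a\" ") should flag line 1, while the working input without the trailing space should flag nothing.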

3 Answers

+1 vote
by

Comment made by: pschulz01

This is related to DCSV-8.

A quoted value that starts at the beginning of the string and ends in the middle of it (i.e., additional characters appear after the second quote) will cause the same problem.

0 votes
by

Comment made by: cvkemenade

To take the issue a little further, the same holds for whitespace in the middle of a line, between the closing quote and the separator:
=> (read-csv (java.io.StringReader. "\"a\" , 5\n \"b,b\",\"6\"" ))
Exception CSV error (unexpected character: ) clojure.data.csv/read-quoted-cell (csv.clj:36)

This raises the question of what happens if you put a space between the separator and the opening quote. First, the default case:
=> (read-csv (java.io.StringReader. "\"a\", 5\n\"b\",\"6\"" ))
(["a" " 5"] ["b" "6"])

Now adding one additional space:
=> (read-csv (java.io.StringReader. "\"a\", 5\n \"b\",\"6\"" ))
(["a" " 5"] [" \"b\"" "6"])

Interesting: the whitespace is treated as the start of the cell, and the quote that follows is treated as part of the text value that is read.
The main reason for using quotes is to allow separators in text, so let us see what happens if we extend the string by putting a separator in it.
=> (read-csv (java.io.StringReader. "\"a\", 5\n \"b,b\",\"6\"" ))
(["a" " 5"] [" \"b" "b\"" "6"])

Now we see that the separator is no longer quoted and, as expected, the line is interpreted as containing three values instead of two.

When using standard libraries, the issues mentioned above usually do not appear. However, in custom code that emits CSV files, or when making small manual fixes to a CSV file, it is easy to introduce such an error, and it is then quite tough to diagnose correctly.
Therefore I would opt for a mode of operation where whitespace before an opening quote or after a closing quote is ignored (unless it is an escaped quote like "").
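
To make the proposed mode concrete, here is a minimal sketch of a record parser that skips spaces and tabs before an opening quote and after a closing quote. It is not how data.csv is implemented and not a proposed patch: parse-line-lenient and parse-lenient are hypothetical names, the separator is assumed to be a comma, and records are split on newlines, so quoted cells containing line breaks are not supported.

```clojure
(require '[clojure.string :as str])

(defn- blank-char? [c]
  (or (= c \space) (= c \tab)))

(defn parse-line-lenient
  "Hypothetical helper (not part of data.csv): parses one CSV record into a
  vector of strings, skipping spaces/tabs before an opening quote and after
  a closing quote."
  [line]
  (let [sep-char   \,
        quote-char \"
        n          (count line)]
    (loop [i 0, state :start, cur [], fields []]
      (if (= i n)
        ;; End of input: an open quoted cell is an error, otherwise emit the last cell.
        (if (= state :quoted)
          (throw (ex-info "CSV error (unterminated quoted cell)" {:index i}))
          (conj fields (apply str cur)))
        (let [c (nth line i)]
          (case state
            ;; Start of a cell: only spaces/tabs (kept in cur) seen so far.
            :start
            (cond
              (= c sep-char)   (recur (inc i) :start [] (conj fields (apply str cur)))
              (= c quote-char) (recur (inc i) :quoted [] fields) ; drop leading blanks
              (blank-char? c)  (recur (inc i) :start (conj cur c) fields)
              :else            (recur (inc i) :unquoted (conj cur c) fields))

            ;; Unquoted cell: everything up to the separator is kept verbatim.
            :unquoted
            (if (= c sep-char)
              (recur (inc i) :start [] (conj fields (apply str cur)))
              (recur (inc i) :unquoted (conj cur c) fields))

            ;; Quoted cell: "" is an escaped quote, a single quote closes the cell.
            :quoted
            (cond
              (not= c quote-char)
              (recur (inc i) :quoted (conj cur c) fields)

              (and (< (inc i) n) (= (nth line (inc i)) quote-char))
              (recur (+ i 2) :quoted (conj cur quote-char) fields)

              :else
              (recur (inc i) :after-quote cur fields))

            ;; After the closing quote: blanks are ignored, anything else is an error.
            :after-quote
            (cond
              (blank-char? c) (recur (inc i) :after-quote cur fields)
              (= c sep-char)  (recur (inc i) :start [] (conj fields (apply str cur)))
              :else (throw (ex-info "CSV error (unexpected character after closing quote)"
                                    {:index i :character c})))))))))

(defn parse-lenient
  "Hypothetical helper: applies parse-line-lenient to every line of s."
  [s]
  (mapv parse-line-lenient (str/split-lines s)))
```

With this sketch, the input that fails above parses into two records of two cells each:
=> (parse-lenient "\"a\" , 5\n \"b,b\",\"6\"")
[["a" " 5"] ["b,b" "6"]]
Whitespace inside unquoted cells (such as " 5") is deliberately left untouched, matching the current behaviour shown earlier.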

0 votes
by
Reference: https://clojure.atlassian.net/browse/DCSV-6 (reported by cvkemenade)
...