

ClojureScript:foo> (r/read-string ":0")
"Error evaluating:" (r/read-string ":0") :as "cljs.reader.read_string.call(null,\":0\")"
org.mozilla.javascript.EcmaError: TypeError: Cannot read property "0.0" from null (file:/home/chas/dev/clojure/cljs/.repl/cljs/reader.js#451)

The topic of leading digits in keywords came up separately, as they've been supported in Clojure for some time, but can now be considered part of the spec, as it were. See CLJ-1286.
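
For comparison, a minimal check on the JVM side. This is only a sketch and assumes a JVM Clojure REPL at 1.6 or later (not ClojureScript), where the thread reports that `:0` is accepted:

```clojure
;; Assumes JVM Clojure 1.6+; clojure.core/read-string, not cljs.reader.
(def k (read-string ":0"))

(keyword? k) ; true
(name k)     ; "0"
```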

BTW, this is another simple-check (https://github.com/reiddraper/simple-check/) win... :-)

11 Answers

Comment made by: cemerick

This is not a simple regex change, as I had hoped given the recent flurry in Clojure.  The symbol pattern in {{cljs.reader}} is faithful to Clojure HEAD, but the processing of matches isn't.  I think it may be a wash as to whether it'd be easier to fix what's there vs. porting {{clojure.tools.reader.impl.commons/parse-symbol}} (which incidentally doesn't use a regex)…either way, leaving it for another day (or someone else, if they're up for it).

Comment made by: favila

I think I fixed the match processing issue you're talking about (CLJS-775, CLJS-776)? However, I'm still confused by this and CLJ-1286. The Clojure reader docs and the EDN spec still say that :0 should be rejected, but 1.6.0 accepts it. What's the expected behavior? Is the spec going to be fixed, or will the Clojure reader be fixed once downstream packages are fixed?


Comment made by: wagjo

AFAIK EDN specs do not reject :0 (no rule that the second character cannot be a digit). See https://github.com/wagjo/serialization-formats for my interpretation of existing specs.


Comment made by: favila

Ah, I think I see the source of the confusion. Both EDN and the Clojure reader spec say something like "keywords are like symbols, except beginning with a colon." The confusion lies in whether we interpret that as meaning:

  1. First character is a colon, then the second character and after are matched against the symbol definition.
  2. The first character is a colon, and the whole form is matched against the symbol definition.

CLJ-1003, CLJ-1252, CLJ-1286, and I all seem to assume the first meaning. This might be because when we say "the first character of a keyword" we typically mean the first character after the colon, as if the colon is "special" and not part of the keyword (e.g. like a reader macro character).

However, Clojure 1.6 seems to follow the second meaning (which explains why :0/a is ok but :0/0 is not). I'm not sure from the cited tickets and Google Group discussions whether this is because of downstream breakage, or whether this is the intended interpretation and the patch from CLJ-1252 was accepted by Alex Miller erroneously.
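
The two readings can be sketched as a pair of patterns. This is an illustration only: the character classes are deliberately simplified, and the names `reading-1`, `reading-2`, and `accepts?` are made up for this example rather than taken from any reader implementation.

```clojure
;; Simplified character classes, for illustration only; the real readers
;; accept more characters (for example '.', '#', and ':').
(def first-char "[a-zA-Z*+!_?<>=-]")    ; may start a symbol
(def rest-char  "[0-9a-zA-Z*+!_?<>=-]") ; may follow the first character

(def symbol-part (str first-char rest-char "*"))

;; Reading 1: the text after the colon must itself be a valid symbol.
(def reading-1
  (re-pattern (str ":" symbol-part "(?:/" symbol-part ")?")))

;; Reading 2: the colon counts as the non-numeric first character, so a
;; digit may follow it directly. The name after '/' still needs a valid
;; first character.
(def reading-2
  (re-pattern (str ":" rest-char "+(?:/" symbol-part ")?")))

(defn accepts?
  "True when the whole token matches the pattern."
  [pattern token]
  (boolean (re-matches pattern token)))

(accepts? reading-1 ":0")   ; false
(accepts? reading-2 ":0")   ; true
(accepts? reading-2 ":0/a") ; true
(accepts? reading-2 ":0/0") ; false
```

Under these simplified rules, the second pattern reproduces the :0/a versus :0/0 behavior described above, while the first rejects every keyword whose first character after the colon is a digit.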

Note if we accept the second interpretation, then the restriction "A symbol can contain one or more non-repeating ':'s." from the clojure reader docs is incorrect for keywords. (EDN doesn't allow namespace-expanded keywords, it seems, so it's not an issue there.)

Also EDN allows contiguous colons in symbols, whereas clojure 1.6 and the reader spec do not.


Comment made by: favila

Also clojure 1.6 allows {{a/:a}} and {{:a/:a}} (where name part violates first-character rule for symbols), even though the specs do not. (This is something your table doesn't mention. Very thorough work BTW! I wish the reader spec was more formalized and unambiguous...)


Comment made by: favila

I think this pattern follows the specs:

`

"(?x)
(?!///) # edge case: / only allowed in name part.
# name or namespace part of symbol or keyword
(?:
# division symbol
(/
# normal symbol
|[a-zA-Z*!_?$%&=<>][0-9a-zA-Z*!_?$%&=<>\#:+.-]*
# symbol starting with [-+.]
|[-+.](?:[a-zA-Z*!_?$%&=<>\#:+.-][0-9a-zA-Z*!_?$%&=<>\#:+.-]*)?)
# keyword
|(::?)([0-9a-zA-Z*!_?$%&=<>\#:+.-]+))
# name part when namespace is present
(?:/(/ # division symbol
|[a-zA-Z*!_?$%&=<>][0-9a-zA-Z*!_?$%&=<>\#:+.-]*
|[-+.](?:[a-zA-Z*!_?$%&=<>\#:+.-][0-9a-zA-Z*!_?$%&=<>\#:+.-]*)?))?
# groups:
# 1: symbol name or namespace
# 2: keyword colon(s)
# 3: keyword name or namespace
# 4: keyword or symbol name (and groups 1 and 3 are namespaces)"

`

Problems:

  1. Does not enforce no-repeating-colon rule (but it is easy to validate after matching).
  2. Rejects violations of first-character-rule in symbols which clojure accepts.
  3. Accepts a trailing colon on namespace (unlike clojure).
  4. Accepts {{foo//}} or {{:foo//}}, which are not clearly addressed by the specs. (Jozef's table has more background). These are both allowed in Clojure 1.6, but not 1.5 or (arguably) edn.
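
As a rough sketch of the post-match validation mentioned in problem 1: the helper names below are hypothetical, and the check assumes that "non-repeating" means no two adjacent colons inside a matched name or namespace part.

```clojure
;; Hypothetical post-match check: flag a matched name or namespace part
;; that contains two adjacent colons. Absent groups arrive as nil.
(defn repeated-colon?
  [part]
  (boolean (and part (re-find #"::" part))))

(defn valid-parts?
  "True when none of the matched parts repeats a colon."
  [& parts]
  (not-any? repeated-colon? parts))

(valid-parts? "foo" nil "bar") ; true
(valid-parts? "a::b" nil)      ; false
```

Because the pattern captures a keyword's leading colon(s) in their own group, only the name and namespace groups would need to be passed to such a check.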

Comment made by: favila

Another problem: Accepts {{:::a/b}}, which I think is ok per the specs but is not read by 1.6. Crazy example:

`
user=> (require ['clojure.core :as (symbol ":a")])
nil
user=> :::a/map
RuntimeException Invalid token: :::a/map  clojure.lang.Util.runtimeException (Util.java:221)
user=> (resolve (symbol ":a" "map"))
#'clojure.core/map

`

Theoretically I might expect {{:::a/map}} to be read as {{:clojure.core/map}}?
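
To make that expectation concrete, here is a sketch of the expansion done by hand. `expand-auto-keyword` is a hypothetical helper written for this example, not a function of any reader; it assumes JVM Clojure and the same alias setup as the session above.

```clojure
;; Register the symbol named ":a" as an alias, as in the REPL session.
(require ['clojure.core :as (symbol ":a")])

(defn expand-auto-keyword
  "Hypothetical helper (not a reader function): expands a token of the
  form ::alias/name using the aliases of the current namespace.
  Returns nil when the token has another shape or the alias is unknown."
  [token]
  (when-let [[_ alias-part name-part] (re-matches #"::([^/]+)/(.+)" token)]
    (when-let [target (get (ns-aliases *ns*) (symbol alias-part))]
      (keyword (str (ns-name target)) name-part))))

(expand-auto-keyword ":::a/map") ; => :clojure.core/map
```

This only shows what the expansion would produce if the reader treated `:a` as an ordinary alias; it says nothing about whether the reader should accept the token.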


Comment made by: alex+import

Bumping this up, as I just scratched my head for an hour before finding out this was the culprit.

Comment made by: dnolen

Nicolás, the premise of the ticket is that this should be supported, when the Clojure documentation about valid keywords clearly states that it isn't. The Clojure implementation just happens to allow it. In any case, this needs to be sorted out in Clojure first.

Comment made by: favila

I think CLJ-1527 is currently the ticket where this problem is pursued.

Reference: https://clojure.atlassian.net/browse/CLJS-677 (reported by cemerick)
...