
+2 votes
in Syntax and reader by

Known areas of under-specificity (http://clojure.org/reader#The Reader--Reader forms):

  • the description of symbols (and keywords) does not mention constituent characters that are currently in use by Clojure functions, such as <, >, =, $ (for Java inner classes), & (&form and &env in macros), and % (stated to be valid in the edn spec)
  • keywords currently accept leading numeric characters which is at odds with the spec - see CLJ-1286
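
The second point is easy to check at a REPL. A minimal sketch, assuming a current Clojure release on the JVM (the var name `k` is only for illustration):

```clojure
;; Reads as a keyword even though the name starts with a digit.
(def k (read-string ":1"))

(keyword? k) ; the reader accepts the token as a keyword
(name k)     ; the name, without the leading colon
```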

References:

by
From a cursory look at the current pages (the guide: https://clojure.org/guides/weird_characters and the reference: https://clojure.org/reference/reader), it seems that some of this has been addressed, but not all.

I wonder how out of date this Ask and the associated Jira ticket are, and whether they should be updated or closed.

15 Answers

0 votes
by

Comment made by: jafingerhut

The Clojure reader documentation also does not mention the following symbols as valid constituent characters. They are all mentioned as valid symbol constituent characters in the EDN readme here: https://github.com/edn-format/edn#symbols

dollar sign - used in Clojure/JVM to separate Java subclass names from class names, e.g. java.util.Map$Entry
percent sign - not sure why this is part of edn spec. In Clojure it seems only to be used inside #() for args like % %1 %&
ampersand - like in &form and &env in macro definitions
equals - clojure.core/= and many others
less-than - clojure.core/< clojure.core/<=
greater-than - clojure.core/> clojure.core/>=

I don't know whether Clojure and edn specs should be the same in this regard, but it seemed worth mentioning for this ticket.
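
For what it's worth, the characters listed above can be checked quickly at a REPL. This is only a sketch, assuming a recent Clojure on the JVM; `reads-as-symbol?` is a made-up helper, not part of any library:

```clojure
;; Hypothetical helper: true when the whole string reads as a single symbol
;; that prints back as the same text.
(defn reads-as-symbol? [s]
  (let [form (read-string s)]
    (and (symbol? form)
         (= s (str form)))))

;; Each token uses at least one of the undocumented constituent characters.
(every? reads-as-symbol?
        ["<" ">" "<=" ">=" "=" "&form" "&env" "java.util.Map$Entry" "%"])
```

Note that `%` is read here outside of `#()`, where it has no special meaning.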

0 votes
by

Comment made by: jafingerhut

Alex, Rich made this comment on CLJ-17 in 2011: "Runtime validation off the table for perf reasons. cemerick's suggestion that arbitrary symbol support will render them valid is sound, but arbitrary symbol support is a different ticket/idea." I am not aware of any tickets that propose the enhancement of allowing arbitrary symbols to be supported by Clojure, e.g. via a syntax like

`|white space and arbitrary #$@)$~))@ chars here|`

Do you think it is reasonable to create an enhancement ticket for supporting arbitrary characters in symbols and keywords?
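
To illustrate the lack of runtime validation mentioned in the quote, here is a small sketch (assuming current Clojure on the JVM; the symbol name is arbitrary) showing that `symbol` happily builds a symbol the reader cannot read back:

```clojure
;; No validation happens when a symbol is constructed at runtime.
(def odd-sym (symbol "white space and #$@ chars"))

;; The printer emits the raw name, with no quoting or escaping.
(pr-str odd-sym)

;; Reading the printed text back stops at the first space,
;; so the round trip does not reproduce the original symbol.
(= odd-sym (read-string (pr-str odd-sym)))
```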

0 votes
by

Comment made by: alexmiller

Sure. I looked into this a bit as a digression off of feature expressions and #| has been reserved for this potential use. However, there are many tricky issues with it and I do not expect this to happen soon - more likely to be something we're pushed to do when necessary for some other reason.

0 votes
by

Comment made by: bendlas

Wrong ticket, but to anybody thinking about #|arbitrary symbols (or strings)|, please do consider making the delimiters configurable, as in mime multipart.

0 votes
by

Comment made by: jafingerhut

I've created a design page for now. I'm sure it does not list many of the tricky issues you have found. I'd be happy to take a shot at documenting them if you have any notes you are willing to share.

http://dev.clojure.org/pages/viewpage.action?pageId=11862058

0 votes
by

Comment made by: jafingerhut

Herwig, can you edit the design page linked in my previous comment to add a reference or example showing precisely how mime multipart allows delimiters to be configurable, and why you believe fixed delimiters would be a bad idea?

0 votes
by

Comment made by: bendlas

I've commented on the design page.

0 votes
by

Comment made by: alexmiller

Removed a couple of issues that have been clarified on the reader page and are no longer issues.

0 votes
by

Comment made by: bronsa

Related to CLJ-1530

0 votes
by

Comment made by: adamfrey

Related to this: The Clojure reader will not accept symbols and keywords that contain consecutive colons (see LispReader.java: https://github.com/clojure/clojure/commit/005ea1b5f96c5bb762e155032a865e29ad71bcf3#diff-3a5dca122734225f3f60263876401aebR275), although that is permitted by the current EDN spec. Here is a GitHub issue regarding consecutive colons: https://github.com/edn-format/edn/issues/68. I would like to qualify why consecutive colons are disallowed, and sync up the Clojure Reader and the EDN spec on this.
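
A minimal REPL sketch of the consecutive-colon behaviour, assuming current Clojure on the JVM (`try-read` is a hypothetical helper written for this example):

```clojure
;; Hypothetical helper: returns the form read from s, or :invalid-token
;; when the reader throws.
(defn try-read [s]
  (try
    (read-string s)
    (catch Exception _ :invalid-token)))

(try-read "a:b")   ; a single interior colon is accepted
(try-read "a::b")  ; consecutive colons in a symbol are rejected
(try-read ":a::b") ; and likewise in a keyword
```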

0 votes
by

Comment made by: bendlas

The updated reader spec says that a symbol can contain a single / to separate the namespace. It also mentions a bare / to be the division function.
So what about clojure.core//? That still has to be a readable symbol, right? So is that an exception to the 'single /' rule?
Will foo.bar// also be readable? What about foo//bar?
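
For the first of these, current behaviour can be observed directly. A sketch, assuming a recent Clojure on the JVM (the var name `div-sym` is just for illustration); it says nothing about what the spec intends:

```clojure
;; The namespace-qualified division symbol is readable.
(def div-sym (read-string "clojure.core//"))

(namespace div-sym) ; the part before the separating slash
(name div-sym)      ; the bare "/" name

;; It is the same symbol as the one built explicitly from its two parts.
(= div-sym (symbol "clojure.core" "/"))
```

The other two tokens (`foo.bar//` and `foo//bar`) can be probed the same way with `read-string`; no result is claimed for them here.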

0 votes
by

Comment made by: favila

Another source of ambiguity I see is that it's unclear whether the first colon of a keyword is the first character of the keyword (and therefore of the symbol) or whether it is something special and the spec really describes what happens from the second character onward. This matters because the specification for a keyword is (in both edn and reader specs) given in terms of differences from symbols. I think many of the strange keyword edge cases (including legality of :1 vs :a/1) stem from this ambiguity, and different tickets/patches seem to choose one or the other underlying assumption. See this comment for more examples: http://dev.clojure.org/jira/browse/CLJS-677?focusedCommentId=35025&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-35025

Possibly we can use tagged literals for keywords and symbols to create or print these forms when they are not readable and simplify the reader spec for their literal forms. E.g. instead of producing complicated parse rules to ensure clojure.core// or :1 are legal, just make the literal form simple and have users write something like #sym["clojure.core" "/"] or #kyw "1" (and have the printer print these) when they hit these edge cases.
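
As a rough sketch of how the reading half of this could look with today's machinery: the tags `sym` and `kyw` below are the hypothetical names from this comment, not existing Clojure tags, and `edge-case-readers` and `read-with-edge-cases` are made up for the example. Unqualified tags are reserved for Clojure itself, so user code would normally pick a namespace-qualified tag.

```clojure
;; Hypothetical reader functions for the proposed tags.
(def edge-case-readers
  {'sym (fn [[ns-part name-part]] (symbol ns-part name-part))
   'kyw keyword})

;; Read a string with the hypothetical tags installed via *data-readers*.
(defn read-with-edge-cases [s]
  (binding [*data-readers* edge-case-readers]
    (read-string s)))

(read-with-edge-cases "#sym [\"clojure.core\" \"/\"]") ; symbol with name "/"
(read-with-edge-cases "#kyw \"1\"")                    ; keyword with name "1"
```

The printing half (having the printer emit these forms for otherwise unreadable names) is not covered by this sketch.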

0 votes
by

Comment made by: alexmiller

I would say : (and ::) are syntactic markers and the spec describes the characters following it. But I agree it would be nice for this to be more explicit. The (incorrect) regex in LispReader does not help either.

The tagged literal idea is an interesting alternative to the | | syntax that has been reserved for possible future support for invalid characters in keywords and symbols. But I think the idea is out of scope for this ticket, which is really about clarifying the spec.

0 votes
by

Comment made by: kunstmusik

Coming to this late, I had mentioned on the user mailing list in:

https://groups.google.com/forum/#!topic/clojure/CwZHu1Eszbk

that # is currently allowed as part of a symbol name, such that:

(let [a# 4 b#a 3] (println a# b#a))

will print "4 3".

# is also employed in auto-gensyms and discussed in http://clojure.org/reference/reader#syntax-quote as part of a symbol's name. From the mailing list thread, # was noted as "may be allowed now, but could be changed later". I would appreciate it if this were more clearly described as a special case/reserved, and would ask that its use be restricted in the reader so that users do not rely on it now and have their code break later.
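
The two roles of the `#` character in symbol names can be seen side by side. A small sketch, assuming current Clojure on the JVM (the var names `plain` and `generated` are only for illustration):

```clojure
;; Outside syntax-quote, a # after the first character is simply part
;; of the symbol's name.
(def plain (read-string "a#"))
(symbol? plain)
(name plain)

;; Inside syntax-quote, a trailing # asks the reader for an auto-gensym,
;; so the resulting name has a generated numeric part that varies per read.
(def generated `a#)
(name generated)
```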
0 votes
by
Reference: https://clojure.atlassian.net/browse/CLJ-1527 (reported by bendlas)
...