
There's a semi-frequent back-and-forth that happens with new Clojure users: "I called keyword on the string 'hello world' and got back :hello world, which doesn't round-trip. Is it broken?" "The keyword function doesn't validate its input." "Why not?" "Performance and ease of use."
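To make the complaint concrete, this is what happens at the REPL (a small illustration; expected results are shown as ;=> comments):

```clojure
;; keyword accepts any string, including one containing a space.
(def k (keyword "hello world"))

(name k)                        ;=> "hello world"
(pr-str k)                      ;=> ":hello world"

;; Reading the printed text back stops at the space,
;; so we get a different keyword.
(read-string (pr-str k))        ;=> :hello
(= k (read-string (pr-str k)))  ;=> false
```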

In addition to maybe clarifying their docstrings to explicitly say they don't validate input, I think it would be helpful to have "safe" versions of keyword and symbol (and gensym?) that throw errors on incorrect inputs:

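;; Hypothetical stand-in for "cool-keyword-regex": a deliberately conservative
;; pattern for a readable name. The reader's actual rules are broader.
(def ^:private readable-name-re #"[A-Za-z*+!_?<>=-][A-Za-z0-9*+!_?<>=.-]*")

(defn safe-keyword
  "Like clojure.core/keyword, but throws on strings that don't match readable-name-re."
  ([name] (cond (keyword? name) name
                (symbol? name) (clojure.lang.Keyword/intern ^clojure.lang.Symbol name)
                (string? name)
                (if (re-matches readable-name-re name)
                  (clojure.lang.Keyword/intern ^String name)
                  (throw (IllegalArgumentException. (str "Given bad string: " name))))))
  ([ns name]
   ;; Validate the namespace part too (nil means "no namespace").
   (if (and (or (nil? ns) (re-matches readable-name-re ns))
            (re-matches readable-name-re name))
     (clojure.lang.Keyword/intern ns name)
     (throw (IllegalArgumentException. (str "Given bad string(s): " ns " " name))))))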

Then keyword's docstring can just say "Does not validate input. See safe-keyword for a version that does." or similar.

1 Answer

Best answer

For reference for those arriving here, there is an FAQ entry about this issue at https://clojure.org/guides/faq#unreadable_keywords.

As that FAQ notes, there are valid (and safe!) reasons to programmatically create and use keywords and symbols that cannot be round-tripped through read. So the issue is not one of "safety" per se, and I don't expect that we will add variants of keyword or symbol like this.
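As a hedged illustration of such a use (our own example, not necessarily the one the FAQ gives): keywords built from external data, such as the hypothetical CSV headers below, may contain spaces and so cannot be read back, yet they work perfectly well as in-memory map keys.

```clojure
;; Column names taken from external data (hypothetical CSV headers)
;; contain spaces, so the resulting keywords are not readable.
(def headers ["first name" "last name"])
(def row (zipmap (map keyword headers) ["Ada" "Lovelace"]))

;; In memory these keywords behave like any others:
;; lookup works, and equal keywords are interned to the same instance.
(get row (keyword "first name"))                          ;=> "Ada"
(identical? (keyword "last name") (keyword "last name"))  ;=> true
```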

This question doesn't really identify a problem or a use case that needs to be addressed. Working backwards: in what use cases would you want this feedback? Only when you don't know what the string is (i.e., user input). From a developer's point of view, you should be validating that input before invoking these functions; exactly where that happens will depend on the application. If anything, a predicate that checks whether a string can be round-tripped as a keyword or symbol might be helpful (the relevant regexes already exist in the reader, so read-string gets you 90% of the way there).
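A minimal sketch of such a predicate, assuming we let a reader do the work instead of duplicating its regexes: print the keyword or symbol and check that reading the text back yields an equal value. The names round-trips?, readable-keyword-string?, and readable-symbol-string? are hypothetical. This sketch uses clojure.edn/read-string rather than clojure.core/read-string because the edn reader never evaluates code; its rules are close to, but not identical to, the Clojure reader's.

```clojure
(require '[clojure.edn :as edn])

(defn- round-trips?
  "True if printing x and reading the text back with the edn reader
  yields an equal value."
  [x]
  (try
    (= x (edn/read-string (pr-str x)))
    ;; Unreadable text (e.g. ":" from (keyword "")) makes the reader throw.
    (catch Exception _ false)))

(defn readable-keyword-string?
  "Hypothetical predicate: can (keyword s) be printed and read back?"
  [s]
  (round-trips? (keyword s)))

(defn readable-symbol-string?
  "Hypothetical predicate: can (symbol s) be printed and read back?"
  [s]
  (round-trips? (symbol s)))
```

For example, "hello" and "my.ns/thing" pass as keyword names while "hello world" and "" do not, and (readable-symbol-string? "nil") is false because the printed text reads back as nil rather than a symbol.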

One problem area that we do intend to work on is printing edn data that you later want to read back, which is where issues like this tend to surface. There are many flavors of this: keywords (or, less frequently, symbols) that can't be read back, regex values (not supported in edn), ordered map or set types, etc. These are all print-control issues, and how you handle them is likely to depend a lot on the use case. A real problem in this area is that print control is global, so installing validation or transformation logic affects everyone in the runtime. We would like to have more localized print control and possibly standard print support for "edn-safe" printing.
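One way to approximate "edn-safe" output today, without installing anything global such as a print-method, is to transform the data locally just before printing. The following is a minimal sketch under that assumption, not a proposed API: edn-safe-pr-str is a hypothetical name, it only handles unreadable keywords/symbols and regex values (not ordered collections), and the rewriting is lossy because a converted value can't be told apart from a string that was already in the data.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.walk :as walk])

(defn edn-safe-pr-str
  "Hypothetical helper: prints data as edn after rewriting, for this call
  only, values the edn reader cannot read back. Keywords and symbols that
  don't round-trip become strings; regex patterns become their source
  strings. No global print-method is changed."
  [data]
  (letfn [(round-trips? [x]
            (try (= x (edn/read-string (pr-str x)))
                 (catch Exception _ false)))
          (fix [x]
            (cond
              (instance? java.util.regex.Pattern x) (str x)
              (and (or (keyword? x) (symbol? x))
                   (not (round-trips? x)))           (str x)
              :else                                  x))]
    ;; postwalk visits map keys and values as well as nested collections.
    (pr-str (walk/postwalk fix data))))
```

Called as (edn-safe-pr-str {:ok 1 (keyword "hello world") 2 :re #"a+"}), the readable :ok key is left alone, the unreadable key becomes the string ":hello world", and the regex becomes "a+", so the output can be read back with clojure.edn/read-string.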

...