Currently, character literals are written as \uXXXX, so it is not possible to encode a character beyond the BMP. Surrogate pairs are a common way to encode such characters, but LispReader explicitly throws when it sees one: https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/LispReader.java#L1217-L1218

1 Answer


The line you linked is specifically for reading an individual character value (like \a). A single character cannot represent both halves of a surrogate pair, so this seems correct (a Java char is a single 16-bit UTF-16 code unit).
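As a rough illustration of why one char is not enough, here is a small sketch using standard JDK methods on java.lang.Character. The code point U+1F60D is the one encoded by the string example in this thread; the names `cp` and `pair` are just for illustration.

```clojure
;; U+1F60D lies outside the BMP, so the JDK needs two UTF-16 chars for it.
(def cp 0x1F60D)

(Character/isSupplementaryCodePoint cp) ;; => true
(Character/charCount cp)                ;; => 2

;; Character/toChars returns the surrogate pair as a char array.
(def pair (Character/toChars cp))

;; The two chars are the high and low surrogates, 0xD83D and 0xDE0D.
(mapv int pair)                         ;; => [55357 56845]

;; Joined into a string, they form the same value as the escaped literal.
(= "\ud83d\ude0d" (String. ^chars pair)) ;; => true
```

Each element of `pair` is a valid Java char on its own, but neither one alone denotes a complete character.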

However, you can use surrogate pairs in strings, which is how you would most commonly use them, and that works fine.

$  clj
Clojure 1.12.3
user=> "\ud83d\ude0d"
""

I believe surrogate pairs are not currently allowed in symbols or keywords. That's maybe something that could be extended in the future, but I haven't seen anyone looking for that yet.

ago by
The repl output should have a heart eyes emoji - worked when I pasted it here, but seems to have been lost on its trip through the comment code.
ago by
Using strings works well, but counting characters gives unexpected results in this case, because count returns the number of UTF-16 chars rather than the number of code points:

(count "\ud83d\ude0d") ;; => 2
ago by
You can use the underlying code point support, for example:

(-> "\ud83d\ude0d" .codePoints .count) ;; => 1
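
If you need this in more than one place, a minimal sketch of two helpers built on the same JDK code point support might look like the following. The names `code-point-count` and `code-point-strings` are hypothetical and not part of Clojure itself; `.codePointCount`, `.codePoints` and `Character/toChars` are standard JDK methods.

```clojure
(defn code-point-count
  "Number of Unicode code points in s (hypothetical helper name)."
  [^String s]
  (.codePointCount s 0 (.length s)))

(defn code-point-strings
  "Splits s into a vector of strings, one per code point (hypothetical helper)."
  [^String s]
  ;; .codePoints yields an IntStream; .toArray turns it into an int array.
  (mapv #(String. (Character/toChars (int %)))
        (.toArray (.codePoints s))))

(count "a\ud83d\ude0db")              ;; => 4 (UTF-16 chars)
(code-point-count "a\ud83d\ude0db")   ;; => 3 (code points)
(count (code-point-strings "a\ud83d\ude0db")) ;; => 3
```

Note that a code point is still not always what a user perceives as one character (combining marks and emoji sequences span several code points), so this only addresses the surrogate pair case discussed here.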
ago by
All that is just the nature of the complicated situation of "char"s and Unicode and how they are exposed by the JVM / JDK.
...