+20 votes
in Clojure by

Rationale

Based on my own experience, and from observing the beginners channel in Slack, an extremely common pitfall for newcomers is working out how to parse a string into an int.

This goes wrong in a number of ways:

  • People discover the int function, whose docstring is "Coerce to int", but it throws when passed a string.
  • People are pointed to Integer/parseInt, which does the job but can't be used as (map Integer/parseInt coll-of-strings). This is surprising, because it's their first experience with Java interop.
  • People are pointed to #(Integer/parseInt %), which does what's expected, but whose syntax can be a bit much for your first 10 minutes of Clojure.
  • Sometimes people are pointed to #(Integer. %), which relies on a constructor that has been deprecated since Java 9.
  • People go through the same procedure in ClojureScript, and have to pick a different interop approach there.
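
The first and third pitfalls above can be reproduced at a REPL. This is only a minimal sketch; the helper names thrown-class and coerce are made up for illustration and are not part of the original post.

```clojure
;; Illustrative helper (hypothetical name): returns the class of the
;; exception thrown by a zero-argument function, or nil if none is thrown.
(defn thrown-class [f]
  (try
    (f)
    nil
    (catch Exception e (class e))))

;; `int` coerces numbers and characters; it does not parse strings.
(defn coerce [x] (int x))

(thrown-class #(coerce "42"))               ; => java.lang.ClassCastException

;; Wrapping the static method in a function literal makes it usable with map.
(mapv #(Integer/parseInt %) ["1" "2" "3"])  ; => [1 2 3]
```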

Proposal 1

Consider adding a parse-* family of functions in Clojure that are thin wrappers over the Java interop. See Appendix for a possible list.
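
As a rough illustration of what one such wrapper might look like, here is a sketch of parse-int with and without a radix. The name and arities follow the Appendix; the implementation itself is an assumption, not an agreed design.

```clojure
;; Hypothetical thin wrapper over Integer/parseInt, for illustration only.
(defn parse-int
  "Parses s as a signed int, optionally in the given radix."
  ([^String s] (Integer/parseInt s))
  ([^String s radix] (Integer/parseInt s (int radix))))

;; Being an ordinary function, it can be passed to map directly.
(mapv parse-int ["1" "2" "3"])  ; => [1 2 3]
(parse-int "ff" 16)             ; => 255
```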

Proposal 2

Consider exposing clojure.lang.LispReader.matchNumber as parse-number.

People can then use the various coercion functions to get the precision they need. This might better fit the rationale of this ticket, which is to make a very common "toy program" operation smoother for beginners, and matching the Reader's behaviour would be the least surprising option.

People who are sensitive about performance should know more about the intricacies of boxed arithmetic on the JVM anyway. This is also pleasantly platform-agnostic: CLJS could expose match-number.
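
To get a feel for what parse-number would return, the sketch below reaches matchNumber through Java reflection. It assumes matchNumber is a static method taking a single String, which is how it appears in the Clojure sources I am aware of; since it is not a public API, this is only an experiment and not something to depend on. The var name match-number-method is made up.

```clojure
;; Assumption: clojure.lang.LispReader has a static matchNumber(String).
;; It is an implementation detail, so we look it up reflectively and
;; make it accessible in case it is not public.
(def ^java.lang.reflect.Method match-number-method
  (doto (.getDeclaredMethod clojure.lang.LispReader
                            "matchNumber"
                            (into-array Class [String]))
    (.setAccessible true)))

(defn parse-number
  "Returns the number the reader would produce for s, or nil if s is not a number."
  [^String s]
  (.invoke match-number-method nil (object-array [s])))

(parse-number "42")         ; => 42
(parse-number "0xff")       ; => 255
(parse-number "1/2")        ; => 1/2
(parse-number "forty-two")  ; => nil
```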

Questions/Alternatives

  • Should the functions return primitives or boxed values?
  • How should strings like "0xff" be handled? The parseFoo family of functions rejects those, but 0xff can be read by the Clojure reader.
  • OTOH, the decode family of functions handles some prefixes, but returns boxed values. It also accepts strings like #10, which is not a valid Clojure literal.
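
The difference between the two Java families and the reader can be checked directly. In this sketch, parses? is a made-up helper name.

```clojure
;; Illustrative helper (hypothetical name): true if (f s) returns
;; without throwing a NumberFormatException.
(defn parses? [f s]
  (try
    (f s)
    true
    (catch NumberFormatException _ false)))

(parses? #(Integer/parseInt %) "0xff")  ; => false, parseInt rejects the prefix
(Integer/decode "0xff")                 ; => 255
(Integer/decode "#10")                  ; => 16, "#" is a hex prefix for decode
(read-string "0xff")                    ; => 255, the reader accepts 0x
```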

Appendix

A hopefully complete list of primitive-returning functions (as of JVM 8) is:

name 	args 	ret-value
parse-int 	s 	int
parse-int 	s, radix 	int
parse-uint 	s 	int
parse-uint 	s, radix 	int
parse-long 	s 	long
parse-long 	s, radix 	long
parse-ulong 	s 	long
parse-ulong 	s, radix 	long
parse-short 	s 	short
parse-short 	s, radix 	short
parse-byte 	s 	byte
parse-byte 	s, radix 	byte
parse-float 	s 	float
parse-double 	s 	double

The unsigned functionality was added in Java 8, so it should be safe to use in newer Clojure versions. Newer JVM versions add support for parsing parts of CharSequences.
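
One thing worth noting about the unsigned rows in the table: the return type is still the signed primitive, so large values come back negative. A small sketch using the underlying Java methods (the var name max-uint is made up):

```clojure
;; 4294967295 is the largest unsigned 32-bit value; as a signed int,
;; the same bit pattern is -1.
(def max-uint (Integer/parseUnsignedInt "4294967295"))

max-uint                                         ; => -1
(Integer/toUnsignedLong max-uint)                ; => 4294967295
(Long/parseUnsignedLong "18446744073709551615")  ; => -1
```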

2 Answers

+1 vote
by

FYI, I think it's unlikely that we would add parsers to match all the Java types. I do think it would be useful to have some subset of these to match the reader:

  • fixed precision integers - would always return a Long
  • fixed precision floating point - would always return a Double
  • integers - match reader - return Long/BigInteger depending on need
  • floating point - match reader - return Double/BigDecimal depending on need
  • number - match reader in parsing all numeric formats

But needs more assessment.

by
As more rationale, beginners often stumble upon `read-string` and use it. It can certainly get the job done, but it has lots of problems that aren't obvious to a beginner who just wants to turn "10" into 10.
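
A few of those non-obvious behaviours, as a quick sketch (binding *read-eval* to true just makes the default explicit):

```clojure
(read-string "10")         ; => 10
(read-string "10 apples")  ; => 10, the rest of the input is silently ignored
(read-string "apples")     ; => the symbol apples, not an error

;; With *read-eval* enabled (the default), the reader can run code.
(binding [*read-eval* true]
  (read-string "#=(+ 1 2)"))  ; => 3
```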
by
yep, absolutely
by
This is, by a wide margin, the most common Java interop people do, and often the first they encounter while working on Advent of Code or similar exercises.
by
The "parse-*" names probably appear reasonably often in programs. Overlap with clojure.core is not fatal, but it is a nuisance. Would these new functions go in the clojure.string namespace to avoid conflicts? A new namespace perhaps?
by
All names/locations TBD.
0 votes
by
Reference: https://clojure.atlassian.net/browse/CLJ-2451 (reported by alex+import)
...