Welcome! Please see the About page for a little more info on how this works.

0 votes
in Docs by
clojure.string/split and clojure.string/split-lines inherit the bizarre default behavior of java.lang.String#split(String,int) in stripping trailing consecutive delimiters:


(clojure.string/split "banana" #"an")
⇒ ["b" "" "a"]
(clojure.string/split "banana" #"na")
⇒ ["ba"]
(clojure.string/split "nanabanana" #"na")
⇒ ["" "" "ba"]


In the case of split-lines, processing a file line by line and rejoining results in truncation of trailing newlines in the file. In both cases, the behavior is surprising and cannot be inferred from the docstrings. A workaround for split is to pass a limit of -1.

*Proposed:* As current users may be relying on the current behavior, the attached merely updates the docstring to warn of this behavior and suggest use of -1 as a limit to workaround.

*Patch:* clj-1360-2.patch

5 Answers

0 votes
by

Comment made by: jafingerhut

Probably documenting would be safer than changing the behavior at this point, given that some people may actually rely on the current behavior after testing, deploying, etc.

I don't currently have a suggestion for a modified doc string, but note that there are examples of this behavior and how one can use an extra "-1" limit argument at the end to get all split strings: http://clojuredocs.org/clojure_core/clojure.string/split

0 votes
by

Comment made by: retrogradeorbit

This bug just bit me. +1 to be fixed. If we just document and leave the behavior as is, then we have a surprising and inconsistent behaving split (why are inner empty values kept, but outer ones stripped?) that is different to every other split you've ever used. The optional -1 limit argument looks hacky but a fix could keep this -1 argument working.

EDIT: this looks to be java's string class behavior: http://stackoverflow.com/questions/2170557/split-method-of-string-class-does-not-include-trailing-empty-strings
Would be nice if limit defaulted to -1 on that type of clojure.string/split call.

0 votes
by

Comment made by: stu

This is really gross, and the original developer has been punched in the neck. (Ow.)

I hate the Java leakage, but given that this is already out there, and that people are likely already relying on both the default and the negative-arg behavior, I think the least bad bet is to document precisely the semantics we have.

0 votes
by

Comment made by: alexmiller

Incorporate changes from CLJ-1857 in clj-1360-2.patch.

0 votes
by
Reference: https://clojure.atlassian.net/browse/CLJ-1360 (reported by timmc)
...