Bug in ClojureScript string/split with limit?

asked Apr 23 in ClojureScript by brjann
retagged Jul 18 by alexmiller

This looks like a bug to me: passing a limit to clojure.string/split in ClojureScript gives the wrong result on a simple split.

;; NB: ClojureScript
(clojure.string/split "a|b" #"|")
;=> ["a" "|" "b"]
(clojure.string/split "a|b" #"|" 3)
;=> ["" "" "a|b"]

Shouldn't both calls return the same result?

The corresponding code in JavaScript:

"a|b".split(/|/)
// => ['a', '|', 'b']
"a|b".split(/|/, 3)
// => ['a', '|', 'b']
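
One caveat when comparing against JavaScript: the second argument of `String.prototype.split` only caps how many pieces are returned and drops the rest, whereas the limit in Clojure on the JVM keeps the unsplit remainder as the last piece. With a limit of 3 and exactly three pieces the two readings coincide, so the example above cannot show the difference. A small illustration in plain JavaScript (the comma-separated input is made up for the example):

```javascript
// JavaScript's limit truncates the result; it does not keep the remainder.
const full = "a,b,c".split(/,/);       // ["a", "b", "c"]
const capped = "a,b,c".split(/,/, 2);  // ["a", "b"], the "c" is dropped

// The pattern from the question: /|/ is an empty alternation, so it matches
// the empty string between every pair of characters.
const question = "a|b".split(/|/, 3);  // ["a", "|", "b"]

console.log(full, capped, question);
```

This is presumably why a limit-aware split cannot simply forward the limit to the host's `split`.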

commented Apr 23 by Eugene Pakhomov

commented Apr 23 by brjann

2 Answers

Eugene Pakhomov · Answer 1 · 2025-04-23T12:25:01+0000

Just to add to the report: the problem also arises when #"" is used.

In Clojure:

user=> (clojure.string/split "abc" #"")
["a" "b" "c"]
user=> (clojure.string/split "abc" #"" 3)
["a" "b" "c"]

In ClojureScript:

cljs.user=> (clojure.string/split "abc" #"")
["" "a" "b" "c"]
cljs.user=> (clojure.string/split "abc" #"" 3)
["" "a" "bc"]

Another case is lookahead, e.g. (clojure.string/split "a-b" #"(?=-)" 2).
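
For reference, this is what the host's own `String.prototype.split` returns for the same zero-width patterns. `/(?:)/` is the JavaScript literal for an empty pattern. The snippet says nothing about how ClojureScript arrives at its results; it only shows the baseline behaviour of plain JavaScript, where the limit truncates rather than keeping a remainder.

```javascript
// Zero-width patterns in plain JavaScript, without and with a limit.
const emptyAll = "abc".split(/(?:)/);             // ["a", "b", "c"]
const emptyCapped = "abc".split(/(?:)/, 2);       // ["a", "b"]
const lookahead = "a-b".split(/(?=-)/);           // ["a", "-b"]
const lookaheadCapped = "a-b".split(/(?=-)/, 2);  // ["a", "-b"]

console.log(emptyAll, emptyCapped, lookahead, lookaheadCapped);
```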

brjann · Answer 2 · 2025-04-23T13:27:37+0000

I have done some more digging and found other problems. The regex #"&([^;\s<&]+);?" should (a) work and (b) include capturing groups in the result.

As far as I understand, this regex does not include lookaheads or lookbehinds.

;; assumes (require '[clojure.string :as string])
(string/split "a&amp;b&amp;c" #"&([^;\s<&]+);?")
;=> ["a" "amp" "b" "amp" "c"]
(string/split "a&amp;b&amp;c" #"&([^;\s<&]+);?" 5)
;=> ["" "" "" "" "p;b&amp;c"]

Patch no. 5 from https://clojure.atlassian.net/browse/CLJS-2528 does not resolve this problem, as far as I've been able to implement it.

(let [limit 100
      re #"&([^;\s<&]+);?"
      s "a&amp;b&amp;c"]
  (let [limit (dec limit)
        re (js/RegExp. (.-source re)
                       (cond-> "g"
                         (.-ignoreCase re) (str "i")
                         (.-multiline re) (str "m")
                         (.-unicode re) (str "u")))]
    (loop [s s, parts []]
      (if (or (<= limit (count parts))
              (string/blank? s))
        (conj parts s)
        (let [m (.exec re s)]
          (if (nil? m)
            (conj parts s)
            (let [_ (println m)
                  index (.-index m)
                  matched-str (aget m 0)
                  matched-str-len (count matched-str)
                  next-parts (cond-> parts
                               (and (or (< 0 index)
                                        (< 0 matched-str-len))
                                    (not= s matched-str))
                               (conj (.substring s 0 index)))]
              (if (<= (count s) (+ index (count matched-str)))
                next-parts
                (do
                  (set! (.-lastIndex re)
                        (if (and (== 0 index)
                                 (== 0 matched-str-len))
                          1 0))
                  (recur (.substring s (+ index (count matched-str)))
                         next-parts))))))))))
;=> ["a" "b" "c"]

The string is split correctly, but the result does not include the capturing groups.
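
To make the expectation concrete, here is a minimal sketch in plain JavaScript of a limit-aware split that keeps capturing groups. It is neither the ClojureScript implementation nor the CLJS-2528 patch. The function name `splitKeepingGroups` is hypothetical, and the semantics are an assumption: `limit` is a positive integer that caps the number of split segments (the remainder stays in the last one), while captured groups are inserted between segments without counting toward the limit. Zero-width matches are advanced by one UTF-16 code unit, which ignores the subtleties of the `u` flag.

```javascript
// Hypothetical helper, not part of ClojureScript or the CLJS-2528 patch.
// Assumption: `limit` caps the number of segments; captured groups are
// added in between and do not count toward the limit.
function splitKeepingGroups(s, re, limit) {
  // Work on a global copy so exec() advances through the string.
  const flags = re.flags.includes("g") ? re.flags : re.flags + "g";
  const g = new RegExp(re.source, flags);
  const parts = [];
  let segments = 0; // segments emitted so far
  let last = 0;     // start of the segment currently being collected
  let m;
  while (segments < limit - 1 && (m = g.exec(s)) !== null) {
    if (m[0].length === 0) {
      // Step past a zero-width match so exec() cannot loop forever.
      g.lastIndex += 1;
      // Like String.prototype.split, ignore empty matches at the start of
      // the current segment and at the very end of the input.
      if (m.index === last || m.index >= s.length) continue;
    }
    parts.push(s.slice(last, m.index)); // text before the match
    parts.push(...m.slice(1));          // captured groups, in order
    segments += 1;
    last = m.index + m[0].length;
  }
  parts.push(s.slice(last)); // unsplit remainder
  return parts;
}

console.log(splitKeepingGroups("a&amp;b&amp;c", /&([^;\s<&]+);?/, 5));
// ["a", "amp", "b", "amp", "c"]
console.log(splitKeepingGroups("a&amp;b&amp;c", /&([^;\s<&]+);?/, 2));
// ["a", "amp", "b&amp;c"]
console.log(splitKeepingGroups("abc", /(?:)/, 2));
// ["a", "bc"]
```

Whether captured groups should count toward the limit is a design choice. Clojure on the JVM never returns them, so there is no reference behaviour to copy here.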
