

This seems like a bug to me: using the limit argument of clojure.string/split in ClojureScript gives the wrong result on a simple split.

;; NB: ClojureScript
(clojure.string/split "a|b" #"|")
;=> ["a" "|" "b"]
(clojure.string/split "a|b" #"|" 3)
;=> ["" "" "a|b"]

Shouldn't the results be the same?

Corresponding code in JavaScript:

"a|b".split(/|/)
// => ['a', '|', 'b']
"a|b".split(/|/, 3)
// => ['a', '|', 'b']

There's a similar issue, https://clojure.atlassian.net/browse/CLJS-2528, but it's not the same: this reproduction case doesn't use lookaheads or lookbehinds.

Reported to #cljs-dev.

Just to clarify the bug report: I was trying to split on a literal `|`, which requires escaping as `#"\|"`. I wasn't aware of that. So the observed difference in behavior when using `#"|"` may not be meaningful, but the additional example by Eugene does show a difference in behavior compared to Clojure.
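
To illustrate the escaping point, here is a small sketch in plain JavaScript (not ClojureScript, but the same regex engine sits underneath). It only uses the built-in `RegExp.prototype.exec` and `String.prototype.split`; the variable names are mine.

```javascript
// An unescaped | is an alternation between two empty patterns,
// so the regex matches the empty string at every position.
const unescaped = /|/.exec("a|b");
console.log(unescaped.index, JSON.stringify(unescaped[0])); // 0 ""

// Escaped, it matches only the literal pipe character.
const escaped = /\|/.exec("a|b");
console.log(escaped.index, JSON.stringify(escaped[0])); // 1 "|"

// Splitting on the escaped pattern gives the result I was originally after.
const pieces = "a|b".split(/\|/);
console.log(pieces); // [ 'a', 'b' ]
```

So `#"|"` behaves like the empty regex, which is why that case ends up overlapping with the `#""` one.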

2 Answers


Just to add to the report - the problem also arises when #"" is used.

In Clojure:

user=> (clojure.string/split "abc" #"")
["a" "b" "c"]
user=> (clojure.string/split "abc" #"" 3)
["a" "b" "c"]

In ClojureScript:

cljs.user=> (clojure.string/split "abc" #"")
["" "a" "b" "c"]
cljs.user=> (clojure.string/split "abc" #"" 3)
["" "a" "bc"]
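
For comparison, here is a rough sketch in plain JavaScript of the limit semantics that Clojure (via Java's `String.split(regex, limit)`) appears to follow: the pattern is applied at most `limit - 1` times and the last piece holds the rest of the string. This is only my illustration, not the ClojureScript implementation and not the CLJS-2528 patch. `splitWithLimit` is a made-up name, and I'm assuming `limit` is a positive integer and the regex has no sticky flag.

```javascript
// Hypothetical helper: Java-style split with a positive limit.
function splitWithLimit(s, re, limit) {
  // Copy the pattern with the global flag so lastIndex advances between exec calls.
  const flags = re.flags.includes("g") ? re.flags : re.flags + "g";
  const g = new RegExp(re.source, flags);
  const parts = [];
  let start = 0; // start of the piece currently being collected
  let m;
  while (parts.length < limit - 1 && (m = g.exec(s)) !== null) {
    if (m[0].length === 0) {
      // A zero-width match does not move lastIndex, so step forward by hand.
      // (Stepping by one code unit is a simplification for the "u" flag.)
      g.lastIndex++;
      // Ignore zero-width matches at the very start or end of the input;
      // they would only produce empty leading or trailing pieces.
      if (m.index === 0 || m.index >= s.length) continue;
    }
    parts.push(s.slice(start, m.index));
    start = m.index + m[0].length;
  }
  // Whatever is left becomes the final piece.
  parts.push(s.slice(start));
  return parts;
}

console.log(splitWithLimit("abc", /(?:)/, 3)); // [ 'a', 'b', 'c' ]
console.log(splitWithLimit("a|b", /|/, 3));    // [ 'a', '|', 'b' ]
```

With these inputs the sketch gives the same pieces as the Clojure REPL output above. Like Clojure, it does not add capturing groups to the result.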

I have done some more digging and found other problems. The regex #"&([^;\s<&]+);?" should (a) work and (b) include capturing groups in the result.

As far as I understand, this regex does not include lookaheads or lookbehinds.

(require '[clojure.string :as string])

(string/split "a&amp;b&amp;c" #"&([^;\s<&]+);?")
;=> ["a" "amp" "b" "amp" "c"]
(string/split "a&amp;b&amp;c" #"&([^;\s<&]+);?" 5)
;=> ["" "" "" "" "p;b&amp;c"]

Patch no. 5 from https://clojure.atlassian.net/browse/CLJS-2528 does not resolve this problem, as far as I've been able to implement it. This is my adaptation of it, run against the same input:

(let [limit 100
      re #"&([^;\s<&]+);?"
      s "a&amp;b&amp;c"]
  (let [limit (dec limit)
        ;; rebuild the regex with the global flag, keeping the original flags
        re (js/RegExp. (.-source re)
                       (cond-> "g"
                         (.-ignoreCase re) (str "i")
                         (.-multiline re) (str "m")
                         (.-unicode re) (str "u")))]
    (loop [s s, parts []]
      (if (or (<= limit (count parts))
              (string/blank? s))
        (conj parts s)
        (let [m (.exec re s)]
          (if (nil? m)
            (conj parts s)
            (let [_ (println m) ; debug output: the raw match array
                  index (.-index m)
                  matched-str (aget m 0)
                  matched-str-len (count matched-str)
                  next-parts (cond-> parts
                               (and (or (< 0 index)
                                        (< 0 matched-str-len))
                                    (not= s matched-str))
                               (conj (.substring s 0 index)))]
              (if (<= (count s) (+ index (count matched-str)))
                next-parts
                (do
                  ;; after a zero-width match at index 0, search from position 1
                  (set! (.-lastIndex re)
                        (if (and (== 0 index)
                                 (== 0 matched-str-len))
                          1 0))
                  (recur (.substring s (+ index (count matched-str)))
                         next-parts))))))))))
;=> ["a" "b" "c"]

The string is correctly split but does not include the capturing groups.

Interestingly enough, the way it works after the patch is exactly the way it works in Clojure.

Adding the clause `(< 1 (count m)) (into (rest m))` to the next-parts cond-> adds the capturing groups.
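
For reference, a quick check in plain JavaScript of what the native `String.prototype.split` does with capturing groups, since that is the behavior the non-limit case in ClojureScript inherits. This is just an illustration with the regex from this answer; the variable names are mine.

```javascript
const input = "a&amp;b&amp;c";

// With a capturing group, the captured text is spliced into the result.
const withGroup = input.split(/&([^;\s<&]+);?/);
console.log(withGroup); // [ 'a', 'amp', 'b', 'amp', 'c' ]

// With a non-capturing group (?:...), only the pieces between matches remain.
const withoutGroup = input.split(/&(?:[^;\s<&]+);?/);
console.log(withoutGroup); // [ 'a', 'b', 'c' ]
```

So whether the groups should appear depends on which behavior is the target: native JavaScript includes them, Clojure does not.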