
0 votes
ago in ClojureScript by
edited ago by

This seems like a bug to me. Passing a limit to clojure.string/split in ClojureScript gives the wrong result on a simple split.

;; NB: ClojureScript
(clojure.string/split "a|b" #"|")
;=> ["a" "|" "b"]
(clojure.string/split "a|b" #"|" 3)
;=> ["" "" "a|b"]

Results should be the same?

The corresponding code in JavaScript:

"a|b".split(/|/)
// => ['a', '|', 'b']
"a|b".split(/|/, 3)
// => ['a', '|', 'b']
ago by
There's a similar issue, https://clojure.atlassian.net/browse/CLJS-2528, but it's not the same, as this reproduction case doesn't use lookaheads or lookbehinds.
ago by
Reported to #cljs-dev.
ago by
Just to clarify the bug report: I was trying to split on `|`. That requires escaping (`#"\|"`), which I wasn't aware of. So the observed difference in behavior when using `#"|"` may not be meaningful, but the additional example by Eugene does show a difference in behavior compared to Clojure.
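
For reference, a minimal JavaScript sketch of the escaping difference. ClojureScript regex literals are JavaScript RegExp objects, so the same distinction applies to `#"|"` versus `#"\|"`. The variable names are only for illustration.

```javascript
// Unescaped, `|` is an alternation between two empty alternatives,
// so the pattern matches the empty string between every character.
const unescaped = "a|b".split(/|/);

// Escaped, `\|` matches a literal pipe character.
const escaped = "a|b".split(/\|/);

console.log(unescaped); // [ 'a', '|', 'b' ]
console.log(escaped);   // [ 'a', 'b' ]
```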

2 Answers

0 votes
ago by

Just to add to the report - the problem also arises when #"" is used.

In Clojure:

user=> (clojure.string/split "abc" #"")
["a" "b" "c"]
user=> (clojure.string/split "abc" #"" 3)
["a" "b" "c"]

In ClojureScript:

cljs.user=> (clojure.string/split "abc" #"")
["" "a" "b" "c"]
cljs.user=> (clojure.string/split "abc" #"" 3)
["" "a" "bc"]
0 votes
ago by
edited ago by

I have done some more digging and found other problems. The regex #"&([^;\s<&]+);?" should (a) work and (b) include capturing groups in the result.

As far as I understand, this regex does not include lookaheads or -behinds.

(string/split "a&amp;b&amp;c" #"&([^;\s<&]+);?")
;=> ["a" "amp" "b" "amp" "c"]
(string/split "a&amp;b&amp;c" #"&([^;\s<&]+);?" 5)
;=> ["" "" "" "" "p;b&amp;c"]
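
For comparison, here is a small sketch of what plain JavaScript does with the same regex. It is only meant to show the underlying `String.prototype.split` behavior: capturing groups are spliced into the result, and a limit truncates the array instead of leaving the unsplit remainder in the last element. The names `entityRe` and `input` are made up for this example.

```javascript
const entityRe = /&([^;\s<&]+);?/;
const input = "a&amp;b&amp;c";

// Without a limit, each captured group follows the piece before it.
const all = input.split(entityRe);

// With a limit, JavaScript stops after that many elements and drops the rest.
const firstThree = input.split(entityRe, 3);

console.log(all);        // [ 'a', 'amp', 'b', 'amp', 'c' ]
console.log(firstThree); // [ 'a', 'amp', 'b' ]
```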

Patch no. 5 from https://clojure.atlassian.net/browse/CLJS-2528 does not resolve this problem, at least as far as I've been able to implement it:

(let [limit 100
      re #"&([^;\s<&]+);?"
      s "a&amp;b&amp;c"]
(let [limit (dec limit)
      re (js/RegExp. (.-source re)
                     (cond-> "g"
                             (.-ignoreCase re) (str "i")
                             (.-multiline re) (str "m")
                             (.-unicode re) (str "u")))]
  (loop [s s, parts []]
    (if (or (<= limit (count parts))
            (string/blank? s))
      (conj parts s)
      (let [m (.exec re s)]
        (if (nil? m)
          (conj parts s)
          (let [_ (println m)
                index (.-index m)
                matched-str (aget m 0)
                matched-str-len (count matched-str)
                next-parts (cond-> parts
                                   (and (or (< 0 index)
                                            (< 0 matched-str-len))
                                        (not= s matched-str))
                                   (conj (.substring s 0 index)))]
            (if (<= (count s) (+ index (count matched-str)))
              next-parts
              (do
                (set! (.-lastIndex re)
                      (if (and (== 0 index)
                               (== 0 matched-str-len))
                        1 0))
                (recur (.substring s (+ index (count matched-str)))
                                    next-parts))))))))))
;=>  ["a" "b" "c"]

The string is correctly split but does not include the capturing groups.

ago by
Interestingly enough, the way it works after the patch is exactly the way it works in Clojure.
ago by
Adding `(< 1 (count m)) (into (rest m))` to the next-parts cond-> adds the capturing groups.
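
To make the idea in this thread concrete, here is a rough JavaScript sketch of a limit-aware split that keeps the unsplit remainder as the last element, the way the Clojure results above do. It is not the CLJS-2528 patch and not the ClojureScript implementation. `splitWithLimit` and its `includeGroups` flag are hypothetical names, it assumes a positive limit, and edge cases around empty matches may differ from Java.

```javascript
// Hypothetical helper: split `s` on `re` into at most `limit` pieces,
// leaving the rest of the string unsplit in the final piece.
function splitWithLimit(s, re, limit, includeGroups = false) {
  // Work on a global copy so lastIndex advances between exec calls.
  const flags = re.flags.includes("g") ? re.flags : re.flags + "g";
  const g = new RegExp(re.source, flags);
  const parts = [];
  let start = 0; // where the piece currently being collected begins
  let m;
  while (parts.length < limit - 1 && (m = g.exec(s)) !== null) {
    if (m[0].length === 0) {
      // Empty match: advance manually, or exec would never move on.
      g.lastIndex++;
      // Skip empty matches that would produce an empty piece,
      // or that sit at the very end of the string.
      if (m.index === start || m.index >= s.length) continue;
    }
    parts.push(s.slice(start, m.index));
    // Optionally append captured groups; in this sketch they count
    // toward the limit just like ordinary pieces.
    if (includeGroups) parts.push(...m.slice(1));
    start = m.index + m[0].length;
  }
  parts.push(s.slice(start));
  return parts;
}

console.log(splitWithLimit("abc", /(?:)/, 3));
console.log(splitWithLimit("a|b", /|/, 3));
console.log(splitWithLimit("a&amp;b&amp;c", /&([^;\s<&]+);?/, 5));
console.log(splitWithLimit("a&amp;b&amp;c", /&([^;\s<&]+);?/, 5, true));
```

With the inputs from this thread, the first three calls give `["a", "b", "c"]`, `["a", "|", "b"]`, and `["a", "b", "c"]`, and the last one gives `["a", "amp", "b", "amp", "c"]`.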