I have done some more digging and found other problems. The regex #"&([^;\s<&]+);?"
should (a) work and (b) include its capturing groups in the result.
As far as I understand, this regex contains no lookaheads or lookbehinds.
(string/split "a&amp;b&amp;c" #"&([^;\s<&]+);?")
;=> ["a" "amp" "b" "amp" "c"]
(string/split "a&amp;b&amp;c" #"&([^;\s<&]+);?" 5)
;=> ["" "" "" "" "p;b&amp;c"]
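For comparison, the captured groups in the no-limit result look like plain host behaviour: JavaScript's String.prototype.split splices capture groups into its output. A minimal REPL check, assuming the no-limit arity of string/split hands the work to the host (that part is my reading, not something verified here):

```clojure
;; Host-level check; clojure.string is not involved.
(def re #"&([^;\s<&]+);?")

(vec (.split "a&amp;b&amp;c" re))
;=> ["a" "amp" "b" "amp" "c"]

;; The host's own limit argument only truncates the result array,
;; so it cannot stand in for a Java-style limit that keeps the remainder.
(vec (.split "a&amp;b&amp;c" re 3))
;=> ["a" "amp" "b"]
```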
Patch no. 5 on https://clojure.atlassian.net/browse/CLJS-2528 does not resolve this problem, at least as far as I've been able to implement it.
(let [limit 100
      re #"&([^;\s<&]+);?"
      s "a&amp;b&amp;c"]
  (let [limit (dec limit)
        ;; rebuild the regex with the global flag, keeping the original flags
        re (js/RegExp. (.-source re)
                       (cond-> "g"
                         (.-ignoreCase re) (str "i")
                         (.-multiline re) (str "m")
                         (.-unicode re) (str "u")))]
    (loop [s s, parts []]
      (if (or (<= limit (count parts))
              (string/blank? s))
        (conj parts s)
        (let [m (.exec re s)]
          (if (nil? m)
            (conj parts s)
            (let [_ (println m) ; debug output
                  index (.-index m)
                  matched-str (aget m 0)
                  matched-str-len (count matched-str)
                  ;; only the text before the match is kept;
                  ;; the groups in m (index 1 and up) are never added
                  next-parts (cond-> parts
                               (and (or (< 0 index)
                                        (< 0 matched-str-len))
                                    (not= s matched-str))
                               (conj (.substring s 0 index)))]
              (if (<= (count s) (+ index (count matched-str)))
                next-parts
                (do
                  ;; reset the search position before the next .exec
                  (set! (.-lastIndex re)
                        (if (and (== 0 index)
                                 (== 0 matched-str-len))
                          1 0))
                  (recur (.substring s (+ index (count matched-str)))
                         next-parts))))))))))
;=> ["a" "b" "c"]
The string is correctly split but does not include the capturing groups.
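One way the groups could be carried over is to append elements 1 and up of each .exec result after the piece that precedes the match. The following is only a rough sketch under stated assumptions, not the patch's code: split-keeping-groups is a hypothetical helper, re is assumed to be non-global and non-sticky (as a regex literal is), limit is assumed to count the split pieces only (groups come on top), and a zero-length match simply ends the loop instead of being handled properly.

```clojure
(defn split-keeping-groups
  "Hypothetical helper, not part of clojure.string.
  Splits s on re into at most limit pieces and keeps captured groups."
  [s re limit]
  (loop [s s, parts [], pieces 1]
    ;; stop matching once the next piece would be the last allowed one
    (let [m (when (< pieces limit) (.exec re s))]
      (if (or (nil? m) (zero? (count (aget m 0))))
        ;; no match, limit reached, or empty match: remainder goes last
        (conj parts s)
        (let [index (.-index m)
              end   (+ index (count (aget m 0)))]
          (recur (subs s end)
                 (-> parts
                     (conj (subs s 0 index)) ; text before the match
                     (into (rest m)))        ; captured groups, in order
                 (inc pieces)))))))

(split-keeping-groups "a&amp;b&amp;c" #"&([^;\s<&]+);?" 100)
;=> ["a" "amp" "b" "amp" "c"]
(split-keeping-groups "a&amp;b&amp;c" #"&([^;\s<&]+);?" 2)
;=> ["a" "amp" "b&amp;c"]
```

Unlike string/split, this sketch does not discard trailing empty strings, and whether groups should count toward the limit is an open question.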