+2 votes
in ClojureScript

Hello,

I'm seeing re-seq behave differently in ClojureScript than in Clojure:

;; CLJ
(re-seq #"^[a-f]" "aabcded") ;; => ("a")

;; CLJS
(re-seq #"^[a-f]" "aabcded") ;; => ("a" "a" "b" "c" "d" "e" "d")

Is this a bug?

ClojureScript version 1.10.520

(Logged as https://clojure.atlassian.net/browse/CLJS-3187)

4 Answers

0 votes
Best answer

Logged by Alex Miller as a bug here: https://clojure.atlassian.net/browse/CLJS-3187

+2 votes

This is a bug in the re-seq implementation in ClojureScript:

(defn re-seq
  "Returns a lazy sequence of successive matches of re in s."
  [re s]
  (let [match-data (re-find re s)
        match-idx (.search s re)
        match-str (if (coll? match-data) (first match-data) match-data)
        post-idx (+ match-idx (max 1 (count match-str)))
        post-match (subs s post-idx)]
    (when match-data
      (lazy-seq
        (cons match-data
              (when (<= post-idx (count s))
                (re-seq re post-match)))))))

The issue is in the recursion: re-seq calls itself on the remainder of the string, so ^[a-f] gets matched again against the start of this new, shorter string.
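To see the effect in isolation, here is a minimal sketch that uses only re-find and subs (no re-seq involved). Once the first character is cut off, the anchor matches again at the start of the remainder. This should evaluate the same way in CLJ and CLJS:

```clojure
;; On the full string, ^ can only match at position 0.
(re-find #"^[a-f]" "aabcded")          ;; => "a"

;; The remainder is a brand-new string, so ^ matches at *its* position 0 too.
(re-find #"^[a-f]" (subs "aabcded" 1)) ;; => "a"
```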

One solution is to make your regex sticky:

(js/RegExp. #"^." "y")

A sticky regex tracks where its previous match ended (in its lastIndex property), so subsequent uses are aware of previous matches. Be careful where you create it, though: each call to re-seq needs its own fresh regex, so it can't be a single shared (global) instance. If it were shared, you would run into weird state issues like this one:

(let [re (js/RegExp. #"^." "y")]
  [(re-seq re "cccc")
   (re-seq re "abbb")])
;; => [("c" "c") nil]

(which I cannot explain at all!)
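A possible explanation, assuming re-seq is lazy and calls RegExp.prototype.exec on the very regex object it is given: a sticky regex keeps the end position of its last match in the mutable lastIndex property, and both re-seq calls above share that one object. The sketch below shows only the shared state, using plain interop:

```clojure
(let [re (js/RegExp. "^." "y")]
  [(aget (.exec re "cccc") 0) ;; => "c", matched at index 0
   (.-lastIndex re)           ;; => 1
   (.exec re "abbb")          ;; => nil, it starts at index 1 where ^ cannot match
   (.-lastIndex re)])         ;; => 0, a failed match resets lastIndex
;; => ["c" 1 nil 0]
```

Under that assumption, the first re-seq finds its first "c" eagerly, the second re-seq then fails and resets lastIndex to 0, and when the first (lazy) sequence is realized afterwards it finds one more "c" before failing in turn.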

An alternative implementation of re-seq might make this initial clone for you:

(defn re-seq2
  "Returns a lazy sequence of successive matches of re in s."
  [re s]
  (let [re-seq* (fn re-seq* [re s]
                  (let [match-data (re-find re s)
                        match-idx (.search s re)
                        match-str (if (coll? match-data) (first match-data) match-data)
                        post-idx (+ match-idx (max 1 (count match-str)))
                        post-match (subs s post-idx)]
                    (when match-data
                      (lazy-seq
                        (cons match-data
                              (when (<= post-idx (count s))
                                (re-seq* re post-match)))))))]
    (re-seq* (js/RegExp. re "y") s)))

(let [re #"^."]
  [(re-seq2 re "cccc")
   (re-seq2 re "abbb")])
;; => [("c") ("a")]
by
edited by
Kudos for finding a solution to the stated problem.

Unfortunately, stickiness seems to break re-seq's parity with CLJ for other expressions:

;; CLJ
(re-seq #"[a-f]" "aabcded")
;; => ("a" "a" "b" "c" "d" "e" "d")

;; CLJS (sticky)
(re-seq (js/RegExp. #"[a-f]" "y") "aabcded")
;; => ("a" "b" "e")

;; CLJS (re-seq2)
(re-seq2 #"[a-f]" "aabcded")
;; => ("a" "b" "d" "d")
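One way to read these results, assuming re-seq re-runs the same regex object on ever shorter substrings: the sticky regex keeps advancing lastIndex, while every substring starts again at index 0, so characters get skipped. A minimal sketch of that mismatch:

```clojure
(let [re (js/RegExp. "[a-f]" "y")]
  [(aget (.exec re "aabcded") 0) ;; => "a", lastIndex is now 1
   (aget (.exec re "abcded") 0)  ;; => "b", starting at index 1 skips the "a" at index 0
   (.-lastIndex re)])            ;; => 2
;; => ["a" "b" 2]
```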
0 votes

FWIW, I ended up solving my current problem by re-implementing re-seq in the following manner:

(defn re-seq [re s]
  (let [re* (js/RegExp. re "g")
        xf (comp (take-while some?)
                 (map first))]
    (sequence xf (repeatedly #(.exec re* s)))))

Once lazy-seq is removed from the equation (and "global" is switched on for the regex), it works as expected for my test cases.
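Two caveats with the version above, as far as I can tell: (map first) keeps only the whole match, so capture groups are dropped, and a pattern that can match the empty string would make .exec return the same empty match forever. Below is a hedged sketch of a variant that addresses both. The name re-seq-global is hypothetical, it is not the patch attached to CLJS-3187, and it assumes the host supports the ES2015 flags property on regexes:

```clojure
(defn re-seq-global
  "Hypothetical variant: lazy seq of successive matches of re in s."
  [re s]
  (let [flags (.-flags re)
        ;; Clone the regex so the caller's object is never mutated,
        ;; adding the global flag only if it is missing.
        re*   (js/RegExp. (.-source re)
                          (if (neg? (.indexOf flags "g")) (str flags "g") flags))
        step  (fn step []
                (lazy-seq
                  (when-some [m (.exec re* s)]
                    ;; exec does not move past an empty match, so advance by hand.
                    (when (zero? (.-length (aget m 0)))
                      (set! (.-lastIndex re*) (inc (.-lastIndex re*))))
                    ;; Like CLJ: a string without groups, a vector with groups.
                    (cons (if (== 1 (.-length m)) (aget m 0) (vec m))
                          (step)))))]
    (step)))

(re-seq-global #"^[a-f]" "aabcded")  ;; => ("a")
(re-seq-global #"[a-f]" "aabcded")   ;; => ("a" "a" "b" "c" "d" "e" "d")
(re-seq-global #"(\w)(\d)" "a1b2")   ;; => (["a1" "a" "1"] ["b2" "b" "2"])
```

Because the cloned regex is private to each call, two sequences built from the same pattern do not share lastIndex state.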

0 votes

I created a patch for the Jira ticket.
The solution was to add the global flag to the regular expression if it wasn't already set, and then to call the RegExp.prototype.exec() method repeatedly until there are no more matches.
Please let me know if you find any issues.

...