regex equality

Question


asked Sep 7, 2019 in Compiler by Mikhail Kuzmin

How to check regexp equality in nested structures?

```clojure
(= #"." #".")       ;; #=> false
(= [#"."] [#"."])   ;; #=> false
(= ["."] ["."])     ;; #=> true
```
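Regarding the outputs above: `java.util.regex.Pattern` does not override `Object#equals`, so `=` falls back to reference identity. A single Pattern instance is equal to itself, but two separately compiled patterns never are, even with the same source. One possible workaround for nested structures, sketched here with a hypothetical helper `regex-aware=` (not part of Clojure or any library), is to replace every Pattern with its source string and flags via `clojure.walk/postwalk` before comparing:

```clojure
(require '[clojure.walk :as walk])
(import '(java.util.regex Pattern))

;; Replace every Pattern anywhere in x with a vector of its source string
;; and flags; vectors of keywords, strings and ints have value equality.
(defn- normalize-regexes [x]
  (walk/postwalk
    (fn [node]
      (if (instance? Pattern node)
        [::regex (.pattern ^Pattern node) (.flags ^Pattern node)]
        node))
    x))

(defn regex-aware=
  "Hypothetical helper, not part of Clojure: like =, but two Patterns
  compare equal when their source strings and flags are equal."
  [a b]
  (= (normalize-regexes a) (normalize-regexes b)))
```

With this, `(regex-aware= [#"."] [#"."])` returns `true`. One trade-off: a structure that already contains a vector shaped like `[::regex "..." 0]` would compare equal to one holding the corresponding Pattern.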

I believe that a regexp is a value object, so two regexps should be equal if they are literally equal (i.e. created from the same source string).

Is it possible to update the `=` function to support regexp equality?

Maybe like this:

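```clojure
(import '(java.util.regex Pattern))

(defprotocol IEquals
  (equals [a b]))

(extend-protocol IEquals
  Object
  ;; default: fall back to Java's Object#equals
  (equals [a b] (.equals a b))

  Pattern
  ;; two regexes are equal when their source strings (and compile flags) match
  (equals [a b]
    (and
      (instance? Pattern b)
      (= (str a) (str b))
      (= (.flags a) (.flags ^Pattern b)))))
```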

Then `clojure.lang.Util.equiv` would call `IEquals#equals` instead of `Object#equals`.
But I don't know how to call a protocol method from `clojure.lang.Util`.
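Instead of changing `clojure.lang.Util.equiv`, one workaround that needs no changes to Clojure itself is to wrap each Pattern in a small type that defines value equality. The sketch below assumes a hypothetical `RegexValue` deftype and `regex-value` constructor (neither is part of Clojure or any library). Because `=` ends up calling `Object#equals` for such a type, and hashed collections use its `hashCode`, wrapped regexes work with `=`, as map keys and in sets:

```clojure
(import '(java.util.regex Pattern))

;; Hypothetical wrapper type, not part of Clojure: equality and hashing are
;; based on the wrapped Pattern's source string and flags, so wrapped regexes
;; behave as values inside vectors, maps and sets.
(deftype RegexValue [^Pattern p]
  Object
  (equals [_ other]
    (and (instance? RegexValue other)
         (let [^Pattern q (.-p ^RegexValue other)]
           (and (= (.pattern p) (.pattern q))
                (= (.flags p) (.flags q))))))
  (hashCode [_]
    ;; consistent with equals: same source and flags => same hash
    (hash [(.pattern p) (.flags p)]))
  (toString [_]
    (str p)))

(defn regex-value
  "Wraps a regex literal or Pattern as-is, or compiles a string with re-pattern."
  [re]
  (->RegexValue (re-pattern re)))
```

The trade-off is that call sites must unwrap the Pattern before matching, e.g. `(re-find (.-p rv) s)`.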

1 Answer

alexmiller · Answer 1 · 2019-09-07T14:58:40+0000

commented Sep 7, 2019 by Sean Corfield

commented Sep 18, 2019 by Tamas Herman

Converting "at the point of use" is also a performance hit, since `re-pattern` compiles the regex, and compilation should happen only once, especially when a regex is reused.

My current use case is to categorise a stream of strings using a curated list of ~18000 regex-to-category mappings, which doesn't change frequently.

It's very desirable to keep those 18000 regexes compiled; otherwise, running `re-find` with them against ~6000 strings takes 21s instead of 9.5s. If I use `(memoize re-pattern)`, it takes 26s.
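A minimal sketch of the compile-once approach described above: the regex strings are compiled a single time, when the var is defined, and the resulting Patterns are reused for every lookup. The rule data and the names `category-specs`, `compiled-rules` and `categorise` are hypothetical; the real use case would load ~18000 entries from the curated list.

```clojure
;; Hypothetical sample data standing in for the curated regex-to-category list.
(def category-specs
  [["^ERROR\\b" :error]
   ["timeout"   :timeout]
   ["(?i)warn"  :warning]])

;; Compile every regex exactly once, at load time, and keep the compiled
;; Patterns for all later lookups.
(def compiled-rules
  (mapv (fn [[re category]] [(re-pattern re) category]) category-specs))

(defn categorise
  "Returns the category of the first rule whose regex matches s, or nil."
  [s]
  (some (fn [[p category]]
          (when (re-find p s) category))
        compiled-rules))
```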

Not having regex equality, at least one based on the string a regex was created from, also makes writing tests quite painful.

On one hand, regexes feel like first-class citizens of Clojure: they have their own built-in literal syntax and can be used as hash-map keys. But then they break down when it comes to equality.

If two `java.util.regex.Pattern` objects are compiled from the same string, they behave the same way, so it's safe to consider them equal.
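One caveat to "compiled from the same string": `#"..."` and `re-pattern` always compile with no extra flags (options such as `(?i)` must be written inside the source string), so for them the source string does determine behaviour. `Pattern/compile` can also take flags that do not appear in `str`, so a comparison meant to cover such patterns would need to check `.flags` as well. `same-source-and-flags?` below is a hypothetical helper illustrating this:

```clojure
(import '(java.util.regex Pattern))

;; #"..." and re-pattern always compile with flags 0; options such as (?i)
;; have to be written inside the source string itself.
(def plain #"abc")

;; Pattern/compile can also take flags that are NOT part of the source string.
(def flagged (Pattern/compile "abc" Pattern/CASE_INSENSITIVE))

(defn same-source-and-flags?
  "Hypothetical comparison: true when both the source string and the
  compile flags of two Patterns are equal."
  [^Pattern a ^Pattern b]
  (and (= (.pattern a) (.pattern b))
       (= (.flags a) (.flags b))))
```

Here `(str plain)` and `(str flagged)` are both `"abc"`, yet `(re-find plain "ABC")` returns `nil` while `(re-find flagged "ABC")` returns `"ABC"`.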

I fail to see how it is relevant that different regex strings might result in the same matching behaviour. We are talking about the `=` operator, not a `does-it-behave-the-same?` operator... :/

What would be the use-case for detecting whether a regex object instance is the same as another one?

commented Sep 18, 2019 by alexmiller
