Giaco's blog

By Giaco, history, 14 months ago, In English

Hello everyone!

I'm looking for the time complexity of the builder of c++ regex object. From a fast web search, i didn't find an answer (at least from the top 3 google search results lol). The stackoverflow's answer does not give an isnight of what it's happening in the constructor.

If you wonder why i'm looking for this, here are my two submission for the problem 1800A - Is It a Cat?.

  • Vote: I like it
  • +23
  • Vote: I do not like it

»
14 months ago, # |
Rev. 2   Vote: I like it +27 Vote: I do not like it

C++ regex is too slow in some cases to be usable practically. Constructing a regex object can be very slow due to possibly having exponentially many nodes in the resulting automaton (which in turn depends on the choice of regex you're using: ECMAScript, basic POSIX, and so on; and the type of matches you want can potentially make the language non-regular). The bottom line: don't use C++ regex.

A great resource on a simpler variant of the underlying machinery is here and shows why you should prefer using NFA based implementations over DFA based ones. In particular, you can use a dp to get a linear time (assuming the automaton size and alphabet size are constant) matching algorithm on a linear size automaton (in the size of the expression).

A substring matching algorithm for regular languages can be found in the next post, which is here.

Fun fact: I use this example a lot whenever someone comes up to me and asks me why theory is important, and why they should care about complexity, when obviously computers are getting faster (they're not, even on a hardware scale, let alone with the kind of code software developers write).