Why is the Aho-Corasick Automaton necessary when Suffix Arrays exist?

Revision en1, by Anthony_Smith, 2022-02-25 08:04:57

An Aho-Corasick Automaton can solve the problem, "Given some text T and a set of patterns P, find every occurrence (every index) of every pattern in T", in O(total number of occurrences + |T| + total length of all patterns) time. My question is, why is the Aho-Corasick Automaton necessary when we can solve this problem with Suffix Arrays too? Are suffix arrays too slow in competition problems? Or am I simply misunderstanding the inner workings of these concepts?

Valid (I think) method using Suffix Arrays: Construct a suffix array over T in O(|T|log|T|) time. Then, for each pattern, we can find all its occurrences in O(pattern size * log|T| + number of occurrences) time by binary searching through the suffix array (See CSES Finding Patterns), then looping through all valid suffices and collecting their starting indices (the indices where the pattern occurs). So, the final time complexity would just be O(|T|log|T| + total pattern length * log|T| + total number of occurrences). This is pretty much the Aho-Corasick Automaton's time complexity multiplied by log|T|, which doesn't seem all that much of a big difference to me.

History

 
 
 
 
Revisions
 
 
  Rev. Lang. By When Δ Comment
en1 English Anthony_Smith 2022-02-25 08:04:57 1213 Initial revision (published)