what is the best way to store aho corasick output link? - Codeforces

err0r's blog

what is the best way to store aho corasick output link?

By err0r, history, 6 years ago, In English

Recently I learned the Aho-Corasick algorithm for pattern searching. To find all occurrences of the pattern strings in a text string, I store all possible output links in my prefix tree (trie).

But a case such as
text = aaaaa
patterns = {a, aa, aaa, aaaa, aaaaa}
gives me a huge number of output links!

That is what my implementation does. Is there a better way to store them?

Tags

#strings

Comments (3)

6 years ago, # |

In each node, store only the indices of the patterns that match exactly, i.e. the letters on the path from the root to that node spell out the whole pattern.

This way you store at most one pattern per node (unless there are duplicate patterns).

All other patterns that match at a given node are proper suffixes of the string spelled by that node. So if you want to find all matches that end at a given node, you can walk along the suffix links and print all the stored indices you find on the way.

Note that this can be quite inefficient. For example, take the patterns {a, aaaaaaaaaaaa}. There are 13 nodes in the trie, and to find all matches that end at the end of the text aaaaaaaaaaaa you have to visit all 13 nodes, although there are only 2 matches.
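To make the walk concrete, here is a minimal C++ sketch of the idea described above. It is not the implementation linked in this comment; the names `SuffixWalkAC`, `step` and `matchesAt`, and the restriction to lowercase letters, are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Sketch: each node stores at most one pattern index (no duplicate patterns assumed).
struct SuffixWalkAC {
    struct Node {
        std::array<int, 26> next;  // transitions, completed to a full automaton in build()
        int link = 0;              // suffix link
        int pattern = -1;          // index of the pattern ending exactly here, or -1
        Node() { next.fill(-1); }
    };
    std::vector<Node> t;
    SuffixWalkAC() : t(1) {}       // node 0 is the root

    void add(const std::string& s, int id) {
        int v = 0;
        for (char ch : s) {
            assert(ch >= 'a' && ch <= 'z');  // assumption: lowercase alphabet only
            int c = ch - 'a';
            if (t[v].next[c] == -1) {
                t[v].next[c] = (int)t.size();
                t.emplace_back();
            }
            v = t[v].next[c];
        }
        t[v].pattern = id;
    }

    // BFS over the trie: compute suffix links and fill in missing transitions.
    void build() {
        std::queue<int> q;
        for (int c = 0; c < 26; ++c) {
            int u = t[0].next[c];
            if (u == -1) t[0].next[c] = 0;
            else q.push(u);  // depth-1 nodes keep the default link to the root
        }
        while (!q.empty()) {
            int v = q.front(); q.pop();
            for (int c = 0; c < 26; ++c) {
                int u = t[v].next[c];
                int viaLink = t[t[v].link].next[c];
                if (u == -1) t[v].next[c] = viaLink;
                else { t[u].link = viaLink; q.push(u); }
            }
        }
    }

    int step(int v, char ch) const { return t[v].next[ch - 'a']; }

    // Collect every pattern ending at node v by walking suffix links up to the root.
    // Returns the number of nodes visited, to show the cost of the walk.
    int matchesAt(int v, std::vector<int>& out) const {
        int visited = 0;
        for (int u = v;; u = t[u].link) {
            ++visited;
            if (t[u].pattern != -1) out.push_back(t[u].pattern);
            if (u == 0) break;
        }
        return visited;
    }
};
```

With the two patterns from the example, calling `matchesAt` on the state reached after reading the whole text touches every node of the trie but reports only two pattern indices.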

One solution: precompute a second suffix link for each node that points to the next terminal node on the suffix link path (if such a node exists). Then you print the (possibly empty) output vector of the current node, follow the link to the next terminal node, print its (guaranteed non-empty) output vector, and so on. This costs only O(|nodes|) extra memory and O(|nodes|) extra precomputation time, and finding all matches ending at a given node takes O(|matches|) time.

You can see this in my implementation. Note that I assume there are no duplicate patterns, so it stores only one pattern index per node.
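For readers without access to the linked code, the second link can be sketched roughly as follows. This is an illustration only, not the implementation referred to above: the names `OutputLinkAC`, `out` and `findAll` are hypothetical, and the sketch assumes non-empty, distinct patterns over lowercase letters.

```cpp
#include <array>
#include <cassert>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct OutputLinkAC {
    struct Node {
        std::array<int, 26> next;  // transitions, completed to a full automaton in build()
        int link = 0;              // ordinary suffix link
        int out = -1;              // nearest terminal node on the suffix link path, or -1
        int pattern = -1;          // index of the pattern ending exactly here, or -1
        Node() { next.fill(-1); }
    };
    std::vector<Node> t;
    OutputLinkAC() : t(1) {}       // node 0 is the root

    void add(const std::string& s, int id) {
        assert(!s.empty());        // assumption: no empty patterns
        int v = 0;
        for (char ch : s) {
            assert(ch >= 'a' && ch <= 'z');  // assumption: lowercase alphabet only
            int c = ch - 'a';
            if (t[v].next[c] == -1) {
                t[v].next[c] = (int)t.size();
                t.emplace_back();
            }
            v = t[v].next[c];
        }
        t[v].pattern = id;
    }

    void build() {
        std::queue<int> q;
        for (int c = 0; c < 26; ++c) {
            int u = t[0].next[c];
            if (u == -1) t[0].next[c] = 0;
            else q.push(u);  // depth-1 nodes: link = 0 and out = -1 by default
        }
        while (!q.empty()) {
            int v = q.front(); q.pop();
            for (int c = 0; c < 26; ++c) {
                int u = t[v].next[c];
                int viaLink = t[t[v].link].next[c];
                if (u == -1) { t[v].next[c] = viaLink; continue; }
                t[u].link = viaLink;
                // The suffix link target is shallower, so its `out` is already final.
                t[u].out = (t[viaLink].pattern != -1) ? viaLink : t[viaLink].out;
                q.push(u);
            }
        }
    }

    // Returns (end position in text, pattern index) for every occurrence.
    std::vector<std::pair<int, int>> findAll(const std::string& text) const {
        std::vector<std::pair<int, int>> res;
        int v = 0;
        for (int i = 0; i < (int)text.size(); ++i) {
            v = t[v].next[text[i] - 'a'];
            // Start at v if it is terminal, otherwise at its first terminal ancestor;
            // every node visited in this loop yields exactly one match.
            for (int u = (t[v].pattern != -1 ? v : t[v].out); u != -1; u = t[u].out)
                res.emplace_back(i, t[u].pattern);
        }
        return res;
    }
};
```

For the text and patterns from the original question, each of the 6 trie nodes holds one pattern index and one extra link, while `findAll` still reports all 15 occurrences.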

6 years ago, # ^ |

thank you *_*
