Monster_Nerd's blog

By Monster_Nerd, history, 3 years ago, In English

Recently I got TLE on this problem, E. Tree-String Problem, using a straightforward DFS + KMP. Later I learned that there is an optimized version of KMP. The code below is taken from Arpa's code.

// Prefix (failure) function of the pattern p:
// f[i] = length of the longest proper prefix of p[0..i-1] that is also a suffix of it
// (f is assumed to be zero-initialized, so f[0] = f[1] = 0).
int k = 0;
for(int i = 1; i < p.size(); i++){
    while(k && p[i] != p[k]) k = f[k];
    if(p[i] == p[k]) k++;
    f[i + 1] = k;
}

// KMP automaton: nxt[i][j] = number of matched characters of p after being in
// state i (i characters matched so far) and reading the letter 'a' + j.
for(int i = 0; i < p.size(); i++)
    for(int j = 0; j < z; j++)
        nxt[i][j] = p[i] - 'a' == j ? i + 1 : bool(i) * nxt[ f[i] ][j];

Now my question is: can I always use this version (as long as the memory limit allows)? Is it consistent, and can anyone briefly explain what it does?

»
3 years ago, # |

Yes, it's consistent. If you simplify the code a bit, it's easy to understand.
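For example, nxt[k][c] just answers: "if the first k characters of p are currently matched and the next character of the text is 'a' + c, how many characters of p are matched afterwards?". A minimal sketch of scanning some string text with it (text and the match handling are placeholders; f and nxt are assumed to be exactly the ones from the post):

int st = 0;                          // characters of p matched so far
for (int i = 0; i < (int)text.size(); i++) {
    st = nxt[st][text[i] - 'a'];     // one table lookup, no while-loop
    if (st == (int)p.size()) {       // an occurrence of p ends at position i
        // ... report or count the match here ...
        st = f[p.size()];            // continue from the longest proper border
    }
}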

»
3 years ago, # |
Rev. 4

Your original code TLEs on this problem because its worst-case runtime is $$$\Omega(|t| + n \cdot \min (|t|, \sum |s_v|))$$$. The failing testcase is probably a larger version of the following:

10
1 aaaaaaaaaa
2 x
2 x
2 x
2 x
2 x
2 x
2 x
2 x
aaaaaaaaaay

Your original code performs the backtracking part of KMP at match-time. When parsing a single string, this doesn't affect asymptotic complexity and doesn't have a big impact on the constant factor because each character added to a partial match is removed at most once. But in the tree setting, the partial match characters may be removed $$$n-2$$$ times in a test case like the one above. Arpa's version of KMP gets around this by preprocessing all of the potential backtracking steps so that they may be performed in (un-amortized) $$$O(1)$$$ at match time, restoring a more reasonable $$$O(n + |\sigma | |t| + \sum |s_v|)$$$ time complexity.

EDIT: Complexities were rewritten a bit more precisely.
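To make that concrete, here is a rough sketch of the matching DFS under that scheme. It is only an illustration: t denotes the pattern (p in the snippet from the question), the names s, adj and cnt are placeholders rather than the problem's actual input format, and f and nxt are assumed to be built from t exactly as in the question.

#include <bits/stdc++.h>
using namespace std;

string t;                          // the pattern (p in the snippet from the question)
vector<string> s;                  // s[v] = string written on vertex v (placeholder)
vector<vector<int>> adj;           // children lists of the rooted tree (placeholder)
vector<long long> cnt;             // occurrences of t ending inside s[v] (placeholder)
vector<int> f;                     // prefix function of t, size |t| + 1, built as in the question
vector<vector<int>> nxt;           // automaton table, built as in the question

// state = number of characters of t matched along the path from the root
// down to (but not including) vertex v's string.
void dfs(int v, int state) {
    for (char c : s[v]) {
        state = nxt[state][c - 'a'];     // one O(1) lookup per character
        if (state == (int)t.size()) {    // an occurrence of t ends inside s[v]
            cnt[v]++;
            state = f[t.size()];         // continue from the longest proper border
        }
    }
    for (int u : adj[v]) dfs(u, state);  // each child resumes from this state
}

Every character of every $$$s_v$$$ is now consumed exactly once with a single table lookup, which is where the $$$\sum |s_v|$$$ term in the complexity above comes from.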

  • »
    »
    3 years ago, # ^ |

    Thanks a lot, buddy. I had tried hard to figure out the failing test case and couldn't see what was going wrong. Now I completely get it, and at last I understand why the optimization was required. Thanks a lot.