javacoder1's blog

By javacoder1, history, 8 years ago, In English

Hi, i was trying to solve the problem http://www.spoj.com/problems/FINDSR/

This problem is the application of failure function of KMP. The function block given below (i got in one of the blogs) aims to calculate the failure function:

void table(string p){

lld length = p.size();
v[0]=0;  
v[1]=0;  
lld cur=0;
for(lld j=2;j<length;j++){
    while(cur!=0 && p[cur]!=p[j-1])
        cur=v[cur];

    if(p[cur]==p[j-1])
        cur=cur+1;

    v[j]=cur;

}

lld length_substring = length - v[length-1]-1; // i am caught in this statement

}

// v[length-1] holds the the length of the longest proper prefix which matches the longest proper suffix from [0,length-2] for example for: abcabcabcabc v[length(12)-1]=v[11]=8; which is visible in the substring (abcabcabcab) length of the longest proper prefix which matches the longest proper suffix is of length 8(abcabcab)

problem is about the intuition of calculating the length of the repeating substring lld length_substring = length — v[length-1]-1;//formula simplifies to length_substring = 12-8-1=3; which clearly is the length of(abc)

Can someone explain me the intuition behind the formula of getting the substring as well as suggest some problems on codeforces to strengthen the concepts. Thanks in advance.

  • Vote: I like it
  • 0
  • Vote: I do not like it

»
8 years ago, # |
Rev. 7   Vote: I like it 0 Vote: I do not like it

Yes, KMP does calculate length of the longest proper prefix, that is the suffix of the each substring also.
Now, let us say that the input string is a1a2a3a4.
Where, ai represent shortest substring that is repeated. e.g., in case of S = "abcabcabcabc" ( "abc" repeats 4 times and abcabc repeats 2 times)
As our goal is to find maximum time repeating substring(obviously that is the smallest repeating substring).
That is what being computed by the KMP failure function.

abcabcabcabc
---abcabcabc
Crucial observation is that, This can be only possible, when a1 = a2, a2 = a3, a3 = a4.
So, that is the length of least possible number of shifts, len(a1).
shifted len = total len — proper prefix len(position+1).
Now, work your own where the position of shift is 1 or 0.(that is either first character is the proper prefix of the string(not substring) or no character matches to last character of string)
Let me know if anything is not clear.

»
8 years ago, # |
Rev. 3   Vote: I like it 0 Vote: I do not like it

I find that implementation of KMP's failure function to be very cumbersome. Using the same naming convention, I'd code it like this...

int cur = 0;
v[0] = -1, v[1] = 0;  

for(int j=2; j <= p.size(); j++)
{
    while(cur >= 0 && p[cur] != p[j-1]) cur = v[cur];
    v[j] = ++cur;
}