This is the first post in my Codeforces blog! I plan to use this as a place where I can write down new algorithms I learned, problems that I found interesting, or stupid mistakes I made during contests.

Today, the topic will be the Z Algorithm, which I learned as a result of failing to solve Problem B of Beta Round 93 (http://codeforces.com/contest/126/problem/B). There are some other solutions like binary search + hashing, but this one is quite nice. Anyway, first, a description of the algorithm and why it works; it is simple and makes a lot of sense (as all good algorithms are).

### Algorithm

Given a string

*S*of length*n*, the Z Algorithm produces an array*Z*where*Z*[*i*] is the length of the longest substring starting from*S*[*i*] which is also a prefix of*S*, i.e. the maximum*k*such that*S*[*j*] =*S*[*i*+*j*] for all 0 ≤*j*<*k*. Note that*Z*[*i*] = 0 means that*S*[0] ≠*S*[*i*]. For easier terminology, we will refer to substrings which are also a prefix as prefix-substrings.The algorithm relies on a single, crucial invariant. As we iterate over the letters in the string (index

*i*from 1 to*n*- 1), we maintain an interval [*L*,*R*] which is the interval with maximum*R*such that 1 ≤*L*≤*i*≤*R*and*S*[*L*...*R*] is a prefix-substring (if no such interval exists, just let*L*=*R*= - 1). For*i*= 1, we can simply compute*L*and*R*by comparing*S*[0...] to*S*[1...]. Moreover, we also get*Z*[1] during this.Now suppose we have the correct interval [

*L*,*R*] for*i*- 1 and all of the*Z*values up to*i*- 1. We will compute*Z*[*i*] and the new [*L*,*R*] by the following steps:- If
*i*>*R*, then there does not exist a prefix-substring of*S*that starts before*i*and ends at or after*i*. If such a substring existed, [*L*,*R*] would have been the interval for that substring rather than its current value. Thus we "reset" and compute a new [*L*,*R*] by comparing*S*[0...] to*S*[*i*...] and get*Z*[*i*] at the same time (*Z*[*i*] =*R*-*L*+ 1). - Otherwise,
*i*≤*R*, so the current [*L*,*R*] extends at least to*i*. Let*k*=*i*-*L*. We know that*Z*[*i*] ≥*min*(*Z*[*k*],*R*-*i*+ 1) because*S*[*i*...] matches*S*[*k*...] for at least*R*-*i*+ 1 characters (they are in the [*L*,*R*] interval which we know to be a prefix-substring). Now we have a few more cases to consider. - If
*Z*[*k*] <*R*-*i*+ 1, then there is no longer prefix-substring starting at*S*[*i*] (or else*Z*[*k*] would be larger), meaning*Z*[*i*] =*Z*[*k*] and [*L*,*R*] stays the same. The latter is true because [*L*,*R*] only changes if there is a prefix-substring starting at*S*[*i*] that extends beyond*R*, which we know is not the case here. - If
*Z*[*k*] ≥*R*-*i*+ 1, then it is possible for*S*[*i*...] to match*S*[0...] for more than*R*-*i*+ 1 characters (i.e. past position*R*). Thus we need to update [*L*,*R*] by setting*L*=*i*and matching from*S*[*R*+ 1] forward to obtain the new*R*. Again, we get*Z*[*i*] during this.

The process computes all of the

*Z*values in a single pass over the string, so we're done. Correctness is inherent in the algorithm and is pretty intuitively clear.### Analysis

We claim that the algorithm runs in

*O*(*n*) time, and the argument is straightforward. We never compare characters at positions less than*R*, and every time we match a character*R*increases by one, so there are at most*n*comparisons there. Lastly, we can only mismatch once for each*i*(it causes*R*to stop increasing), so that's another at most*n*comparisons, giving*O*(*n*) total.### Code

Simple and short. Note that the optimization

*L*=*R*=*i*is used when*S*[0] ≠*S*[*i*] (it doesn't affect the algorithm since at the next iteration*i*>*R*regardless).int L = 0, R = 0; for (int i = 1; i < n; i++) { if (i > R) { L = R = i; while (R < n && s[R-L] == s[R]) R++; z[i] = R-L; R--; } else { int k = i-L; if (z[k] < R-i+1) z[i] = z[k]; else { L = i; while (R < n && s[R-L] == s[R]) R++; z[i] = R-L; R--; } } }

### Application

One application of the Z Algorithm is for the standard string matching problem of finding matches for a pattern

*T*of length*m*in a string*S*of length*n*. We can do this in*O*(*n*+*m*) time by using the Z Algorithm on the string*T*Φ*S*(that is, concatenating*T*, Φ, and*S*) where Φ is a character that matches nothing. The indices*i*with*Z*[*i*] =*m*correspond to matches of*T*in*S*.Lastly, to solve Problem B of Beta Round 93, we simply compute

*Z*for the given string*S*, then iterate from*i*to*n*- 1. If*Z*[*i*] =*n*-*i*then we know the suffix from*S*[*i*] is a prefix, and if the largest*Z*value we've seen so far is at least*n*-*i*, then we know some string inside also matches that prefix. That gives the result.int maxz = 0, res = 0; for (int i = 1; i < n; i++) { if (z[i] == n-i && maxz >= n-i) { res = n-i; break; } maxz = max(maxz, z[i]); }

Here I post some tutorial link.

If you know any new tutorial blog post please comment , I will add them .

Thanks paladin8 for his awesome tutorial.

Here also by using Z-algorithm 6780044

100% agree with u . it would be a great source for everybody to learn algorithm :)

That site is awesome, if someone could translate it to english, i think that it would be the best algorithm reference ever.

It automatically translates to English for my region! Great!

Here is the english version of this site: http://e-maxx-eng.github.io

Also the pdf ebook containing all the tutorials and algo in English can be downloaded from: https://www.dropbox.com/s/j8z839pzamxxqw2/e-maxx_algo.ru.en.pdf?dl=1

thanx for sharing this source with us :)

Non Russians can use this site to read the tutorials in English: http://e-maxx-eng.github.io

Or if one visit that site using google chrome, then s/he can automaticaly translate in English with the help of google chrome browser.

Also I have a pdf file in English containing all posts on this site. This pdf ebook containing all the tutorials and algo in English can be downloaded from: https://www.dropbox.com/s/j8z839pzamxxqw2/e-maxx_algo.ru.en.pdf?dl=1

import java.io.*;

import java.util.*;

public class D {

public static void main(String[] args) throws IOException{

BufferedReader in = new BufferedReader(new InputStreamReader(System.in));

String s = in.readLine();

char[] c = s.toCharArray();

int n = c.length;

int[] memo = new int[n];

/* memo[i] will store the length of the longest prefix of s that matches the tail of s1...si */

for (int i=1; i<n; i++) {

int j=i;

while (j>0 && c[i] != c[memo[j-1]]) j = memo[j-1];

memo[i] = (j>0) ? memo[j-1]+1 : 0;

}

int max = 0;

for (int i=1; i<n-1; i++) max = Math.max(max, memo[i]);

/* max = Maximum internal match to prefix */

j = memo[n-1];

while (j>max) j = memo[j-1];

System.out.println((j==0)? "Just a legend" : s.substring(0, j));

}

}

may be this will work

`for i = 1 to n`

`p[i + z[i]] = max(p[i + z[i]], z[i])`

and one more travesing from n to 1. doing

`p[i] = max(p[i+1] - 1, p[i])`

"and one more travesing from n to 1. doing p[i] = max(p[i+1] — 1, p[i])"

why is this necessary ?

A portion of necroposting.

You can solve 2 problems: 1) Create a sample string with given Z-function over an infinite alphabet. Just follow Z-function generation algorithm and store all the equalities in DST. Next, check the result. 2) Create a sample string with given prefix-function over an infinite alphabet. Just follow prefix function generation algorithm and store all the equalities in DST. Next, check the result.

(in Russian: http://contest.samara.ru/ru/problemset/735/)

I dont know if your Z<-> prefix transformation is mutual. I've written a post to try to construct z from kmp, http://liruqi.info/post/62511983824/kmp-z (In Chinese). Beside, you may check my code here, https://raw.github.com/liruqi/topcoder/master/HackerRank/SaveHumanity.cpp

Can someone give both transformation implementations? If possible with some explanations. Thanking you in advance. Learn and let learn.

No advantages =)

But if we compare Z-function and Prefix-function then:

1)"monotony"If z[i] = x, it means substrings [0..j) [i..i+j) are equals for all j from 0 to x.

If p[i] = x, it means [0..j) = (i-j..i] for all j = x, p[x], p[p[x]] and so on.

2)LCP (Largest Common Prefix)Z-function in fact calculates LCP[0,j] for all j. It can be used for not only substring searching.

I also have two examples of problems which, I hope, show advantages Z-function over Prefix-function.

1)Determine number (No.) of the string in its suffix array in O(n).2)Determine number (amount) of substrings, which have at least two occuarances in the string inO(n^{2}) (of course, you can solve it in O(n) using suffix tree).Your comment is valuable. But you have some error, I will show them, to prevent others from misunderstanding. (Someone reopen the post, so I read your comment)

1) "monotony"

If z[i] = x, it means substrings

[0..j) [i..i+j)are equals for all j from 0 to x-1.If p[i] = x, it means

[0..j) = (i-j..i]for all j = x, p[x], p[p[x]] and so on.Thanks. Fixed.

KMP requires only the smaller string to be known to allow for computing the prefix function. It is a stream based algorithm,you can work even if characters are given you one by one for the main string. This may be an advantage in certain scenarios.

Very good algorithm

Hi, I think that assumption is wrong, take this test case:

Is it right or I misunderstood what he said? :o

(sorry for editing, I am new to Markdown :p)

Amazing tutorial. Thanks to you and e-maxx I have learned a new algorithm today :D

Thanks for this amazing tutorial. Keep writing! :)

Very nice explanation of the algorithm , thanks !

There seems to be one more condition missing in the routine, actually not missing but Algo is doing more work for that case. So, there can be three cases when I < R and there are as follows

1) Z[k] < R-I+1 in this case Z[I] = Z[k] //Z[k] Length will be strictly less than R-I+1 2) Z[K] > R-I+1 in this case Z[I] = R-I+1 //This is the missing case. 3) Z[k] = R-I+1 in this case Z[I] = R-I +1 + [keep computing the Z values starting from the position R] // This is because we know Z[K] is strictly = R-I+1 (Beta) but Z[I] can still match with Patterns next character thus start computing the Z values for this Z[I] starting position from R until the mismatch is found.

One of my favorite posts. Quick question: the algorithm seems to ignore

`z[0]`

; shouldn't it be the case that`z[0] = n`

? Thanks.By the algorithm we consider that Z[0] is underfined

you can add it end of your code, if necessary :)

So, an algorithm is s type of code?

ghabool dari nemikhaym z[0] ?

can you please elaborate on what S[i] stores .

`S`

is the string.`S[i]`

is the symbol at position`i`

. If`S = "abcd"`

, then`S[0] = 'a'`

,`S[1] = 'b'`

and so on.So where do we Type out an algorithm? C++?

Is it possible to use this algorithm to solve UVA 10298 problem. If yes, would you please describe the process?

Just a silly optimization. In the line

`while (R < n && s[R-L] == s[R]) R++;`

, isn't the`R < n`

redundant for strings? Because`s[R-L] == s[R]`

will be false when we hit the`\0`

character. This could also work if we are using a sequence of numbers instead of strings (we have to add a number at the end that is different than all others).Not everyone uses C++ char arrays to store strings

that's the point. I'm saying that you can use sequences of any type provided that you insert a sentinel element at the end of your sequence. Then you can save the

`R < n`

.afair, std::string allows s[s.size()] and returns 0

it's guaranteed only for C++11 or later

Guys, this link at youtube might come in handy for a deeper insight and intuition into the algorithm. https://www.youtube.com/watch?v=MFK0WYeVEag

Thank you for this awesome tutorial.

this link contains animation for Z-Algorithm, it may be helpful.

awesome animation...... Thank you eagle93

Thanks for the link! :)

can any one tell me how to solve Problem B of Beta Round 93 with KMP by building Longest common prefix/suffix array :D

my code for make z function :)

If you want a much more detailed explanation with examples and complexity analysis, this link will be quite helpful!

Now we are computing the value at

`i = k`

. The values L and R are as shown in the image. Now in the case`Z[k] ≥ R - i + 1`

, since`r + 1`

and`r' + 1`

values are not same(if they were to be same, the interval would have been`L`

and`R + 1`

instead of the current values), why do we need to check the values after`r + 1`

?Helpful.

Can anyone answer my question above?

Can the longest sub-string and prefix overlap?

The O(n) complexity made me interested. Such a beautiful algorithm. Thank you :)

this is the easiest tutorial on Z-algorithm i found on the internet.

Does anyone have more problems related to Z Algorithm?I am appreciate if anyone could offer some.

try this one

another one

shortest implementation of Z Algorithm:

This is the first time i read a blog on codeforces hope i 'll continue on.

useful video

Z algorithm part 1: Linear-time pattern matching. Z-values and Z-algorithmZ algorithm part 2: Finish of Linear-time pattern matching