Number of Common Substrings

Please subscribe to the official Codeforces channel in Telegram via the link https://t.me/codeforces_official. ×

→ Pay attention

Before contest
Codeforces Round 961 (Div. 2)
02:11:54
Register now »

*has extra registration

→ Streams

Codeforces Round 960 Solution Discussion

By aryanc403

Before stream 04:16:53

Codeforces Round 961 (Div 2) - Official Solution Discussion

By Shayan

Before stream 04:17:53

CodeChef Starters 144 Solution Discussion

By aryanc403

Before stream 28:36:53

View all →

→ Top rated

#	User	Rating
1	tourist	3880
2	jiangly	3669
3	ecnerwala	3654
4	Benq	3627
5	orzdevinwang	3612
6	Geothermal	3569
6	cnnfls_csy	3569
8	jqdai0815	3532
9	Radewoosh	3522
10	gyh20	3447

Countries | Cities | Organizations

View all →

→ Top contributors

#	User	Contrib.
1	awoo	161
2	maomao90	160
3	adamant	156
4	maroonrk	153
5	-is-this-fft-	148
5	SecondThread	148
5	atcoder_official	148
8	Petr	147
9	nor	144
9	TheScrasse	144

View all →

→ Find user

→ Recent actions

Detailed →

Eramoni's blog

Number of Common Substrings

By Eramoni, 6 years ago, In English

I am stuck on this following problem:

Given two strings of size n and m (1<=n,m<=100000), find the number of common substrings between this strings.

It will be really helpful if you can give me a solution.

Eramoni
6 years ago
5

Comments (4)

Show archived | Write comment?

LordVoldebug

6 years ago, # |

← Rev. 5 →

It requires data structure that is called suffix array — array, which contains a permutation of numbers from 1 to n, where n is the length of a string. Number i in this array represents i-th suffix of a string, and suffixes are sorted lexicographicaly.

There is a nice way to build it in n log ^ 2 (there is also exists a way to build it in n log n, which is also quite easy (look here https://cp-algorithms.com/string/suffix-array.html), and even in O(n) (but algorithm in O(n) has relatively high constant factor, so it is not very practical).

So, lets build suffix array in O(n log ^ 2 n)!

Lets count polynomial hashes of a string (if you are not really familiar with polynomial hashes of a string, look here: https://cp-algorithms.com/string/string-hashing.html). So now we can check, whether two substrings of our string are equal in O(1) time!

Lets notice that we can compare lexicographicaly two suffixes in O(log n) time — lets use binary search and find longest common prefix of two suffixes. We can do it in O(log n) time. And then we can just compare the first letter that differs in those two suffixes, and we are done! So, we can just run std::sort on the permutation of numbers from 1 to n using this comporator and so we will get a suffix array of a string in O(n log n) (sorting) * O (log n) (each comparison works in O(log n))).

To solve our original problem let`s build the suffix array and count lcp[i] — longest common prefix of neigboring suffixes in the suffix array. (We can calculate lcp in O(n log n) using that binary search with hashing) (it is also possible to do this in O(n), which is describes in the article above). Than, lets sum the lcp values substruct this value from (n ^ 2 + n) / 2 and we are done! It is exactly the number of different substrings of a strings. For some proof look here: https://cp-algorithms.com/string/suffix-array.html#toc-tgt-10

→ Reply

Eramoni

6 years ago, # ^ |

Actually, I know how to find the number of different substrings in a string. But my problem is not that. I want to know "how many common substrings are there between two different strings"

→ Reply

LordVoldebug

6 years ago, # ^ |

← Rev. 2 →

Oh, sorry. I somehow misread the problem.

Well, let`s build suffix arrays of both strings, s and t. (array for string s#t where '#' in neither in s, neither in t and smaller than any other symbol).

It is a common knowledge is that every substring is a prefix of some suffix. Let`s iterate over the suffix of the string s which we choose. Fot or this string, we can find out longest prefix of the suffix that also occurs in t — simply look at the nearest left position in merged suffix array which belongs to string t and nearest right position in the merged suffix array which belongs to string t(via some simple dp for instance). So, now we know for each suffix of s longest prefix of it that belongs to t. So now we can iterate over suffix array and add max(longest_t[i] — lcp[i], 0) for ecach suffix of s to the answer (lcp[i] — lcp of i and i — 1 suffix in this case. (idea is similar to the one in my first comment).

→ Reply

nopangoliers

6 years ago, # ^ |

It seems you know all to solve your problem. Let A — set of different strings of a (first given string). Let B — set of different strings of b (second given string).

Then $\text{[math]}$ . You know how to calculate |A| and |B|, and you want to find $\text{[math]}$ . Think a bit, how to obtain $\text{[math]}$ .

solution

→ Reply