Help finding sum of the number of distinct characters in all the distinct substrings of string S.

→ Pay attention

Before contest
Pinely Round 4 (Div. 1 + Div. 2)
21:59:15
Register now »

*has extra registration

→ Streams

Codeforces Pinely Round 4 (Div 1 + Div 2) Solution Discussion

By Shayan

Before stream 25:04:15

View all →

→ Top rated

#	User	Rating
1	tourist	3880
2	jiangly	3669
3	ecnerwala	3654
4	Benq	3627
5	orzdevinwang	3612
6	Geothermal	3569
6	cnnfls_csy	3569
8	jqdai0815	3532
9	Radewoosh	3522
10	gyh20	3447

Countries | Cities | Organizations

View all →

→ Top contributors

#	User	Contrib.
1	awoo	161
2	maomao90	160
3	adamant	156
4	maroonrk	153
5	atcoder_official	148
5	-is-this-fft-	148
5	SecondThread	148
8	Petr	147
9	nor	144
10	TheScrasse	142

View all →

→ Find user

→ Recent actions

Detailed →

deepak1527's blog

Help finding sum of the number of distinct characters in all the distinct substrings of string S.

By deepak1527, history, 5 years ago, In English

How to solve the following problem. Find the sum of the number of distinct characters in all the distinct substrings of S. 1<=|S|<=100000. S="aabb"
Set of distinct sub-strings of "aabb" = {a,b,aa,ab,bb,aab,abb,aabb} sum = 1 + 1 + 1 + 2 + 1 + 2 + 2 + 2 = 12.

Thanks!

deepak1527
5 years ago
4

Comments (4)

Write comment?

aviroop123

5 years ago, # |

+28

Can you give the link to the problem so that I can verify that it's not a part of some ongoing contest?

→ Reply

deepak1527

5 years ago, # ^ |

No it's not a part of any ongoing contest, this problem was asked in coding round of a company and that round has ended.

→ Reply

m0nk3ydluffy

5 years ago, # |

+19

I would like to propose a solution. It would be nice if someone can verify it.

The main idea is that for each character, we have to figure out its contribution to the final sum i.e. for each character $$$S_{i}$$$ in our string, we have to find how many distinct substrings exist such that $$$S_{i}$$$ is the first occurrence of its kind in that substring? This can be answered using a Suffix Automaton. (If we didn't have to count distinct substrings, the problem would have an easier solution.)

So we will build a Suffix Automaton on the given string. Lets name the starting node of the automaton as $$$t_{0}$$$. Consider an edge from node $$$u$$$ to node $$$v$$$ using the character $$$c$$$. The contribution of $$$c$$$ to the final sum = $$$dp[u][c] * dp[v]$$$ where,

$$$dp[u][c]$$$ = The number of paths from $$$t_{0}$$$ to $$$u$$$ such that we never use an edge with character $$$c$$$

$$$dp[v]$$$ = The number of paths that begin from node $$$v$$$ and end at any other node.

The above two dp's can be calculated fairly easily. The final answer should be the sum of the contribution of each edge in the Suffix Automaton.

Since you mentioned that this question was asked in the coding round of a company, it should have an easier solution.

→ Reply

MSchallenkamp

5 years ago, # |

We can solve this with a suffix array by recognizing that each unique substring can be represented as a prefix of a suffix.

Let's construct a suffix array. Now we'll walk forward, starting with the first string in the suffix array. For the first string in the suffix array all of the prefixes of this suffix are valid. For the next string only the prefixes longer than the longest common prefix between it and the last string are unique. All the others were counted as part of some earlier suffix. This will take nlogn time to construct the suffix array, nlogn time to find the longest common prefix between every two suffixes, and finally n time to sum all the values for each suffix.

→ Reply