Some queries on strings

→ Pay attention

Before contest
Pinely Round 4 (Div. 1 + Div. 2)
26:28:25
Register now »

*has extra registration

→ Streams

Atcoder ABC #364 Solution Discussion

By aryanc403

Before stream 01:38:23

Codeforces Pinely Round 4 (Div 1 + Div 2) Solution Discussion

By Shayan

Before stream 29:33:23

View all →

→ Top rated

#	User	Rating
1	tourist	3880
2	jiangly	3669
3	ecnerwala	3654
4	Benq	3627
5	orzdevinwang	3612
6	Geothermal	3569
6	cnnfls_csy	3569
8	jqdai0815	3532
9	Radewoosh	3522
10	gyh20	3447

Countries | Cities | Organizations

View all →

→ Top contributors

#	User	Contrib.
1	awoo	161
2	maomao90	160
3	adamant	156
4	maroonrk	153
5	atcoder_official	148
5	-is-this-fft-	148
5	SecondThread	148
8	Petr	147
9	nor	144
10	TheScrasse	142

View all →

→ Find user

→ Recent actions

Detailed →

OneClickAC's blog

Some queries on strings

By OneClickAC, history, 5 years ago, In English

Hi folks, Got to know about this problem from a friend. It was asked in codenation's coding round/interview last year.

Any leads on a better approach than $$$O(n^2)$$$ will be appreciated.

Question — You are given n strings and q queries. Each query is also in the form of a string (say $$$q_i$$$). You have to reply with true if this string can be formed by concatenating any of the n strings given earlier.

For example — Suppose $$$n = 3$$$ and the strings are $$$abc$$$ , $$$bc$$$ and $$$cd$$$.

So $$$abcabc$$$ is a valid string because you can use $$$abc$$$ twice. (There is no constraint on the number of times you can use any given string.)

Also, $$$bccdabc$$$ is a valid string as we can use the given strings in the order $$$2$$$, $$$3$$$ and $$$1$$$.

Sum of all the strings in query is $$$\leq 10000$$$

My friend final_tsu helped me in coming with a $$$O(n^2)$$$ approach using Trie and DP.

Thanks.

OneClickAC
5 years ago
5

Comments (5)

Write comment?

Nson

5 years ago, # |

← Rev. 2 →

+34

Some weeks ago, tfg told me how to solve it online with time and space $$$O(N * sqrt(N))$$$.

It is possible to solve it online using dp + sqrt decomposition.

To answer a query of string $$$t$$$, we build dp[i] = "it is possible to concatenate some strings from your set to form the prefix with i characters". So it looks like

dp[i + j] = dp[i] if there is a match of a string(from your set) with size j starting at position i + 1

The answer for the query is dp[t.size()].

To build the dp we need some precomputation.

Let $$$S$$$ be the sum of sizes of all $$$n$$$ given strings. We build a trie with all strings with size < $$$sqrt(S)$$$ and build prefix function for all the others. The trie has height at most $$$sqrt(S)$$$ and there are at most $$$sqrt(S)$$$ prefix functions saved.

Traverse the trie starting with every position of $$$t$$$ and save all ranges $$$(l, r)$$$ of $$$t$$$ that matches some string on the trie. This works in $$$O(t.size() * sqrt(S))$$$.

Now for all big strings with size <= t.size() do the usual kmp matching and save ranges $$$(l, r)$$$ as before. This also works in $$$O(t.size() * sqrt(S))$$$.

You ended up with $$$O(t.size() * sqrt(S))$$$ ranges that you can use to compute the dp.

→ Reply

final_tsu

5 years ago, # ^ |

Thanks a lot for the clear explanation :)

→ Reply

ivan100sic

5 years ago, # ^ |

+24

Are you sure this uses linear memory? The same time complexity can be achieved by using Aho-Corasick — no need to explicitly divide strings into long and short ones. If the state of the Aho-Corasick automaton is $$$x$$$ after reaching the prefix $$$t_{[0,i)}$$$, the value of $$$dp_i$$$ can be found by following dictionary links from node $$$x$$$. Since depths of nodes on this dictionary path are strictly decreasing and they all correspond to strings from the set $$$S$$$, if the total length of the $$$n$$$ strings is $$$L$$$ then there are no more than $$$O(\sqrt L)$$$ of these dictionary nodes, so the algorithm runs in $$$O(L + |t|\sqrt L)$$$ time and uses linear memory.

→ Reply

Nson

5 years ago, # ^ |

$$$\textit{Are you sure this uses linear memory?}$$$

I was wrong. tfg's solution also divided the queries into long and short ones to achieve linear complexity. I thought this simplification kept linear but I forgot about the saved ranges.

Your solution with aho is really nice because it's just direct use of it.

→ Reply

tfg

5 years ago, # ^ |

← Rev. 3 →

Yes, that solution uses linear memory. The trie is linear memory, the borders precomputation uses linear memory, keeping the KMP states uses O(sqrt(L)) memory. You don't need to explicitly have a vector of occurences, just proccess them as they come (as in here but keeping the kmp states for each big pattern). I'm now aware of that Aho-Corasick solution ever since this problem. Your solution is offline though, but you can easily make it online by using the trick to maintain O(log) sets, it'll work in O(sqrt(L) + log(N)) per character if we assume no equal strings are given, and that's easily treatable.

Edit: online as in queries could be adding patterns.

→ Reply