### OneClickAC's blog

By OneClickAC, history, 4 years ago,

Any leads on a better approach than $O(n^2)$ will be appreciated.

Question — You are given $n$ strings and $q$ queries. Each query is also a string (say $q_i$). You have to reply with true if this string can be formed by concatenating some of the $n$ strings given earlier.

For example — suppose $n = 3$ and the strings are $abc$, $bc$ and $cd$.

So $abcabc$ is a valid string because you can use $abc$ twice. (There is no constraint on the number of times you can use any given string.)

Also, $bccdabc$ is a valid string as we can use the given strings in the order $2$, $3$ and $1$.

The sum of the lengths of all query strings is $\leq 10000$.

My friend final_tsu helped me come up with an $O(n^2)$ approach using a trie and DP.
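For reference, the quadratic baseline is essentially a word-break DP. Here is a minimal sketch of that idea; it uses a hash set for the dictionary instead of a trie purely for brevity, and all identifiers are illustrative, not from the original solution:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Quadratic word-break DP: can[i] is true iff the prefix t[0..i) can be
// written as a concatenation of dictionary strings (reuse allowed).
bool canForm(const string& t, const unordered_set<string>& dict) {
    int n = t.size();
    vector<char> can(n + 1, false);
    can[0] = true;                       // the empty prefix is trivially formable
    for (int i = 0; i < n; i++) {
        if (!can[i]) continue;
        for (int j = i + 1; j <= n; j++) // try every piece t[i..j)
            if (dict.count(t.substr(i, j - i)))
                can[j] = true;
    }
    return can[n];
}
```

With a trie, the inner loop walks trie edges instead of hashing substrings, which is the trie + DP approach mentioned above.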

Thanks.


 » 4 years ago, # | ← Rev. 2 →   +34

Some weeks ago, tfg told me how to solve it online with time and space $O(N\sqrt{N})$. It is possible to solve it online using DP + sqrt decomposition.

To answer a query string $t$, we build $dp[i]$ = "it is possible to concatenate some strings from your set to form the prefix with $i$ characters". So it looks like: $dp[i + j]$ is true if $dp[i]$ is true and there is a match of a string (from your set) of size $j$ starting at position $i + 1$. The answer for the query is $dp[t.size()]$.

To build the dp we need some precomputation. Let $S$ be the sum of the sizes of all $n$ given strings. We build a trie with all strings of size $< \sqrt{S}$ and build the prefix function for all the others. The trie has height at most $\sqrt{S}$, and there are at most $\sqrt{S}$ prefix functions saved (since the total size is $S$, at most $\sqrt{S}$ strings can have size $\geq \sqrt{S}$).

Traverse the trie starting from every position of $t$ and save all ranges $(l, r)$ of $t$ that match some string in the trie. This works in $O(t.size() \cdot \sqrt{S})$.

Now, for all big strings of size $\leq t.size()$, do the usual KMP matching and save ranges $(l, r)$ as before. This also works in $O(t.size() \cdot \sqrt{S})$.

You end up with $O(t.size() \cdot \sqrt{S})$ ranges that you can use to compute the dp.
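The query step described above can be sketched in C++ roughly as follows, assuming the patterns have already been split into short ones (kept in a trie) and long ones (matched with the prefix function). All identifiers are mine, not from the comment:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Trie over lowercase letters for the "short" patterns.
struct Trie {
    struct Node { array<int, 26> nxt; bool end = false; Node() { nxt.fill(-1); } };
    vector<Node> nodes{1};
    void insert(const string& s) {
        int v = 0;
        for (char c : s) {
            int x = c - 'a';
            if (nodes[v].nxt[x] < 0) { nodes[v].nxt[x] = nodes.size(); nodes.emplace_back(); }
            v = nodes[v].nxt[x];
        }
        nodes[v].end = true;
    }
};

vector<int> prefixFunction(const string& s) {
    vector<int> pi(s.size(), 0);
    for (size_t i = 1; i < s.size(); i++) {
        int k = pi[i - 1];
        while (k && s[i] != s[k]) k = pi[k - 1];
        if (s[i] == s[k]) k++;
        pi[i] = k;
    }
    return pi;
}

// Answer one query, given the short/long split of the pattern set.
bool query(const string& text, const vector<string>& shortPats, const vector<string>& longPats) {
    int n = text.size();
    Trie trie;
    for (const string& s : shortPats) trie.insert(s);

    // matchEnd[j] holds lengths of pattern occurrences ending at prefix length j.
    vector<vector<int>> matchEnd(n + 1);

    // 1) Walk the trie from every start position; depth is bounded by the trie height.
    for (int i = 0; i < n; i++) {
        int v = 0;
        for (int j = i; j < n; j++) {
            v = trie.nodes[v].nxt[text[j] - 'a'];
            if (v < 0) break;
            if (trie.nodes[v].end) matchEnd[j + 1].push_back(j + 1 - i);
        }
    }
    // 2) Prefix-function (KMP) matching for each long pattern.
    for (const string& p : longPats) {
        if ((int)p.size() > n) continue;
        vector<int> pi = prefixFunction(p + '#' + text);
        for (int j = 0; j < n; j++)
            if (pi[p.size() + 1 + j] == (int)p.size())
                matchEnd[j + 1].push_back(p.size());
    }
    // 3) DP over prefixes: dp[j] = prefix of length j is a concatenation.
    vector<char> dp(n + 1, false);
    dp[0] = true;
    for (int j = 1; j <= n; j++)
        for (int len : matchEnd[j])
            if (dp[j - len]) { dp[j] = true; break; }
    return dp[n];
}
```

In the real solution the trie would be built once over all short patterns rather than per query; it is rebuilt here only to keep the sketch self-contained.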
•  » » 4 years ago, # ^ |   0 Thanks a lot for the clear explanation :)
•  » » 4 years ago, # ^ |   +24 Are you sure this uses linear memory? The same time complexity can be achieved by using Aho-Corasick — no need to explicitly divide the strings into long and short ones. If the state of the Aho-Corasick automaton is $x$ after reading the prefix $t_{[0,i)}$, the value of $dp_i$ can be found by following dictionary links from node $x$. The depths of the nodes on this dictionary path are strictly decreasing, and each corresponds to a string from the set, so the depths are distinct lengths of input strings; if the total length of the $n$ strings is $L$, there can be at most $O(\sqrt L)$ distinct lengths. Hence the algorithm runs in $O(L + |t|\sqrt L)$ time and uses linear memory.
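The dictionary-link DP described in this comment can be sketched as follows; this is an illustrative implementation under the usual Aho-Corasick construction, with identifiers of my choosing:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Aho-Corasick automaton over lowercase letters. dictLink points to the
// nearest suffix-link ancestor that is the end of a pattern (0 = none).
struct AhoCorasick {
    struct Node {
        array<int, 26> nxt;
        int link = 0, dictLink = 0, depth = 0;
        bool end = false;
        Node() { nxt.fill(0); }
    };
    vector<Node> t{1};

    void insert(const string& s) {
        int v = 0;
        for (char c : s) {
            int x = c - 'a';
            if (!t[v].nxt[x]) {
                t[v].nxt[x] = t.size();
                t.push_back(Node());
                t.back().depth = t[v].depth + 1;
            }
            v = t[v].nxt[x];
        }
        t[v].end = true;
    }

    void build() {                       // BFS: suffix links, dict links, goto
        queue<int> q;
        for (int c = 0; c < 26; c++)
            if (t[0].nxt[c]) q.push(t[0].nxt[c]);
        while (!q.empty()) {
            int v = q.front(); q.pop();
            int u = t[v].link;
            t[v].dictLink = t[u].end ? u : t[u].dictLink;
            for (int c = 0; c < 26; c++) {
                int w = t[v].nxt[c];
                if (w) { t[w].link = t[u].nxt[c]; q.push(w); }
                else t[v].nxt[c] = t[u].nxt[c];
            }
        }
    }
};

// dp[i] = prefix s[0..i) is a concatenation of patterns. At each position we
// follow dictionary links to visit every pattern occurrence ending there.
bool canConcat(const string& s, const AhoCorasick& ac) {
    int n = s.size();
    vector<char> dp(n + 1, false);
    dp[0] = true;
    int v = 0;
    for (int i = 0; i < n; i++) {
        v = ac.t[v].nxt[s[i] - 'a'];
        for (int u = ac.t[v].end ? v : ac.t[v].dictLink; u; u = ac.t[u].dictLink)
            if (dp[i + 1 - ac.t[u].depth]) { dp[i + 1] = true; break; }
    }
    return dp[n];
}
```

Insert all $n$ strings once, call `build()`, and then each query runs in $O(|t|\sqrt L)$, since the dictionary chain visited at each position has $O(\sqrt L)$ nodes by the distinct-lengths argument above.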
•  » » » 4 years ago, # ^ |   +3 $\textit{Are you sure this uses linear memory?}$ I was wrong. tfg's solution also divided the queries into long and short ones to achieve linear complexity. I thought this simplification stayed linear, but I forgot about the saved ranges. Your solution with Aho-Corasick is really nice because it's just a direct use of it.
•  » » » 4 years ago, # ^ | ← Rev. 3 →   +3 Yes, that solution uses linear memory. The trie is linear memory, the borders precomputation uses linear memory, and keeping the KMP states uses $O(\sqrt L)$ memory. You don't need to explicitly keep a vector of occurrences; just process them as they come (as in here, but keeping the KMP state for each big pattern). I've been aware of that Aho-Corasick solution ever since this problem. Your solution is offline though, but you can easily make it online by using the trick of maintaining $O(\log)$ sets; it'll work in $O(\sqrt L + \log N)$ per character if we assume no equal strings are given, and that's easily treatable. Edit: online as in queries could be adding patterns.