I have 2 questions in mind, just out of curiousity :). They both consider tries (not compressed). Also note — by a trie's size I'm not considering its root (even though, it does not matter for the purpose of this problem).

1) For some 2 natural numbers *N* and *K* (suppose *K* ≤ *N*), what is the maximal size of a trie that can be achieved if you take some string *S* of length *N*, with the size of its language (number of distinct characters it can contain) being *K*, and insert into the trie all its suffixes (so, a suffix trie)?

For instance, if *N* = 2, *K* = 2, we can form the string "ab", which makes the trie of size 3. On the other hand, if *N* = 10, *K* = 1, the only string we can form is "aaaaaaaaaa" which makes the trie size 10.

2) How do you form such a string with maximal trie size? An algorithm or an idea, both are acceptable.

Thanks.

Auto comment: topic has been updated by Noam527 (previous revision, new revision, compare).For

K= 1 the problem is trivial. ForK> 1 the stringa^{n}b^{n}will produce an asymptotically largest trie of size at leastn^{2}(you basically get a chain fromb^{n}for every possible prefix ofa^{n}). Were you interested in the optimal answer? From Wiki it seems like Fibonacci word gives you the worst case forK= 2 and I wonder if some generalization of it leads to the answer for other cases.I believe this size is .

Let's show it if

N=K^{m}for some integer (I think it extends in the general case but it's more cumbersome).Sbe a chain such that the substringsS_{i}...S_{i + m - 1}of lengthm(potentially looping around the string) are all distinct. This can be reduced to a Eulerian cycle problem if we take words of lengthm- 1 as vertices, which proves its existence and gives an efficient algorithm for construction. Then we know that the suffixes will be separate in the trie from theirm-th character onwards, so we have at least nodes as desired.K+ ... +K^{m}=O(N) nodes at the levels ≤m, and because of the length of the suffixes we have nodes at most at the levels lower than that. So indeed we're bounded by nodes.Maybe it's possible to refine this approach and eliminate the

O(N) imprecision, but I'm already happy with this.