Given a list of strings, build a DAG in which each node is a string and there is an edge x->y iff x is proper substring of y

Правка en5, от pabloskimg, 2018-10-21 17:55:36

There are at most N = 10^4 strings, each string is at most MAXLEN = 1000 characters long, but the length of the concatenation of all strings is at most 10^6. What would be the more efficient way to build a DAG as described in the title? The naive way would be comparing each pair of strings (X,Y), which leads to O(N^2) comparisons, and then for each pair to check whether X is substring of Y in O(MAXLEN^2). The naive solution could be improved by first sorting strings by length so that each string X can only be substring of strings to the right, and also we could use Rolling Hashing to reduce the complexity of substring search to O(MAXLEN). Is it possible to do even better? I've got the feeling that Suffix Array could be of help, but I'm not sure of exactly how. The motivating problem is this one

Теги #strings, substring search


  Rev. Язык Кто Когда Δ Комментарий
en5 Английский pabloskimg 2018-10-21 17:55:36 1
en4 Английский pabloskimg 2018-10-21 07:50:28 63
en3 Английский pabloskimg 2018-10-21 07:47:51 12 Tiny change: 'substring verification to O(MAXL' -> 'substring search to O(MAXL'
en2 Английский pabloskimg 2018-10-21 07:45:53 7
en1 Английский pabloskimg 2018-10-21 07:45:01 1009 Initial revision (published)