Bliss8's blog

By Bliss8, history, 2 months ago, In English,

question example:
given a string "aaaabbbasd" and dictionary, for example {a, aa, aaa, b, bb, s, d}
1) find one possible way to split the string into words from dictionary
2) find all possible ways to split the string into words from dictionary

Searching for the best approach on web I found different ones: dp, backtracking + memorization, dfs + memorization.
I lost in all of this huge variety.
Does somebody know what approach is the best here?

 
 
 
 
  • Vote: I like it  
  • -5
  • Vote: I do not like it  

»
2 months ago, # |
Rev. 3   Vote: I like it +10 Vote: I do not like it

You can run a DP + trie solution. Let the state be dp[i][u], representing the number of ways to split the string starting from position i and being at node u in the trie. Let s[i] be the current letter. The transition is either of (or both, if conditions hold):

  1. If u has a transition labeled s[i], then take it and go to dp[i+1][ adj[u][s[i] ].
  2. If u is a terminal node, then split there and start counting again, going to dp[i+1][0].

This has a complexity of around O(|s|*|dict|) where |dict| is the size of the dictionary trie (about sum of lengths of all dictionary words).

Another related idea is preprocessing the "match positions" with aho corasick, and keeping a single state dp[i], number of ways to split the string starting from position i. Here the transition would be:

dp[i] = sum (over all j, words that match s[i,j)) of dp[j].

This runs in O(|dict| + |s|) for the preprocessing, plus O(|num_matches| + |s|) for the dp, where num_matches is the total amount of matches from words of the dictionary into s. While this can be as bad as the first dp in pathological cases like aaaaaaaa and such, it should be faster on average.

  • »
    »
    2 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    thank you
    could you also tell something regarding the case when dictionary is limited to be hash table? can we achieve better than O(n^3) complexity in this case?

    • »
      »
      »
      2 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      I don't know what n is. You can do the preprocessing in the second solution I proposed by using a rolling hash, but you'll need several hashes as you'll run into problems because of the birthday paradox. It doesn't change the complexity.

»
2 months ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it
  1. Spy Syndrome 2

Complexity >> O(length of string * maximum length of word in dictionary)