Is there a better than quadratic-time solution for "string dictionary with wildcards" problem?

Revision en1, by mouse_wireless, 2021-07-05 01:37:39

Hello codeforces,

Long time no see.

I'm having a look at problem #211 on leetcode. It's a classic problem: you add strings to a collection, and you have queries which ask if a string is part of the collection, except the query strings can contain wildcards, which can match any character.

The recommended solution is a trie-based one, in which if you encounter a wildcard, you try to continue the query from each of the current node's children (similar to a normal graph traversal).

The problem with this solution, is that it has a worst case complexity of $$$O(Q * N * L)$$$, where Q is the number of search queries, N is the number of added strings, and L is the maximum length of the string. This is achieved when there are N "different enough" strings (randomly generated will do) of max length, and Q queries of L wildcards (to avoid premature termination of the traversal, you can also force the final character to be different). In this case, on each query, you have to traverse the entire trie, which has $$$O(N * L)$$$ nodes.

I've run a lot of solutions I found on leetcode against this test case (code below).

int main() {
  WordDictionary D;
  for (int i = 0; i < 25000; ++i) {
    string word;
    for (int j = 0; j < 499; ++j)
      word.push_back(rand() % 26 + 'a');
    word.push_back('z');
    D.addWord(word);
  }
  int cnt = 0;
  string q(499, '.');
  q.push_back('a');
  for (int i = 0; i < 25000; ++i)
    cnt += D.search(q);
  cout << cnt << endl;
  return 0;
}

And none of them would finish in under 1 minute (there are some solutions which use a different approach which work on this case but "fail" on others).

So I decided to take out "the big guns" and ask codeforces. Is there a better than worst-case $$$O(Q * N * L)$$$ solution?

Tags complexity, trie, #strings

History

 
 
 
 
Revisions
 
 
  Rev. Lang. By When Δ Comment
en1 English mouse_wireless 2021-07-05 01:37:39 1976 Initial revision (published)