ed1d1a8d's blog

By ed1d1a8d, 10 years ago, In English

Here's the problem: SUB_PROB

I'm having trouble getting my solution to run in time, even though my algorithm should pass. If you didn't already read the problem statement, it asks you to check whether or not words exist in a piece of text. I'm currently using Aho-Corasick to solve the problem. Basically, I build up the search trie first in O(N*S), and then search through the text once in O(M). This should add up to a O(M + N*S) algorithm, which by my calculations — should AC.

Here is my code: OLD http://ideone.com/ndQcN4 OLD

EDIT — HERE IS UPDATED CODE: http://ideone.com/3P7XIj

Note: I had to split up the query strings into 3 batches since my trie would exceed the memory limit otherwise.

Is my algorithm flawed, or is my implementation not optimal. And if it is the latter case, what can I do to speed up the algorithm. I have already removed all recursive methods in exchange for iterative ones, so I don't know how I can speed it up further. Thanks.

  • Vote: I like it
  • +1
  • Vote: I do not like it

»
10 years ago, # |
Rev. 4   Vote: I like it +3 Vote: I do not like it
  1. what is vector repeats[MAX_N]; ? use bitmasks to remember set of words for terminating vertices. it takes only 12.5 MB memory

struct node { int val; int terminal; int parent; map<char, int> childrens; int fail; }; or struct node { int val; int terminal; int parent; int fail; }; map<pair<int, char>, int> childrens; ... for (map<pair<int, char>, int>::iterator ch = childrens.lower_bound(pair<int, char>(x, 0)); ch != childrens.end() && ch->first.first == x; ++ch) // for_each_chilren_of(x)
  • »
    »
    10 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    I went about and implemented your suggestions, but it still gets TLE. Although now, it does take up less memory (only 79MB total). As for your first suggestion, I don't really need to save that space so I stuck with using vector<int> repeats[MAX_N]. In addition, I also reverted scanf and printf to cin and cout since it made no difference when testing it with large cases on my own computer. Is there a trick I'm missing in Aho-Corasick, or should I approach the problem differently, maybe using Suffix Tree?

    Here is my updated code: http://ideone.com/3P7XIj

    Thanks for your help.

    • »
      »
      »
      10 years ago, # ^ |
      Rev. 5   Vote: I like it 0 Vote: I do not like it
      1. I don't undesrtand what is array "repeats". I think it can be very large and slow. use bitmasks.

      2. if(aho[idx].children.count(val) > 0) return aho[idx].children[val]; the is two searches, but you need only one.

      map<int, int>::iterator it = aho[idx].chilrdern.count(val);
      if (it != aho[idx].chilrdern.end()) return it->second;
      
      

      3 scanf & printf is faster than cin & cout

    • »
      »
      »
      10 years ago, # ^ |
      Rev. 3   Vote: I like it 0 Vote: I do not like it

      also you can store trie using hash-table, it is fast enough. code
      UPD: thre is a bug at line 81. has to be int hp = hash(parent, val);

»
4 years ago, # |
  Vote: I like it 0 Vote: I do not like it

can someone tell why i'm getting wa, used string hashing link to code: https://ideone.com/oRX1SS

  • »
    »
    3 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Firstly the strings will also contain numeric characters and secondly you have to consider 'A' and 'a' as different character.

  • »
    »
    3 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    I have tried to solve the problem using string hashing. But I got TLE. I have solved the problem with O(n*m) complexity. How I can optimize it? Here is my code -- https://ideone.com/zrInz2