maximaxi's blog

By maximaxi, history, 2 years ago, In English,

I have seen some people's code to represent a trie in a 2D array. For example, this is my old code to solve Autocomplete. However, I forgot how it works!

Can someone explain how to code a 2D array that represents a trie? I know that one dimension of the array represents the alphabet, and that the values inside the trie array are the node numbers, where the first letter you add is node number 1. Still, I'm confused where to insert values into the trie array, and how strings with the same prefixes would be distinguished in the tire.

Thanks.

#include <bits/stdc++.h>
#define rd(s) freopen(s, "r", stdin);
#define wt(s) freopen(s, "w", stdout);
using namespace std;

const int MAX=1e6+2, ALF=26+2;
int t, trie[MAX][ALF];

int main()
{
    //rd("test.txt");

    scanf("%d", &t);
    for (int i=1; i<=t; ++i) {
        int ans=0, n, numm1=1;
        for (int _=0; _<MAX; ++_) for (int __=0; __<ALF; ++__) trie[_][__] = 0;
        scanf("%d", &n);
        for (int j=0; j<n; ++j) {
            int node=0, idx=0;
            string word; cin>>word;
            while (trie[node][word[idx]-'a'] && idx < word.length()) {
                node=trie[node][word[idx]-'a'];
                ++ans; ++idx;
            }
            if (idx != word.length()) ++ans;
            //printf("ans is %d\n", ans);
            for (; idx<word.length(); ++idx) {
                trie[node][word[idx]-'a'] = numm1;
                //printf("new path from %d to %d via %c\n", node, numm1, word[idx]);
                node=numm1;
                ++numm1;
            }
        }
        printf("Case #%d: %d\n", i, ans);
    }
}
 
 
 
 
  • Vote: I like it
  • -9
  • Vote: I do not like it

»
2 years ago, # |
  Vote: I like it +1 Vote: I do not like it

Each row in your array represents a node in the trie structure. Node 0 is the root.

When you want to insert a new string, begin at node 0. Let c be the first character of your string, and see if trie[0][c] is valid or not. If it is not valid "allocate" a new node by incrementing the size of your trie and set trie[0][c] to the newly allocated node.

Here's a simple example. I'm using -1 to signify that there is no node there.

Initially:

0: -1 -1 -1 ...
1: -1 -1 -1 ...
2: -1 -1 -1 ...
3: -1 -1 -1 ...

Size of the trie is 1 (because of the root node). Now let's insert string "ABC". Then since trie[0]['A'] is -1, we need to allocate a new node and set trie[0]['A'] to it. Then our structure is

0:  1 -1 -1 ...
1: -1 -1 -1 ...
2: -1 -1 -1 ...
3: -1 -1 -1 ...

Now we are currently at node 1 and need to insert "BC". Following this procedure twice, our trie looks like

0:  1 -1 -1 ...
1: -1  2 -1 ...
2: -1 -1  3 ...
3: -1 -1 -1 ...
  • »
    »
    2 years ago, # ^ |
    Rev. 2   Vote: I like it 0 Vote: I do not like it

    By the way, check out my new standup

  • »
    »
    4 weeks ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Can we find the highest continue frequency of any character in any string of dictionary using Trie?

»
2 years ago, # |
  Vote: I like it 0 Vote: I do not like it

i just have a small doubt is it true that if we implement a trie in this way using predefined sizes for arrays then it is more faster than pointer oriented code ... .

»
2 years ago, # |
  Vote: I like it +3 Vote: I do not like it

Lets not make it complicated. First, think of how you can represent a tree using an array. If you can do that you can do for trie as well since a trie is also a tree.

Now suppose tree is a binary tree. Normally you represent a tree as Node {int val; Node *l, *r} Where l, r are pointers to the children. Now suppose you don't do dynamic memory allocation and replace l, r with indices of Nodes in array. First you can check whether l points to a children. If not, then you will use an unused Node of our predefined array. Similarly for r.

For a trie, suppose a string has length L. Then in the worst case, for each suffix we need to create a linked list. So total size would be L+L-1+....+1=O(L^2). But if alphabet is just lowercase(a-z) then, some suffixes will have same starting characters and memory will be less than L^2.