from_dyu_import_geek's blog

By from_dyu_import_geek, history, 4 years ago, In English

Google Kickstart 2020 had a problem called bundling. I need a little help understanding some code for this problem. Below is someone else's code to solve this problem. I have understood that he is using a 2D array as a Trie (prefix tree). Can anyone please explain, step by step what his dfs() function does exactly? And what exactly the "cnt[]" array keeps track of. Thank you.

#include <bits/stdc++.h>
using namespace std;

#define ll long long
#define ar array

int n, k, c[2000001][26], m, cnt[2000001];
ll ans;

void dfs(int u=0, int d=0) {
	for(int v=0; v<26; ++v)
		if(c[u][v])
			dfs(c[u][v], d+1), cnt[u]+=cnt[c[u][v]];
	while(cnt[u]>=k) {
		cnt[u]-=k;
		ans+=d;
	}
}

void solve() {
	cin >> n >> k;
	m=1;
	for(int i=0; i<n; ++i) {
		string s;
		cin >> s;
		int u=0;
		for(char d : s) {
			if(!c[u][d-'A'])
				c[u][d-'A']=m++;
			u=c[u][d-'A'];
		}
		++cnt[u];
	}
	ans=0;
	dfs();
	cout << ans << "\n";
	memset(c, 0, m*sizeof(c[0]));
	memset(cnt, 0, m*4);
}

int main() {
	ios::sync_with_stdio(0);
	cin.tie(0);

	int t, i=1;
	cin >> t;
	while(t--) {
		cout << "Case #" << i << ": ";
		solve();
		++i;
	}
}
| Write comment?
»
4 years ago, # |
  Vote: I like it -8 Vote: I do not like it

Maybe you should ask the author of the code: tmwilliamlin168

  • »
    »
    4 years ago, # ^ |
      Vote: I like it -8 Vote: I do not like it

    I did, but couldn't get an answer :)

    • »
      »
      »
      4 years ago, # ^ |
      Rev. 2   Vote: I like it +1 Vote: I do not like it

      The trie is pretty simple actually, for each node $$$u$$$, $$$c[u][ch]$$$ stores the node after following the character $$$ch$$$ from $$$u$$$. When inserting ( just after taking input ),

      if(!c[u][d-'A'])
          c[u][d-'A']=m++;
      

      This takes care of making new node ( then incrementing total number of nodes counter $$$m$$$ ), because initially array is all $$$0$$$.

      Notice, each node $$$u$$$ actually represents a common prefix of some group of strings. For each node $$$u$$$, $$$cnt[u]$$$ stores number of strings with prefix till $$$u$$$ ( i.e. total nodes ending below $$$u$$$ somewhere ). At first, $$$cnt$$$ is only updated at the leaves of the trie. While coming up from dfs, all children's $$$cnt$$$ is added to parent. And, at any point if $$$cnt$$$ is more than (or equal to) $$$k$$$, then he subtracts $$$k$$$ and adds $$$1$$$ to $$$ans$$$ ( this is greedy ).

      • »
        »
        »
        »
        4 years ago, # ^ |
        Rev. 2   Vote: I like it 0 Vote: I do not like it

        This really helped conceptually! But you said "while coming up from dfs"? Isn't his code bringing him downwards through the trie? What have I misunderstood about the flow of his code?

        • »
          »
          »
          »
          »
          4 years ago, # ^ |
            Vote: I like it 0 Vote: I do not like it

          When you use recursion for dfs ( for any type of tree / trie ), and implement it as follows:

          void dfs( int node )
          {
               for( all neighbors x of node )
               {
                     // before
                     dfs(x)
                     // after
               }
          }
          

          When you write some code in place of "before", it is executed at each node before going down to the children ( example: depth[x] = depth[node] + 1 ), and when you write code in place of "after", you do stuff after completing the whole subtree ( in this case, you count all the stuff below by size[node] += size[x] [Note: here node = u, and x = c[u][ch] ] )

          • »
            »
            »
            »
            »
            »
            4 years ago, # ^ |
              Vote: I like it 0 Vote: I do not like it

            So you're saying that this line of code:

            cnt[u]+=cnt[c[u][v]];

            (that comes after dfs() is called) isn't executed everytime the 'if statement' above is true, because dfs() is being called first? So when will it be executed?

            Also, after the 'while loop' terminates, and v = 25, my understanding is that the function should terminate; but while observing the code, I found that dfs() runs again. Why is this happening? Thanks!

            • »
              »
              »
              »
              »
              »
              »
              4 years ago, # ^ |
                Vote: I like it 0 Vote: I do not like it

              You need to understand two things first:

              • Recursion
              • DFS

              Let's say this is the tree ( you can think about tries in a similiar way where each edge will denote following a character of string ).

              We will use the following code for dfs.

              void dfs( int node )
              {
                   visited[node] = 1;
                   for all neighbors next of node:
                   {
                         if( !visited[next] )
                         {
                               dfs(next);
                               cnt[node] += cnt[next];
                         }
                   }
              }
              

              (Assume $$$cnt[3]$$$ = $$$cnt[5]$$$ = $$$cnt[6]$$$ = $$$cnt[7]$$$ = 1, since they are leafs).

              Now, when you call $$$dfs(1)$$$ ( $$$0$$$ level recursion depth ), then $$$dfs(2)$$$ is called ( $$$1$$$ level ) then $$$dfs(5)$$$ ( $$$2$$$ level ) is called and there are no neighbours of $$$5$$$ ( that are unvisited, because $$$2$$$ was marked visited when "going down" from $$$2$$$ to $$$5$$$ ). So, we return to the point in $$$dfs(2)$$$ where we went down and the next line is cnt[2] += cnt[5] which makes $$$cnt[2] = 1$$$. Then in $$$dfs(2)$$$, $$$dfs(6)$$$ is called which returns and cnt[2] += cnt[6] makes $$$cnt[2] = 2$$$.

              So, answering your questions, the cnt[u] += cnt[c[u][v]] is executed every time condition in $$$if$$$ is true. Because $$$dfs$$$ is being called first, the whole subtree of $$$c[u][v]$$$ is first visited and processed and then $$$cnt[c[u][v]]$$$ has the total number of leafs under $$$c[u][v]$$$, so you can add that to $$$cnt[u]$$$.