Google Kickstart 2020 had a problem called bundling. I need a little help understanding some code for this problem. Below is someone else's code to solve this problem. I have understood that he is using a 2D array as a Trie (prefix tree). Can anyone please explain, step by step what his dfs() function does exactly? And what exactly the "cnt[]" array keeps track of. Thank you.
#include <bits/stdc++.h>
using namespace std;
#define ll long long
#define ar array
int n, k, c[2000001][26], m, cnt[2000001];
ll ans;
void dfs(int u=0, int d=0) {
for(int v=0; v<26; ++v)
if(c[u][v])
dfs(c[u][v], d+1), cnt[u]+=cnt[c[u][v]];
while(cnt[u]>=k) {
cnt[u]-=k;
ans+=d;
}
}
void solve() {
cin >> n >> k;
m=1;
for(int i=0; i<n; ++i) {
string s;
cin >> s;
int u=0;
for(char d : s) {
if(!c[u][d-'A'])
c[u][d-'A']=m++;
u=c[u][d-'A'];
}
++cnt[u];
}
ans=0;
dfs();
cout << ans << "\n";
memset(c, 0, m*sizeof(c[0]));
memset(cnt, 0, m*4);
}
int main() {
ios::sync_with_stdio(0);
cin.tie(0);
int t, i=1;
cin >> t;
while(t--) {
cout << "Case #" << i << ": ";
solve();
++i;
}
}
Maybe you should ask the author of the code: tmwilliamlin168
I did, but couldn't get an answer :)
The trie is pretty simple actually, for each node $$$u$$$, $$$c[u][ch]$$$ stores the node after following the character $$$ch$$$ from $$$u$$$. When inserting ( just after taking input ),
This takes care of making new node ( then incrementing total number of nodes counter $$$m$$$ ), because initially array is all $$$0$$$.
Notice, each node $$$u$$$ actually represents a common prefix of some group of strings. For each node $$$u$$$, $$$cnt[u]$$$ stores number of strings with prefix till $$$u$$$ ( i.e. total nodes ending below $$$u$$$ somewhere ). At first, $$$cnt$$$ is only updated at the leaves of the trie. While coming up from dfs, all children's $$$cnt$$$ is added to parent. And, at any point if $$$cnt$$$ is more than (or equal to) $$$k$$$, then he subtracts $$$k$$$ and adds $$$1$$$ to $$$ans$$$ ( this is greedy ).
This really helped conceptually! But you said "while coming up from dfs"? Isn't his code bringing him downwards through the trie? What have I misunderstood about the flow of his code?
When you use recursion for dfs ( for any type of tree / trie ), and implement it as follows:
When you write some code in place of "before", it is executed at each node before going down to the children ( example:
depth[x] = depth[node] + 1
), and when you write code in place of "after", you do stuff after completing the whole subtree ( in this case, you count all the stuff below bysize[node] += size[x]
[Note: herenode = u, and x = c[u][ch]
] )So you're saying that this line of code:
cnt[u]+=cnt[c[u][v]];
(that comes after dfs() is called) isn't executed everytime the 'if statement' above is true, because dfs() is being called first? So when will it be executed?
Also, after the 'while loop' terminates, and v = 25, my understanding is that the function should terminate; but while observing the code, I found that dfs() runs again. Why is this happening? Thanks!
You need to understand two things first:
Let's say this is the tree ( you can think about tries in a similiar way where each edge will denote following a character of string ).
We will use the following code for dfs.
(Assume $$$cnt[3]$$$ = $$$cnt[5]$$$ = $$$cnt[6]$$$ = $$$cnt[7]$$$ = 1, since they are leafs).
Now, when you call $$$dfs(1)$$$ ( $$$0$$$ level recursion depth ), then $$$dfs(2)$$$ is called ( $$$1$$$ level ) then $$$dfs(5)$$$ ( $$$2$$$ level ) is called and there are no neighbours of $$$5$$$ ( that are unvisited, because $$$2$$$ was marked visited when "going down" from $$$2$$$ to $$$5$$$ ). So, we return to the point in $$$dfs(2)$$$ where we went down and the next line is
cnt[2] += cnt[5]
which makes $$$cnt[2] = 1$$$. Then in $$$dfs(2)$$$, $$$dfs(6)$$$ is called which returns andcnt[2] += cnt[6]
makes $$$cnt[2] = 2$$$.So, answering your questions, the
cnt[u] += cnt[c[u][v]]
is executed every time condition in $$$if$$$ is true. Because $$$dfs$$$ is being called first, the whole subtree of $$$c[u][v]$$$ is first visited and processed and then $$$cnt[c[u][v]]$$$ has the total number of leafs under $$$c[u][v]$$$, so you can add that to $$$cnt[u]$$$.