disclaimer: yet again i do not pretend to claim that this algorithm is useful. bin lifting is probably better in all aspects.

======================================================================================

# intro

if you havent alr, consider reading this since most of my terminology will be from there.

# cube root decomp

before i start, this algorithm is entirely useless. bin lifting has both a shorter impl and it has a better time complexity. but being able to understand this algorithm will help with the generalized algorithm later in this article.

so i first came up with this idea when i doing the CSES Distance Queries with my sqrt lca and it tle'd. so what i was thinking was "why on earth is sqrt lca so slow", then it hit me, sqrt per query is horrible (and i should learn bin lifting, but i would rather brainstorm useless algs like this instead of learning bin lifting tho). but then i thought, what if i used cube root instead of square root? then, i thought, the complexity would go from O(n^(1/2)) to O(n^(2/3)) as i had to loop over n^(2/3) segments of length n^1/3. but thankfully we can do better than that.

so lets do some defining of terms:

CUBE is a constant such that CUBE^3 >= n and CUBE is minimal (a fancy way of saying the cube root of n)

par[u] is the parent directly up from u

super[u] is redefined as the parent CUBE steps up from u

duper[u] is defined as the parent CUBE^2 steps up from u (i have great naming skills btw)

we'll get onto how to compute super and duper fast is a second, but first i'll show you why they help.

so first is the k_up function

```
int k_up(int u, int k){ //returns the k'th parent up from node u
for(; k >= CUBE*CUBE; k -= CUBE*CUBE) u = duper[u];
for(; k >= CUBE; k -= CUBE) u = super[u];
for(;k--;) u = par[u];
return u;
}
```

so what is the complexity of the above code? well, you will iterate the first loop at most CUBE times since there are at most n/(CUBE*CUBE) <= CUBE since k <= n and you go up CUBE*CUBE steps each time, its similar to how sqrt lca works. then you iterate the second loop also CUBE times at most, since after you finish the first loop k will be < CUBE*CUBE, and they are only CUBE CUBE's in CUBE*CUBE. the third loop will also run in CUBE time since k < CUBE after the second loop. **and so this k_up implementation runs in O(CUBE)** (hahah CUBE is a constant so its O(1) hahahh ahahah big O abuse hahaha).

now the important thing is the actual lca finding, so for two nodes on the same level, you can also use the duper/super/par array to find the lca of the two in O(CUBE) (if they are on different levels, just use the k_up function to move the lower one up). so lets say two nodes u and v (on the same level) have the same duper value, their lca will not be further from u (and v) than duper[u] is. since if it was further, you can take duper and it would be closer (this is more or less the same thing as the proof of correctness of the sqrt lca). if they are different, you set u = duper[u] and v = duper[v] and try again. this will happen at most CUBE times (just like the k_up complexity). then you can do the same thing with the super array. since the distance from u to duper[u] is exactly CUBE^2 and we know the lca is at least as close as duper[u], the problem just becomes the same thing with a tree size CUBE^2 (instead of CUBE^3 as it was previously), and then the tree becomes size CUBE with the par array, and you have an lca! this is very similar to the sqrt lca idea, its just that i used 2 arrays (duper and super) instead of 1 (super).

below is an implementation of the code.

```
int lca(int u, int v){
if(dep[u] < dep[v]) swap(u, v); //im just yoinking this
u = k_up(u, dep[u] - dep[v]); //from my sqrt lca code
while(duper[u] != duper[v]) u = duper[u], v = duper[v]; //you can just duper CUBE times before it becomes the root
while(super[u] != super[v]) u = super[u], v = super[v]; //you can just super CUBE times before you actually just wouldve jumped duper again
while(u != v) u = par[u], v = par[v]; //you can jump par CUBE times before you would've just jumped super
return u; //or v since they should be equal
}
```

and this is O(CUBE) per lca query since k_up runs in O(CUBE) and each loop runs in O(CUBE). since each duper[u] is CUBE^2 nodes up and super[u] is CUBE up, you can just loop over super CUBE times and it'll work. below is some code since this'll come up later

```
void precomp(){
//assume dep and par are computed elsewhere, if you dont know how to compute those, you shouldnt be reading abt lca :/
for(int i = 1; i<=n; i++){ //assume n is global and there are n nodes in the tree
super[i] = i;
for(int i = 0; i < CUBE && super[i] != 0; i++) //optimization to stop if super goes out of the tree
super[i] = par[super[i]]; //just looping par up CUBE times, like the sqrt lca but with CUBE
}
for(int i = 1; i<=n; i++){
duper[i] = i;
for(int i = 0; i<CUBE && duper[i]; i++)
duper[i] = super[duper[i]]; //the same thing but jumping super up to save time
}
}
```

**the precomp complexity is O(n*CUBE) as the nested loops are, and queries and O(CUBE).** yet again. you can do precomp in O(n) but im too lazy to cover it. you figured out how to do O(n) for sqrt precomp, its the same thing but with 2 arrays to maintain. this finally ac'ed the cses problem :D. since CUBE < SQRT this algorithm runs better (even if barely and more scuffed).

# nth root lca

well, can we generalize it? what if you used 3 arrays super, duper, and juper (haha great naming) and did O(n^(1/4)) per query and O(n * n^(1/4)) for precomp, or 4 arrays for n^(1/5)? you can and i dont want to implement each one so.... the idea is that you can use R arrays (including the par array) to handle queries in O(R * n^(1/R)) with O(n * R * n^(1/R)) just note if you make R too big, nothing happens since you're jumping up by 1 or 0 each time. actually, if you set R to be log2(n) you end up with bin lifting. so imagine if i actually implemented decomp with n^(1/4) and then i realized i should make it a 2d array where super[u][j] stores the n^(j/4) parent up from u. then if i decided to not limit myself my 4 and i picked a constant R and super[u][j] stores to parent n^(j/R) up from u. you can actually just set the par array to be super[][0] since n^(0/R) is 1 and thats' exactly what the parent array stores. below is an implementation of the functions needed for lca (namely, k_up, lca, and precomp).

i left RT and LOG blank, that's for the reader to play with :), LOG is the exponent and RT is the "root"

super[u][j] is the parent LOG^j times up

base[i] = RT^i

par[u] is the parent

dep[u] is the distance from the root, dep[root] is arbitrary (0 or 1, your choice)

```
const int max_v = ;
const int RT = , LOG = ; //RT^LOG approx max_v
vector<int> adj[max_v];
int super[max_v][LOG * 2], par[max_v], dep[max_v], base[max_v], n, q;
void dfs(int u, int p, int d){ //general util dfs, this O(n)
par[u] = super[u][0] = p;
dep[u] = d;
for(int v : adj[u])
if(v != p)
dfs(v, u, d + 1);
}
void precomp(){ //three nested for loops so its just O(n * RT * LOG)
base[0] = 1;
for(int k = 1; k<=LOG; k++){
base[k] = base[k - 1] * RT;
for(int i = 1, j; i<=n; i++){
for(j = 0, super[i][k] = i; j < RT && super[i][k]; j++){
super[i][k] = super[super[i][k]][k - 1];
}
}
}
}
int k_up(int u, int k){ //O(LOG*RT)
for(int i = LOG; i >= 0; i--){ //similar proof of complexity as the cube root decomp
while(k >= base[i])k -= base[i], u = super[u][i];
}
return u;
}
int LCA(int u, int v){
if(dep[u] < dep[v]) swap(u, v);
u = k_up(u, dep[u] - dep[v]);
if(u == v) return u;
for(int i = LOG; i >= 0; i--){ //this is always O(LOG*RT), just think abt the proof for cube root but then realizing the "root" we're using isnt constant.
while(super[u][i] != super[v][i]) u = super[u][i], v = super[v][i];
}
return (par[u]) ? par[u] : u; //cant return 0
}
```

hopefully, it makes sense, it's just compressing the idea described above with some nested for loops and 2d arrays. **so it is O(n * RT * LOG) for precomp and O(RT * LOG) per query**. just note we were playing with small values of LOG (2 and 3) so we didnt really care abt it, but when you use anything more, it's no longer "constant" if you see what i mean.

# resolution?

thank you for reading this scuffed blog post and please do in fact learn bin lifting, it cleaner than my "nth root decomp" and it runs faster too (unless you set RT to 2 and LOG to log2(n) >:) ). the advantages of choosing this method only lies is you can trade memory for time easily. if you choose RT to be larger, you'll use less memory, and if you set RT to be small, you'll use more. **the memory complexity is always O(n * LOG) since that's the size of the super array.** yet again, please do learn bin lifting, this article is what came out of someone who was too lazy to learn it. and after doing all of this, i always write bin lifting for lca. hahah

thanks for reading and i apologize for any typoes, unclearness, or over scuffedness.