I wanted to write a short blog post on this technique, since I've encountered it a few times but have not seen it formally discussed anywhere. *Theoretically* this technique shaves a log factor off the complexity, but in practice it often yields only a constant-factor speed-up, so it's more of a micro-optimization than anything. I still think it's interesting enough to warrant discussion, but that's probably why the technique remains niche.

Let's assume that all variables here are integers, and that computations are done modulo some large prime. We use $$$I$$$ to refer to the identity matrix and $$$0$$$ to refer to the zero matrix.

**Warmup Problem:** Given a square matrix $$$A$$$ and a positive integer $$$n$$$, evaluate $$$A^n$$$.

Matrices are the classic example of the fast exponentiation algorithm working not just with standard multiplication of integers, but on any object equipped with an associative binary operator.

This is usually one of the first divide-and-conquer algorithms people are taught. One way to approach this is to note that, if $$$n$$$ is even, then

$$$A^n = A^{\frac{n}{2}} \times A^{\frac{n}{2}}$$$

So, we only need to compute $$$A^{\frac{n}{2}}$$$ once, save that value, and then square it. If $$$n$$$ is odd, then we can compute it as $$$A^n = A^{n-1}\times A$$$, since $$$n-1$$$ will be even.

Here is some pseudocode for fast exponentiation.

```
def fast_pow(A,n):
    if n==0:
        return I
    if n&1:
        return A*fast_pow(A,n-1)
    else:
        X = fast_pow(A,n/2)
        return X*X
```

This algorithm needs only $$$\mathcal{O}(\lg n)$$$ matrix multiplications.
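As a sanity check, here is a concrete, runnable instantiation of the pseudocode, with integers modulo a prime standing in for matrices (so $$$I$$$ becomes $$$1$$$); the modulus below is an arbitrary choice for illustration.

```python
MOD = 10**9 + 7  # an arbitrary large prime, chosen for illustration

def fast_pow(a, n):
    # Computes a^n mod MOD using O(lg n) multiplications.
    if n == 0:
        return 1  # the multiplicative identity (I, for matrices)
    if n & 1:
        return a * fast_pow(a, n - 1) % MOD
    x = fast_pow(a, n >> 1)
    return x * x % MOD
```

For actual matrices, only the identity element and the `%` reductions change; the recursion itself is identical.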

**Problem:** Given a square matrix $$$A$$$ and a positive integer $$$n$$$, evaluate $$$I+A+A^2+\dots+A^{n-1}$$$.

We can't use the closed-form geometric series formula here, because $$$A-I$$$ is not necessarily going to be invertible.

However, we can also use divide-and-conquer to solve this problem. If $$$n$$$ is even, we can split the sum into two halves---the "small" half, with terms whose exponents are $$$< \frac{n}{2}$$$, and the "big" half, with terms whose exponents are $$$\geq \frac{n}{2}$$$. Then, we can factor out $$$A^{\frac{n}{2}}$$$ from the "big" terms, and notice a similarity...

To simplify things, let $$$f(n) = I+A+A^2+\dots+A^{n-1}$$$.

$$$\begin{align} f(n) &= f\left(\frac{n}{2}\right) + A^{\frac{n}{2}}f\left(\frac{n}{2}\right) \\ f(n) &= f\left(\frac{n}{2}\right)\left(I+A^{\frac{n}{2}}\right) \end{align}$$$

If $$$n$$$ is odd, then we can compute $$$f(n) = I+A(I+A+\dots+A^{n-2}) = I+A\times f(n-1)$$$, since $$$n-1$$$ will be even.
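To convince ourselves that both recurrences are correct, we can check them by brute force, again with scalars modulo a prime standing in for matrices (so $$$I$$$ becomes $$$1$$$); the base value and modulus below are arbitrary choices for illustration.

```python
MOD = 10**9 + 7  # arbitrary prime modulus, for illustration

def f(a, n):
    # Naive O(n) evaluation of 1 + a + a^2 + ... + a^(n-1) mod MOD.
    total, power = 0, 1
    for _ in range(n):
        total = (total + power) % MOD
        power = power * a % MOD
    return total

a = 12345
for n in range(2, 50):
    if n % 2 == 0:  # even case: f(n) = f(n/2) * (1 + a^(n/2))
        assert f(a, n) == f(a, n // 2) * (1 + pow(a, n // 2, MOD)) % MOD
    else:           # odd case: f(n) = 1 + a * f(n-1)
        assert f(a, n) == (1 + a * f(a, n - 1)) % MOD
```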

Here is some pseudocode for the fast summing.

```
def fast_sum(A,n):
    if n==0:
        return 0
    if n&1:
        return I+A*fast_sum(A,n-1)
    else:
        return fast_sum(A,n/2)*(I+fast_pow(A,n/2))
```

We just learned that we can compute $$$A^{\frac{n}{2}}$$$ with $$$\mathcal{O}(\lg n)$$$ multiplications, so by the Master Theorem (or a similar analysis) we conclude that the above algorithm uses at most $$$\mathcal{O}(\lg^2 n)$$$ matrix multiplications.

Can we do better than $$$\mathcal{O}(\lg^2 n)$$$?

**DP on Function Calls**

Notice that `fast_sum` is $$$\mathcal{O}(\lg^2 n)$$$ because there are $$$\mathcal{O}(\lg n)$$$ function calls, and within each function call we need to perform an $$$\mathcal{O}(\lg n)$$$ operation (the fast exponentiation). Is it possible for us to perhaps get the value of $$$A^{\frac{n}{2}}$$$ in just $$$\mathcal{O}(1)$$$?

Try adding `print(n)` to the start of each function and observe which values of $$$n$$$ each function visits in its recursive calls. Do you notice something? If we start with the same initial $$$n$$$, then `fast_pow(A,n)` and `fast_sum(A,n)` will visit the exact same states in the exact same order! The reason is simple---if you look at the recurrence relations we derived, the transitions between states are exactly the same in both functions ($$$n \to n-1$$$ if $$$n$$$ is odd, and $$$n \to \frac{n}{2}$$$ if $$$n$$$ is even).
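Since both recursions share the same transition, a single helper (a throwaway sketch, not part of the algorithm) reproduces the call sequence of either function:

```python
def visited_states(n):
    # Both fast_pow and fast_sum step n -> n-1 (odd) or n -> n/2 (even),
    # so this one function traces the states either recursion visits.
    states = [n]
    while n:
        n = n - 1 if n & 1 else n >> 1
        states.append(n)
    return states

print(visited_states(45))  # [45, 44, 22, 11, 10, 5, 4, 2, 1, 0]
```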

What does this mean? Well, in `fast_sum(A,n)`, we need $$$A^{\frac{n}{2}}$$$ for various values of $$$n$$$. However, in `fast_pow(A,n)`, we *also* need $$$A^{\frac{n}{2}}$$$ for those exact same values of $$$n$$$---in fact, a single call to `fast_pow(A,n)` *already* computes all the values that we need.

So, instead of discarding $$$X = A^{\frac{n}{2}}$$$, we store those values into a lookup table. Before calling `fast_sum(A,n)`, we first call `fast_pow(A,n)` in order to populate the lookup table. Then, `fast_sum(A,n)` magically computes each $$$A^{\frac{n}{2}}$$$ that it needs in just $$$\mathcal{O}(1)$$$, since each is just an array lookup.

Since both functions visit the same states in the same order, it is sufficient for us to store $$$X$$$ into a simple array `state`, where `state[s]` equals $$$A^{\frac{n}{2}}$$$ in the `s`th recursive call.

Here is some pseudo-code for this technique.

```
state = [None for i in range(60)]

def fast_pow(A,n,s=0):
    if n==0:
        return I
    if n&1:
        return A*fast_pow(A,n-1,s+1)
    else:
        X = fast_pow(A,n/2,s+1)
        state[s] = X
        return X*X

def fast_sum(A,n,s=0):
    if n==0:
        return 0
    if n&1:
        return I+A*fast_sum(A,n-1,s+1)
    else:
        return fast_sum(A,n/2,s+1)*(I+state[s])

fast_pow(A,n) # make sure to call fast_pow first to populate the table
print(fast_sum(A,n))
```

Here, `fast_sum(A,n)` is evaluated using only $$$\mathcal{O}(\lg n)$$$ matrix multiplications.
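For completeness, here is a self-contained, runnable version of the technique, once more with integers modulo a prime standing in for matrices, `n>>1` in place of `n/2`, and a `state` table sized generously for 64-bit $$$n$$$; it checks itself against the naive sum. The modulus and base are arbitrary illustrative choices.

```python
MOD = 10**9 + 7       # arbitrary prime modulus, for illustration
state = [None] * 130  # enough slots: each step halves or decrements n

def fast_pow(a, n, s=0):
    if n == 0:
        return 1      # the multiplicative identity
    if n & 1:
        return a * fast_pow(a, n - 1, s + 1) % MOD
    x = fast_pow(a, n >> 1, s + 1)
    state[s] = x      # cache a^(n/2) for fast_sum's s-th recursive call
    return x * x % MOD

def fast_sum(a, n, s=0):
    # Returns 1 + a + ... + a^(n-1) mod MOD, reusing the cached powers.
    if n == 0:
        return 0
    if n & 1:
        return (1 + a * fast_sum(a, n - 1, s + 1)) % MOD
    return fast_sum(a, n >> 1, s + 1) * (1 + state[s]) % MOD

a = 12345
for n in [1, 2, 37, 1000]:
    fast_pow(a, n)    # populate the table first
    assert fast_sum(a, n) == sum(pow(a, k, MOD) for k in range(n)) % MOD
```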

I am going to call this technique "DP on Function Calls", since I can't seem to find a name for it anywhere else online, though I am sure that I am not the first one to have invented it. I also toyed with the name "Functional Cascading", to mirror the technique called "Fractional Cascading", which works on similar principles.

**Sample Problems that use DP on Function Calls**

I originally came up with DP on Function Calls while writing the model solution for this problem, which I set for the 2020 NOI.PH Eliminations Round. The problem can still be solved without shaving off the extra log factor, but then the time limit is tighter and you have to be much more careful with your constant factors.

For this problem, my solution without DP on Function Calls takes 0.54 ms, whereas *with* DP on Function Calls it takes 0.27 ms. That's a `x2` constant-factor speedup, which I think makes it interesting despite the technique being strictly unnecessary for this problem :P

If you know any, please link more examples of problems that use DP on Function Calls in the comments! I am quite fond of this cute technique and I'd like to see more problems for which it is applicable.

P.S. The given "pseudocode" is valid Python, but you would want to do `n>>1` or `n//2` instead of `n/2`, so that we perform integer division. When I type `n//2` in the block code, everything after the `//` gets highlighted red since it's being interpreted as a comment. If someone can help me fix that, it would be appreciated as well!