So, about three weeks ago a fast ($$$\mathcal{O}(n\log^2(n))$$$) algorithm for composition and compositional inverse of formal power series has been published (and discovered some time earlier) by noshi91, here is the original blog (in Japanese). Previous best algorithm was $$$\mathcal{O}((n\log n)^{3/2})$$$. I hope someone more competent than myself will make an in-deapth blog about this later, I'll link it here as soon as I see it. For now I'll give a "quick" retelling of the algorithm.

UPD: here is a blog that focuses on implementing polynomial composition. It also uses a somewhat different approach, that doesn't explicitly use Tellegen's principle.

UPD 2: Article by noshi91 and Elegia

UPD 3: VERY in-deapth blog by maspy, with great explanation of Tellegen's (transposition) principle as well, definitely recommend reading this one.

# Bostan-Mori algorithm

(Summary of section 2.1 of this article).

We want to compute $$$[x^N]\frac{P(x)}{Q(x)}$$$. To do that we will use an observation that for any polynomial $$$Q(x)$$$ the polynomial $$$Q(x)Q(-x)$$$ is even, so it is equal so $$$V(x^2)$$$ for some $$$V$$$. Also let $$$U(x)=P(x)Q(-x), U(x)=U_e(x^2)+xU_o(x^2)$$$.

So if $$$N$$$ is even $$$[x^N]\dfrac{P(x)}{Q(x)} = [x^{\frac{N}{2}}]\dfrac{U_e(x)}{V(x)}$$$ and if $$$N$$$ is odd $$$[x^N]\dfrac{P(x)}{Q(x)}=[x^{\frac{N-1}{2}}]\dfrac{U_o(x)}{V(x)}$$$. Note that $$$deg(V)=deg(Q)$$$ and $$$deg(U_i) \approx \dfrac{deg(P)+deg(Q)}{2}$$$, so if initially $$$deg(P) \leqslant n$$$ and $$$deg(Q) \leqslant n$$$, those inequalities still hold after we make $$$P=U_i, Q=V, N = \left[\frac{N}{2} \right]$$$. So each step can be made in $$$2$$$ multiplications of polynomials of degree $$$\leqslant n$$$ and there are $$$\log(N)$$$ steps untill we have $$$N=0$$$ and the task is trivial, so the algorithm works in $$$\mathcal{O}(n\log n \log N)$$$.

## Application to bivariate polynomials

It turns out that this algorithm works great with some bivariate polynomials as well.

Now we want to find $$$[x^k]\frac{P(x, y)}{Q(x, y)}$$$ where $$$deg_y(P), deg_y(Q) \leqslant 1$$$. Note that we don't need any coeffitients with $$$[x^{>k}]$$$ in $$$P$$$ and $$$Q$$$, so we can assume $$$deg_x(P), deg_x(Q) \leqslant k$$$. Now if we make one step of Bostan-Mori algorithm we will double limits on degrees of $$$P$$$ and $$$Q$$$ in respect to $$$y$$$ and halve $$$k$$$, so after we take $$$P$$$ and $$$Q$$$ modulo $$$x^{k+1}$$$ again, we'll halve their degrees in respect to $$$x$$$. So the product of degrees in respect to $$$x$$$ and $$$y$$$ stays the same and we'll be able to find $$$U$$$ and $$$V$$$ on each step in $$$\mathcal{O}(k\log k)$$$ and solve the problem in $$$\mathcal{O}(k\log^2(k))$$$ total time.

# Compositional inverse

From the "Coefficient of fixed $$$x^k$$$ in $$$f(x)^i$$$ " example of Lagrange inversion theorem from this blog by zscoder we know a connection between $$$[x^k]\dfrac{1}{1-yf(x)}$$$ and $$$\left(\dfrac{x}{f^{(-1)}(x)}\right)^k \mod x^k$$$, where $$$f^{(-1)}(f(x))=x$$$. Specifically,

so we can find either one from the other in linear time. We can now find the first value with Bostan-Mori algorithm in $$$\mathcal{O}(k\log^2(k))$$$, and getting $$$f^{(-1)}$$$ from the second value is easy (take $$$ln$$$, multiply by $$$-\frac{1}{k}$$$, take $$$exp$$$, multiply by $$$x$$$), we can now find $$$f^{(-1)} \mod x^k$$$ in $$$\mathcal{O}(k\log^2(k))$$$ time.

# Composition

Finding $$$H(F(x)) \mod x^n$$$ is a linear transformation of vector of coefficients of $$$H$$$. If we transpose this transformation we'll get some linear transformation of $$$n$$$-dimansional vector $$$\overline{c}$$$ into a $$$deg(H)+1$$$-dimansional one. If $$$C(x)$$$ is the polynomial whose coeffitients correspond to coordinates of $$$\overline{c}$$$, then the transposed transformation is finding $$$[x^{n-1}]\dfrac{rev(C(x))}{1-yF(x)} \mod y^{deg(H)+1}$$$, where $$$rev(C(x))$$$ is $$$x^{n-1}C(\frac{1}{x})$$$ or just reversed $$$C$$$. This can be found in $$$\mathcal{O}(n\log^2(n))$$$ with Bostan-Mori algorithm, so by Tellegen's principle the original problem of polynomial composition can be solved in $$$\mathcal{O}(n\log^2(n))$$$ with the same constant factor as well.

Auto comment: topic has been updated by Kapt (previous revision, new revision, compare).Auto comment: topic has been updated by Kapt (previous revision, new revision, compare).Someone wrote a blog about implementing it here. (By the way, I recommend reading other blogs by the same person too, they're pretty good).

Thanks! I added a link to it, it's approach seems easier to implement too. Why did it get so little attention though? I never thought such algorithm was even possible

I'm guessing it's because the blog looks mathematical and it is not from some high-rated account. I hope someone adds it to the Catalog, though.

Because it is math

cuz who actually reads this stuff and is like "hmm yeah that all checks out" (no offense ofc I'm sure it is great work)

Maybe it's because the blog was written in Japanese (sorry about that). The paper about this algorithm has been published, and you can read it here: https://arxiv.org/abs/2404.05177 .

I want to convince myself to write a high-performance implementation, but I can't think of any applications outside of competitive programming. I understand that polynomial composition is useful for combinatorics and number theory, but do any end-users rely on it?

This can be used for Newton's method with arbitrary equations (not given in closed form), which might be useful for some approximation tasks (finding Taylor formula to n-th term)

How can it be used for competitive programming?

Yes, that’s the point of libraries like FLINT NTL and others. Contributions are very welcomed.

Sorry that we have been busy writing the paper in the past few weeks. Now let me make some more comments on this algorithm.

The previous Brent--Kung algorithm runs in time $$$O((n\log n)^{3/2})$$$, actually, this is not the best-known algorithm before our algorithm.

not very hardwhen the number of variables is moderately large (i.e., the degree is moderately small).The first one looks quite great, but the hidden constant makes it impractical in the current setting. Due to the impracticality of fast matrix multiplication, the second one has $$$\tilde O(n^{5/3})$$$ time when replacing the matrix multiplication with brute force.

So Brent--Kung is still the best algorithm in practice before the new algorithm, and it is implemented in the FLINT library (see the documentation).

According to the practical experiments in library checker, it's quite clear that it's time to introduce the new algorithm to the maintainer of number theoretic computation libraries. :)

An alternative way to interpret the algorithm is to consider the following process:

The obvious ways to generalize the new algorithm are the following, still within the time complexity $$$O(n\log^2 n)$$$.:

Indeed, the Kedlaya--Umans algorithm and the $$$O(n^\kappa)$$$ time algorithm solve in arbitrary modulo $$$h$$$, while our algorithm only works for power series. A possible next topic is to generalize the new algorithm to arbitrary modulo $$$h$$$.

So what about its application in CP?The first thing is that there are some problems concerning compositions and compositional inverse. For those problems, usually, we need to give an independent analysis of what differential equation it satisfies, then use such equation to produce a relaxed convolution (a.k.a. CDQ divide-conquer). Surely the relaxed convolution is theoretically faster, but I think the advantage will not be significant when the equation is complex. So now we have a quasi-optimal solution for all these kinds of problems.

There are also several problems that I didn't know how to do efficiently, but now it is solved automatically, for example: Compute a whole column of Eulerian number $$$\langle {n\atop m} \rangle$$$ ! In the past, we can only compute a row in $$$O(n\log n)$$$ time.

It might even be the time to check all enumerative combinatorics problems (which involve generating functions) that are considered hard in the past and see if we can solve them in a better way.

I am curious to know how to compute a column of Eulerian numbers, yet find no clue in https://arxiv.org/abs/2404.05177. Any reference? Thanks!!

The generating function of Eulerian number is $$$(1-t) e^{(1-t)x} / (1 - te^{(1-t)x})$$$, then by fairly standard manipulation we can obtain the EGF of the $$$k$$$-th column:

for the first part, we have

which can be done by power series composition. The second part can be treated similarly.

My friend had already implemented this and created the template problem Column of Eulerian Number on LibreOJ.