### hly1204's blog

By hly1204, history, 21 month(s) ago, # A simple way to understand the transposed algo of multi-eval

A simple way to understand the transposed algorithm of multi-point evaluation of polynomials in Tellegen’s Principle into Practice.

I learned this from Bernstein's paper Scaled Remainder Trees. If there are any mistakes, please tell me, thanks! (Sorry for my poor English)

Let $\mathbb{R}((x))$ denote the ring of formal Laurent series, $\mathbb{R}\lbrack x\rbrack$ denote the ring of polynomial. $f(x)\bmod 1:=f_{-1}x^{-1}+f_{-2}x^{-2}+\cdots$.

Bernstein showed that division in $\mathbb{R}\lbrack x\rbrack$ is division in $\mathbb{R}((x^{-1}))$. Here is a concrete example. Let $A(x):=1+x+4x^2+5x^3+x^4+4x^5\in\mathbb{R}\lbrack x\rbrack$ and $B(x):=1+9x+8x^2+x^3\in\mathbb{R}\lbrack x\rbrack$, we want to compute $Q(x),R(x)\in\mathbb{R}\lbrack x\rbrack$ such that $A(x)=B(x)Q(x)+R(x)$ and $\deg R(x)\lt \deg B(x)$ (with $\deg 0=-\infty$). Let's do this in $\mathbb{R}((x^{-1}))$. First, we should find the multiplicative inverse of $B(x)$ which is

$B^{-1}(x)=x^{-3}-8x^{-4}+55x^{-5}-369x^{-6}+2465x^{-7}-16454x^{-8}+109816x^{-9}-\cdots$

and compute

$A(x)B^{-1}(x)=4x^2-31x+217-1457x^{-1}+9735x^{-2}-64983x^{-3}+433706x^{-4}-\cdots$

We could find that $Q(x)$ is exactly $A(x)B^{-1}(x)-(A(x)B^{-1}(x)\bmod 1)$, and the leading coefficient of $R(x)$ equals $[x^{-1}]A(x)B^{-1}(x)$. That's not a coincidence. Consider the simplest case

$(x-v)^{-1}=x^{-1}+vx^{-2}+v^2x^{-3}+\cdots=\sum_{n\geq 0}v^nx^{-(n+1)}$

So we have $f(v)=[x^{-1}]f/(x-v)$ cause

\begin{aligned} f(x)/(x-v)&=f_0x^{-1}+f_1+f_2x+\cdots +f_nx^{n-1}\\ &+f_0vx^{-2}+f_1vx^{-1}+f_2v+\cdots +f_nvx^{n-2}\\ &+\cdots \end{aligned}

The problem of multi-point evaluation is as follows: Given polynomial $f(x)=\sum_{i=0}^nf_ix^i$ and $v_0,v_1,\dots ,v_n$, print $f(v_0),f(v_1),\dots ,f(v_n)$.

The traditional algorithm for multi-point evaluation is that we make an almost balanced binary (sub)product tree with leaves are polynomials $p_i(x)=x-v_i$ and non-leaf nodes are the product of its children. Then we compute $f\bmod (p_1\cdots p_n)$ and then compute $f\bmod (p_1\cdots p_{n/2})$ and $f\bmod (p_{n/2+1}\cdots p_n)$ cause $(f\bmod (p_1\cdots p_n))\bmod (p_1\cdots p_{n/2})=f\bmod (p_1\cdots p_{n/2})$ (suppose $n$ is even) and then ... The transposed algorithm is that we compute $f/(p_1\cdots p_n)$ and $f/(p_1\cdots p_n)\cdot (p_{n/2+1}\cdots p_{n})=f/(p_1\cdots p_{n/2})$, finally we will have $f/p_i$ at the leaf of the tree, the coefficient of $x^{-1}$ is exactly what we want. The only remaining problem is how many terms should we track when doing the computation. It's clear that we don't care about the coefficients of $x^0,x^1,\dots$ in $f/(p_1\cdots p_n)$ cause they do nothing with the coefficient of $x^{-1},x^{-2},\dots$ when $f/(p_1\cdots p_n)$ was multiplied by a polynomial like $g_0+g_1x+g_2x^2+\cdots$. Another observation is that for $f/P\bmod 1=a_{-1}x^{-1}+a_{-2}x^{-2}+\cdots +a_{-\deg P}x^{\deg P}+\cdots$, only $a_{-1},\dots ,a_{-\deg P}$ are enough. Consider a very simple case that we have $f/(p_1p_2)\bmod 1=c_{-1}x^{-1}+c_{-2}x^{-2}+c_{-3}x^{-3}+\cdots$ and want to compute $f/p_1\bmod 1$ via multiplying $f/(p_1p_2)\bmod 1$ by $p_2$, the coefficients of $x^{-3},x^{-4},\dots$ have no effect on the coefficient of $x^{-1}$ (it's more complicated for the integer case). So we could almost halve the size of problem after one multiplication.

About how to compute $(p_1\cdots p_n)^{-1}$ in $\mathbb{R}((x^{-1}))$, we could simply use the method that we compute the multiplicative inverse of formal power series $x^{\deg(p_1\cdots p_n)}p_1(x^{-1})\cdots p_n(x^{-1})$ in $\mathbb{R}\lbrack \lbrack x\rbrack \rbrack$.  Comments (1)
| Write comment?
 » This seems like a nice (and obvious-in-hindsight) optimization over the more widely known mod-only recursion. Using a higher level of precision for the initial multiplication by $\left(\prod p_i\right)^{-1}$ seems like a very small price to pay for making every later step in the algorithm use a multiplication instead of a mod.But I don't understand in what sense it is transposed: It calculates the same things in the same order as that more widely known algorithm (albeit using a different representation), whereas transposing an algorithm (as I understand it) involves calculating related quantities but in exactly the opposite order. Am I missing something here?