#	User	Rating
1	ecnerwala	3648
2	Benq	3580
3	orzdevinwang	3570
4	cnnfls_csy	3569
5	Geothermal	3568
6	tourist	3565
7	maroonrk	3530
8	Radewoosh	3520
9	Um_nik	3481
10	jiangly	3467

#	User	Contrib.
1	maomao90	174
2	adamant	164
2	awoo	164
4	TheScrasse	160
5	nor	159
6	maroonrk	156
7	-is-this-fft-	150
7	SecondThread	150
9	orz	146
10	pajenegod	145

In this blog you can found an efficient implementation of Karatsuba multiply of two polinomials, that sometimes can be used instead of FFT

$$$\text{ }$$$

Hello everyone!

One man ask me to write a code for calculation of $$$1.000.000$$$ fibonacci number in C++. This broblem has been solved in time less than 1 second with Karatsuba multiply. After it, I tried to use this code in standart problems like "we have two strings of letters $$$\text{ATGC}$$$ and want to calculate optimal cyclic rotation of one of them so number of equal positions will be greatest" with Karatsuba multiply, and these problems have been successfully solved. There are a lot of naive implementations, but MrDindows helped to write a most efficient of all simple implementations.

Idea of Karatsuba multiply

We have two polinomials $$$a(x)$$$ and $$$b(x)$$$ with equal length $$$2n$$$ and want multiply them. Let suppose that $$$a(x) = a_0(x) + x^n \cdot a_1(x)$$$ and $$$b(x) = b_0(x) + x^n \cdot b_1(x)$$$. Now we can calculate result of $$$a(x) \cdot b(x)$$$ in next steps:

Calculate $$$E(x) = (a_0(x) + a_1(x)) \cdot (b_0(x) + b_1(x))$$$
Calculate $$$r_0(x)=a_0(x)\cdot b_0(x)$$$
Calculate $$$r_2(x)=a_1(x) \cdot b_1(x)$$$
Now, result is $$$a(x) \cdot b(x) = r_0(x) + x^n \cdot \left(E(x) - r_0(x) - r_2(x)\right) + x^{2n} \cdot r_2(x)$$$

We can see, that instead of $$$4$$$ multiplications we used only $$$3$$$, so we have time $$$O(n^{\log_2(3)})$$$ or $$$O(n^{1.58})$$$

Efficient implementation

#pragma GCC optimize("Ofast,unroll-loops")
#pragma GCC target("avx,avx2,fma") 
namespace {
    template<int n, typename T>
    void mult(const T *__restrict a, const T *__restrict b, T *__restrict res) {
        if (n <= 64) { // if length is small then naive multiplication if faster
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    res[i + j] += a[i] * b[j];
                }
            }
        } else {
            const int mid = n / 2;
            alignas(64) T btmp[n], E[n] = {};
            auto atmp = btmp + mid;
            for (int i = 0; i < mid; i++) {
                atmp[i] = a[i] + a[i + mid]; // atmp(x) - sum of two halfs a(x)
                btmp[i] = b[i] + b[i + mid]; // btmp(x) - sum of two halfs b(x)
            }
            mult<mid>(atmp, btmp, E); // Calculate E(x) = (alow(x) + ahigh(x)) * (blow(x) + bhigh(x))
            mult<mid>(a + 0, b + 0, res); // Calculate rlow(x) = alow(x) * blow(x)
            mult<mid>(a + mid, b + mid, res + n); // Calculate rhigh(x) = ahigh(x) * bhigh(x)
            for (int i = 0; i < mid; i++) { // Then, calculate rmid(x) = E(x) - rlow(x) - rhigh(x) and write in memory
                const auto tmp = res[i + mid];
                res[i + mid] += E[i] - res[i] - res[i + 2 * mid];
                res[i + 2 * mid] += E[i + mid] - tmp - res[i + 3 * mid];
            }
        }
    }
}

Testing

Example of solutions for real problems: 528D. Fuzzy Search and AtCoder Beginner Contest 149 E. Handshake.

Stress-test 512K for 64-bit and 32-bit coefficients

It is possible that this code can be used instead of FFT, but be careful: for multiplication of two polinomials with length $$$2^{19}$$$ Karatsuba multiply requires $$$3.047.478.784$$$ operations. Of cource, it can be executed in 1 second, but for greater length...

Rev.	By	When	Δ	Comment
en2	dmkz	2020-02-24 12:39:36	1	Tiny change: 'l position will be g' -> 'l positions will be g'
ru3	dmkz	2020-02-24 12:38:29	79
en1	dmkz	2020-02-24 12:36:53	5579	Initial revision for English translation
ru2	dmkz	2020-02-24 12:07:38	9	Мелкая правка: '(a_0(x) + b_0(x)) \cdot (a_1(x) + b_1(' -> '(a_0(x) + a_1(x)) \cdot (b_0(x) + b_1('
ru1	dmkz	2020-02-24 12:05:14	5530	Первая редакция (опубликовано)

Idea of Karatsuba multiply

Efficient implementation

Testing

History