FFT and NTT - Codeforces

#	User	Rating
1	ecnerwala	3649
2	Benq	3581
3	orzdevinwang	3570
4	Geothermal	3569
4	cnnfls_csy	3569
6	tourist	3565
7	maroonrk	3531
8	Radewoosh	3521
9	Um_nik	3482
10	jiangly	3468

#	User	Contrib.
1	maomao90	174
2	awoo	164
3	adamant	161
4	TheScrasse	159
5	nor	158
6	maroonrk	156
7	-is-this-fft-	152
8	SecondThread	147
9	orz	146
10	pajenegod	145

Hello everybody,

It often happens that I have to multiply polynomials quickly and of course FFT is the fastest way to do so. However, I've heard that FFT can have precision errors. I'm not very familiar with the mathematical theory behind it, so I was wondering how can I know whether precision errors will occur and what to do to avoid them. I've also heard about NTT (Number Theoretic Transform) which is similar to FFT but it works with integers only (using modular arithmetics?) but I don't know much about it.

So my questions are :

1) How can I know whether regular FFT using complex numbers would give precise errors, what does it depend on?

2) Is there somewhere I can read about NTT and how it works. I don't have strong mathematical background so something more competitive-programming oriented would be perfect. Additionally, is NTT always precise?

3) Could someone provide quick and simple implementations of FFT/Karatsuba/NTT of simple polynomial multiplication? I've written FFT and Karatsuba before, but my implementations were so bad that they were often outperformed by well-written quadratic solutions.

I do know that these questions were probably asked before so I apologize beforehand. I tried to google but most things I found were math-oriented.

Thanks in advance :)

Comments (24)

Write comment?

mathlover

9 years ago, # |

+16

wiki may help you:[DFT](https://en.wikipedia.org/wiki/Discrete_Fourier_transform_(general)#Number-theoretic_transform) In this wiki, you can see DFT, FFT, NTT. I think it will give you some benefit.

→ Reply

SuprDewd

← Rev. 2 →

+34

I thought this was a nice explanation of NTT: http://www.apfloat.org/ntt.html

Regarding multiplication with FFT, you can take a look at this, which I wrote for problems MUL and VFMUL.

danikhil

4 years ago, # ^ |

+35

Your code is very difficult to read.

TimonKnigge

How can I know whether regular FFT using complex numbers would give precise errors, what does it depend on?

As a followup, has anyone ever actually had precision problems with their FFT implementation? I've solved quite a bit of problems with mine but I've never had any precision problems. I'm wondering if there are actually people that encountered this problem, or whether this is just a 'fact' that has seeped in from the academic world, where it is more relevant.

Xellos

9 years ago, # ^ |

+19

Yes. Double has limited precision, so does long double; if you try to make a convolution of 2 arrays with size 10⁵ and numbers up to 10⁹, you'll get numbers with something like 18 correct digits and 5 random ones at the end.

+18

how can I know whether precision errors will occur

That's quite simple. Just think about what (complex) FFT computes, what you do to get integers out of it and precision of (long )double. Rule of thumb: modulo size is what matters, not array size.

what to do to avoid them

Not use FFT, lol. Karatsuba or NTT with multiple modulos (in case the one you want isn't good enough) plus Chinese Remainder Theorem.

Additionally, is NTT always precise?

It works with integers, sure it is.

I've written FFT and Karatsuba before, but my implementations were so bad that they were often outperformed by well-written quadratic solutions.

Possible optimisations of Karatsuba: pass references + subarray bounds (convolution(A,B) does convolution(subarray of A,subarray of B) twice, for which you don't need any copying); avoid moduling and subtract instead; therefore, use ints; if you have small arrays, try naive multiplication; that can't be done in ints, but you can keep all numbers of the convolution array somrthing like < 5mod² and only subtract 5mod² when necessary. Loop unrolling might help as well.

Some implementations of FFT/NTT should be at e-maxx, and they're fast.

Of course, NTT works with primes such that 2^x divides p - 1 and 2^x is more than your array size. Even if you're not given such a modulo (or aren't given any modulo at all), such primes are quite common — if the resulting values of the convolution fit into int, you can choose such a prime and do NTT. Advantage: no precision errors.

Enchom

So say I have to find the convolution of arrays with size 10⁵ and I have to do that modulo 10⁹ + 7. Seems like NTT is the best way to go. The most straight-forward way seems to be to compute it with 3 (?) different primes and then use Chinese Remainder Theorem. However as the values themselves may overflow long long I'd have to use big integers for CRT. Is there any more elegant way to get everything modulo 10⁹ + 7 without coding big numbers?

Bigints shouldn't be a problem if you can do NTT... and it's not like it costs a lot of runtime.

Okay, thanks a lot for the help. Finally I wanted to ask is the speed of NTT significantly slower than FFT because in the end of this article they mention it is slower, but I have no idea how much slower.

Alex_2oo8

However as the values themselves may overflow long long I'd have to use big integers for CRT. Is there any more elegant way to get everything modulo 10⁹ + 7 without coding big numbers?

Sure, when restoring the actual values by CRT you just do all the calculations modulo 10⁹ + 7 and that's it. What's the difference between [do some additions and multiplications (and we only need those to restore the values by CRT) and then take the result modulo 10⁹ + 7] and [do everything modulo 10⁹ + 7 from the very beginning]? Right, there is no difference :)

Well I was looking at it here and it seemed like you have to perform division too.

Edit: Nevermind, I guess we could also divide modulo the prime.. :D

This might be helpful. You don't need big integers there until the very end, where you perform only additions and multiplications.

cjtoribio

8 years ago, # ^ |

+15

Actually, there is a nicer solution without CRT and BigInteger. Think about spliting each number in upper and lower half. Then use something very similar to karatsuba. By only splitting you might get it with 4 convolutions so still O(NlogN) but you can get it down to 3 applying the same concept as Karatsuba. I solved a problem in NlogNlogN using it. And it that 4 to 3 optimization did have noticeable effect.

About splitting it is that you choose something like sqrt(MOD) and split both arrays in higher bits and lower bits. Then use FFT without losing presicion since numbers are of size at most sqrt(MOD). The merge them and there you go.

ffao

This algorithm looks to be exactly what Lien posted below.

The CRT method doesn't need BigIntegers, and when I tried to program CRT it was slightly faster than the code linked to below. Maybe your Karatsuba implementation is faster than that one, FFTs vary a lot...

Oh, lol very true. I just figured it out in the Hackerrank that is running. Could you provide a code or submission for the CRT approach i've never used that. I still don't understand very well the idea. And yes now i tested it, and that implementations i kind of slower than mine, athough not for much.

There's not much to it, really; you just do the multiplication with 3 different mods and join them afterwards using CRT. Here's a submission that uses it:

https://www.hackerrank.com/rest/contests/w23/challenges/sasha-and-swaps-ii/hackers/ffao/download_solution

Swistakk

"Access Denied"

You need to solve it first to see it.

purplesyringa

5 months ago, # ^ |

Sorry for necroposting, but have you ever managed to make this work or was this just an idea?

CRT for

$$$ x \equiv r_1 \pmod{m_1}, \\ \dots \\ x \equiv r_n \pmod{m_n} $$$

effectively computes

$$$ x \equiv r_1 \alpha_1 + \dots + r_n \alpha_n \pmod{m_1 \dots m_n} $$$

for various precomputed $$$\alpha_i$$$, but the naively computed sum $$$r_1 \alpha_1 + \dots + r_n \alpha_n$$$ might exceed $$$m_1 \dots m_n$$$, which we won't be able to notice and fix if we perform multiplication and addition modulo a smaller number than $$$m_1 \dots m_n$$$ (e.g. $$$10^9 + 7$$$) in this case, no?

I think you should have no problems like this if you use Garner's algorithm.

Yeah, indeed that seems to work. Thanks!

ifsmirnov

+61

There is an easy technique I always use when writing FFT to check precision.

res[i] = (int)(a[i].real() + 0.5)
error = max(error, abs(a[i].real() - (int)(a[i].real() + 0.5)))

This way you can tune your constants, and when error < 1e-3` you can be sure that all rounding is correct.

BTW it can be also used in some geometry problems, although not always that trivially.

Klein

Section 10.4 (page 75) and especially section 10.4.2 (page 76) in this document Johan Håstad's notes for the course Advanced Algorithms might shed some light on what you want to know, though the explanations are quite mathematical...

Lien

← Rev. 3 →

Enchom I found on Petr's blog a more convenient way to take FFT mod an arbitrary number. View the comments in Russian mode. For short, the code.

Enchom's blog