FFT Problem Optimization

By Diguised, 9 years ago, in English

Hi,

I recently submitted this solution to POLYMUL, a polynomial multiplication problem meant to be solved with FFT.

The solution runs very slowly even on my own computer, and I cannot identify the reason. Could anyone help me optimize it? There must be something wrong for it to run this slowly.

Thanks in advance, Disguised

Comments (10)

Klein

9 years ago

The main optimization you can do, in my opinion, is to set a higher base case threshold. That is, instead of stopping the recursion only when a single element is left, switch to an O(N²) evaluation once the number of elements drops below 32 (32 is usually a good threshold ^^). In my experience (not only with FFT, but also with Karatsuba and the like) this makes a huge difference.
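
A minimal sketch of that idea, assuming a recursive radix-2 FFT on a power-of-two input. The names `naive_dft`, `fft_with_cutoff` and `CUTOFF` are made up for illustration and are not taken from the original submission; the value 32 is the one suggested above and should be tuned by measurement.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

const double PI = std::acos(-1.0);
const std::size_t CUTOFF = 32;  // assumed threshold, taken from the comment above

// Direct O(n^2) evaluation: out[k] = sum over j of a[j] * exp(2*pi*i*j*k/n).
std::vector<cd> naive_dft(const std::vector<cd>& a) {
    const std::size_t n = a.size();
    std::vector<cd> out(n);
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t j = 0; j < n; ++j)
            out[k] += a[j] * std::polar(1.0, 2.0 * PI * double((j * k) % n) / double(n));
    return out;
}

// Recursive FFT that hands small inputs to naive_dft instead of recursing
// all the way down to a single element. a.size() must be a power of two.
std::vector<cd> fft_with_cutoff(const std::vector<cd>& a) {
    const std::size_t n = a.size();
    if (n < CUTOFF) return naive_dft(a);  // the raised base case

    // Split into even-indexed and odd-indexed halves.
    std::vector<cd> even(n / 2), odd(n / 2);
    for (std::size_t j = 0; j < n / 2; ++j) {
        even[j] = a[2 * j];
        odd[j] = a[2 * j + 1];
    }
    const std::vector<cd> E = fft_with_cutoff(even);
    const std::vector<cd> O = fft_with_cutoff(odd);

    // Butterfly step combining the two half-size transforms.
    std::vector<cd> out(n);
    for (std::size_t k = 0; k < n / 2; ++k) {
        const cd t = std::polar(1.0, 2.0 * PI * double(k) / double(n)) * O[k];
        out[k] = E[k] + t;
        out[k + n / 2] = E[k] - t;
    }
    return out;
}
```

The sketch still allocates on every level to keep the structure obvious; it only illustrates where the cutoff goes.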

yeputons

9 years ago

+22

Start by getting rid of the following:

push_back: it is amortized O(1), but it triggers reallocations. Pre-allocate the memory (or just use vector.reserve).
Recursive calls and memory allocations: do everything in place.
complex<double>: it is slow for some reason, so implement the same class yourself.
Two FFTs where one is enough: the transform of a + 0·i wastes the imaginary part, so there is no need to run two separate transforms for a + 0·i and b + 0·i. Run one for a + bi and then recover the two results with some formulas.
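
The first three points can be sketched together as follows. This is an illustration under stated assumptions, not the original submission or the notebook code: `Cpx`, `fft_in_place` and `multiply` are hypothetical names, the input size is padded to a power of two, and the twiddle factor is accumulated by repeated multiplication, which is simple but loses some precision on very long inputs.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical minimal stand-in for std::complex<double>.
struct Cpx {
    double re, im;
};
inline Cpx operator+(Cpx a, Cpx b) { return {a.re + b.re, a.im + b.im}; }
inline Cpx operator-(Cpx a, Cpx b) { return {a.re - b.re, a.im - b.im}; }
inline Cpx operator*(Cpx a, Cpx b) {
    return {a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re};
}

// Iterative radix-2 FFT that works inside the given buffer.
// a.size() must be a power of two; invert = true computes the inverse.
void fft_in_place(std::vector<Cpx>& a, bool invert) {
    const double PI = std::acos(-1.0);
    const std::size_t n = a.size();

    // Bit-reversal permutation replaces the recursive even/odd split.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }

    // Butterflies over blocks of length 2, 4, 8, ..., n.
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const double ang = 2.0 * PI / double(len) * (invert ? -1.0 : 1.0);
        const Cpx wlen{std::cos(ang), std::sin(ang)};
        for (std::size_t i = 0; i < n; i += len) {
            Cpx w{1.0, 0.0};
            for (std::size_t j = 0; j < len / 2; ++j) {
                const Cpx u = a[i + j];
                const Cpx v = a[i + j + len / 2] * w;
                a[i + j] = u + v;
                a[i + j + len / 2] = u - v;
                w = w * wlen;
            }
        }
    }
    if (invert)
        for (Cpx& x : a) { x.re /= double(n); x.im /= double(n); }
}

// Multiplies two integer polynomials given as coefficient vectors.
std::vector<long long> multiply(const std::vector<int>& p, const std::vector<int>& q) {
    if (p.empty() || q.empty()) return {};
    const std::size_t need = p.size() + q.size() - 1;
    std::size_t n = 1;
    while (n < need) n <<= 1;

    // Both buffers are sized once up front; no push_back anywhere.
    std::vector<Cpx> fa(n, Cpx{0.0, 0.0}), fb(n, Cpx{0.0, 0.0});
    for (std::size_t i = 0; i < p.size(); ++i) fa[i].re = p[i];
    for (std::size_t i = 0; i < q.size(); ++i) fb[i].re = q[i];

    fft_in_place(fa, false);
    fft_in_place(fb, false);
    for (std::size_t i = 0; i < n; ++i) fa[i] = fa[i] * fb[i];
    fft_in_place(fa, true);

    std::vector<long long> res(need);
    for (std::size_t i = 0; i < need; ++i) res[i] = std::llround(fa[i].re);
    return res;
}
```

For example, `multiply({1, 2, 3}, {4, 5})` corresponds to (1 + 2x + 3x²)(4 + 5x) = 4 + 13x + 22x² + 15x³.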

I'm not sure these are the most important points, but they are the ones that came to mind first. By the way, here you can find an FFT implementation from SPb SU 4's notebook.
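
The "some formulas" in the last list item can be written out. For real sequences a and b, pack c = a + bi and let C be its transform; then A[k] = (C[k] + conj(C[(n − k) mod n])) / 2 and B[k] = (C[k] − conj(C[(n − k) mod n])) / (2i). The sketch below checks this with a direct O(n²) DFT as a stand-in transform, purely to stay short; any FFT could be substituted. The names `dft` and `two_real_dfts` are made up for illustration.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cd = std::complex<double>;

// Stand-in transform: a direct O(n^2) DFT. An FFT would replace this in practice.
std::vector<cd> dft(const std::vector<cd>& x) {
    const double PI = std::acos(-1.0);
    const std::size_t n = x.size();
    std::vector<cd> out(n);
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t j = 0; j < n; ++j)
            out[k] += x[j] * std::polar(1.0, -2.0 * PI * double((j * k) % n) / double(n));
    return out;
}

// Transforms two real sequences of equal length using one complex transform.
// Returns the pair (transform of a, transform of b).
std::pair<std::vector<cd>, std::vector<cd>>
two_real_dfts(const std::vector<double>& a, const std::vector<double>& b) {
    const std::size_t n = a.size();

    // Pack a into the real part and b into the imaginary part.
    std::vector<cd> packed(n);
    for (std::size_t j = 0; j < n; ++j) packed[j] = cd(a[j], b[j]);

    const std::vector<cd> C = dft(packed);

    // Separate the two spectra using the conjugate symmetry of real inputs.
    std::vector<cd> A(n), B(n);
    for (std::size_t k = 0; k < n; ++k) {
        const cd mirrored = std::conj(C[(n - k) % n]);
        A[k] = (C[k] + mirrored) * 0.5;
        B[k] = (C[k] - mirrored) * cd(0.0, -0.5);  // dividing by 2i equals multiplying by -i/2
    }
    return {A, B};
}
```

The separation relies only on both inputs being real, so it holds for either sign convention of the transform.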

zakharvoit

9 years ago

+16

Looking at the std::complex implementation, I noticed that there are template specializations for the standard floating-point types that use the complex operations from the C language. It seems to be slow because of the additional function calls.

I tested the FFT using std::complex over a custom double "implementation" (i.e. a wrapper class around double with all operators overloaded), and it shows the same performance as the custom complex class. But this implementation is even longer than the one with the custom complex class, so it seems it cannot be used to shorten the code.

P.S. Maybe there is another way to bypass the template specialization besides creating a custom class, but I don't know of one.
