Modular inverse, Montgomery multiplication, and integer factorization

sslotin's blog

By sslotin, 23 months ago, in English

I've been working on computational number theory for my book and recently wrote a few new articles on:

calculating the modular inverse using binary exponentiation and the extended Euclidean algorithm (spoiler: Euclid wins on average);
Montgomery multiplication and fast modulo / division / divisibility checks for when the divisor is constant (which can often be used to speed up modular arithmetic by 5-15x);
how to use them in various factorization algorithms (upd: fixed link), writing a 3x-faster-than-state-of-the-art implementation of Pollard's algorithm in the process.
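A minimal sketch of the two approaches to the modular inverse from the first item, assuming a prime modulus below $$$2^{32}$$$ so that every product fits in 64 bits. The function names inverse_pow and inverse_euclid are hypothetical, and no timing claims are implied.

```cpp
#include <cstdint>
#include <cassert>

using u64 = uint64_t;
using i64 = int64_t;

// Fermat's little theorem: for prime p, a^(p-2) mod p is the inverse of a.
// Assumes p < 2^32 so that base * base fits in 64 bits.
u64 inverse_pow(u64 a, u64 p) {
    u64 result = 1, base = a % p, e = p - 2;
    while (e > 0) {
        if (e & 1) result = result * base % p;
        base = base * base % p;
        e >>= 1;
    }
    return result;
}

// Extended Euclid: invariant x*a = g and y*a = r (mod m) for the two current remainders.
u64 inverse_euclid(u64 a, u64 m) {
    i64 g = (i64)m, r = (i64)(a % m);
    i64 x = 0, y = 1;
    while (r != 0) {
        i64 q = g / r;
        i64 t = g - q * r; g = r; r = t;   // next remainder
        t = x - q * y;     x = y; y = t;   // matching coefficient
    }
    assert(g == 1);  // an inverse exists only when gcd(a, m) == 1
    return (u64)(x < 0 ? x + (i64)m : x);
}
```

Both return the same value; for example, the inverse of 2 modulo $$$10^9 + 7$$$ is 500000004 either way.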

This is largely inspired by Jakube's tutorials and Nyaan's library implementations. Thank you!

If anyone is interested, I may or may not try to optimize the sieve of Eratosthenes next: I believe it is possible to find all primes among the first $$$10^9$$$ numbers in well under a second.

Comments (8)

Golovanov399

23 months ago

Thank you!

First of all, your factorization link leads to the wrong page. Second, you write there that for integers up to $$$2^{64}$$$ it is better to use Pollard's rho, yet you are aware of more advanced factorization algorithms, since you mention some of them for larger $$$n$$$. Does that mean your implementation is faster than, for example, existing implementations of SQUFOF (here in C++ or here in C)?

sslotin

23 months ago (reply)

I filtered out SQUFOF early on because it has the same asymptotic complexity but looks much more complex and arithmetic-intensive.

I added dacin21's implementation to the benchmark, and it measures 425 factorizations per second, which is 7 times slower than Pollard-Brent with Montgomery multiplication. It can probably be optimized, but not by an order of magnitude.
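For readers following along, here is a minimal sketch of Pollard's rho with Brent's cycle detection and batched gcds. It is not the benchmarked code: it uses plain __int128 modular multiplication instead of Montgomery form, and the name find_factor and the batch size of 128 are illustrative assumptions. It expects a composite n.

```cpp
#include <cstdint>
#include <cassert>
#include <numeric>  // std::gcd (C++17)

using u64 = uint64_t;
using u128 = __uint128_t;  // GCC/Clang extension

u64 mulmod(u64 a, u64 b, u64 n) { return (u64)((u128)a * b % n); }

// Returns a nontrivial divisor of a composite n using f(v) = v^2 + c mod n.
u64 find_factor(u64 n) {
    assert(n >= 4);  // n must be composite, otherwise this never terminates
    if (n % 2 == 0) return 2;
    for (u64 c = 1;; c++) {
        auto f = [&](u64 v) { return (mulmod(v, v, n) + c) % n; };
        const u64 M = 128;  // gcd is taken once per batch of M steps
        u64 x = 2, y = 2, ys = 2, q = 1, g = 1;
        for (u64 r = 1; g == 1; r <<= 1) {
            x = y;                                   // Brent: remember a checkpoint
            for (u64 i = 0; i < r; i++) y = f(y);
            for (u64 k = 0; k < r && g == 1; k += M) {
                ys = y;                              // start of this batch
                for (u64 i = 0; i < M && i < r - k; i++) {
                    y = f(y);
                    q = mulmod(q, x > y ? x - y : y - x, n);  // accumulate |x - y|
                }
                g = std::gcd(q, n);
            }
        }
        if (g == n) {  // the batch overshot: replay it one step at a time
            do {
                ys = f(ys);
                g = std::gcd(x > ys ? x - ys : ys - x, n);
            } while (g == 1);
        }
        if (g != n) return g;  // otherwise retry with another constant c
    }
}
```

For instance, with c = 1 the sketch finds 7 for n = 91.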

dacin21

23 months ago (reply)

The main advantage of SQUFOF was that it mostly uses 32-bit integers. Back in the day, Codeforces was still using 32-bit machines, so this was great. On modern 64-bit machines, however, SQUFOF suffers greatly from divisions by varying divisors, which rule out the usual Barrett/Montgomery tricks, and from having to compute square roots.

In all honesty, I'm kinda glad I get to retire my SQUFOF code, as "fast" was just about the only positive thing I could say about it.

PS: how fast is your code compared to tinyecm.c?

oversolver

23 months ago

Finally we can remove this blog from catalog!

TLE

23 months ago

If you're interested, https://loj.ac/p/6466 is a testbed for integer factorization ($$$n\le 10^{30}$$$). There are public implementations of the quadratic sieve, elliptic-curve factorization, and several Pollard's rho variants there :)

clyring

23 months ago

I'm a bit surprised that you found a division-based implementation of the extended Euclidean algorithm typically faster than exponentiation by squaring for finding an inverse mod $$$10^9 + 7$$$ on your setup, even knowing that general 32-bit divmod isn't that slow on most hardware. But it's close, and I expect that with more optimizations exponentiation by squaring can retake the lead. The modmuls themselves can be made perhaps 10-20% faster using relaxed Barrett reduction or the variant of Montgomery reduction I discuss below, and (if you want) there are several addition chains that reach the exponent $$$10^9 + 5$$$ with 37 modmuls instead of the 43 used by plain binary exponentiation.
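To make the modmul count concrete, here is a minimal sketch of plain (not relaxed) Barrett reduction for an odd modulus below $$$2^{32}$$$, used in left-to-right binary exponentiation with a counter. The struct Barrett and the function pow_count are hypothetical names, and __builtin_clzll is a GCC/Clang builtin. The exponent $$$10^9 + 5$$$ has 30 bits, 15 of them set, so the count is 29 squarings plus 14 multiplications.

```cpp
#include <cstdint>
#include <cassert>

using u64 = uint64_t;
using u128 = __uint128_t;  // GCC/Clang extension

struct Barrett {
    u64 p, m;  // m = floor((2^64 - 1) / p), an approximation of 2^64 / p
    explicit Barrett(u64 mod) : p(mod), m(mod > 1 ? UINT64_MAX / mod : 0) {
        assert(mod % 2 == 1 && mod > 1 && mod < (1ULL << 32));
    }
    // x mod p: the estimated quotient is at most one too small, so one correction suffices
    u64 reduce(u64 x) const {
        u64 q = (u64)(((u128)x * m) >> 64);
        u64 r = x - q * p;
        return r >= p ? r - p : r;
    }
    u64 mul(u64 a, u64 b) const { return reduce(a * b); }  // a, b < p < 2^32
};

// Left-to-right binary exponentiation that also counts modular multiplications.
u64 pow_count(const Barrett& br, u64 a, u64 e, int& modmuls) {
    assert(e >= 1);
    modmuls = 0;
    u64 base = a % br.p;
    u64 r = base;  // accounts for the top bit of e
    for (int bit = 62 - __builtin_clzll(e); bit >= 0; bit--) {
        r = br.mul(r, r); modmuls++;                                 // square
        if ((e >> bit) & 1) { r = br.mul(r, base); modmuls++; }      // multiply
    }
    return r;
}
```

Calling pow_count with a = 2 and e = 1000000005 under Barrett(1000000007) returns 500000004, the inverse of 2, after 43 modmuls.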

long Barrett reduction comments

There's also a slightly simpler and faster relative of Montgomery reduction that uses wide multiplications. (So it sacrifices the traditional main advantage of Montgomery reduction in exchange for being as simple and fast as Lemire reduction.) The idea is simple: with pinv the inverse of p modulo $$$2^{64}$$$, wideMul(x * pinv, p) is clearly a multiple of p, and its lower word is equal to x. So its upper word must be congruent to $$$-2^{-64} \cdot x$$$ modulo p.
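A minimal sketch of that reduction, assuming an odd modulus p below $$$2^{32}$$$ so that the product of two residues fits in one 64-bit word. Since reduce yields $$$-x \cdot 2^{-64} \bmod p$$$, the sketch stores each residue a as $$$-a \cdot 2^{64} \bmod p$$$, which keeps products in the same form. This sign convention, the struct name NegMontgomery, and its methods are my own illustration, not code from the comment.

```cpp
#include <cstdint>
#include <cassert>

using u32 = uint32_t;
using u64 = uint64_t;
using u128 = __uint128_t;  // GCC/Clang extension

struct NegMontgomery {
    u64 p;     // odd modulus below 2^32
    u64 pinv;  // p^{-1} mod 2^64
    explicit NegMontgomery(u32 mod) : p(mod), pinv(mod) {
        assert(mod % 2 == 1);
        // Newton's iteration: correct low bits go 3 -> 6 -> 12 -> 24 -> 48 -> 96
        for (int i = 0; i < 5; i++) pinv *= 2 - p * pinv;
    }
    // q * p has lower word x, so its upper word is in [0, p) and equals -x * 2^{-64} mod p
    u64 reduce(u64 x) const {
        u64 q = x * pinv;
        return (u64)(((u128)q * p) >> 64);
    }
    // a -> -a * 2^64 mod p
    u64 to(u64 a) const { return (u64)(((u128)((p - a % p) % p) << 64) % p); }
    // (-a * 2^64) -> a, because the reduction flips the sign again
    u64 from(u64 A) const { return reduce(A); }
    // (-a R)(-b R) = ab R^2, and reducing gives -ab R: the same form again
    u64 mul(u64 A, u64 B) const { return reduce(A * B); }
};
```

Usage: with NegMontgomery mt(1000000007), the expression mt.from(mt.mul(mt.to(a), mt.to(b))) equals a * b % 1000000007 for residues a and b.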

It's my understanding that the highest-precision multiplication operands typically included in SIMD instruction sets are doubles, providing only 52+1 bits of precision. So I'm curious about your plans for using SIMD to further speed up factorization of 60-bit integers.

I tried my own hand at a somewhat optimized sieve of Eratosthenes in January. The basic optimizations (segmenting the sieve into blocks small enough to fit into the L2 cache and skipping all even numbers), plus a dumb Haskell-specific workaround/optimization or two, were enough to generate and traverse the list of all primes up to a billion in about 1.7 seconds on the Codeforces servers. I expect that also applying the wheel-sieve idea with primes 3 and 5, and using a language where the expected output isn't a 2 GB lazy linked list, would get under a second with a little room to spare.
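A minimal C++ sketch of the two basic optimizations described above: segmenting into cache-sized blocks and storing only odd numbers. It counts primes instead of building a list, the segment size of $$$2^{16}$$$ odd numbers is an assumed cache-friendly value rather than a measured one, count_primes is a hypothetical name, and the 3-and-5 wheel is not implemented.

```cpp
#include <cstdint>
#include <cassert>
#include <cmath>
#include <vector>
#include <algorithm>

// Counts primes <= n with a segmented, odd-only sieve of Eratosthenes.
uint64_t count_primes(uint64_t n) {
    assert(n < (1ULL << 48));  // keeps sqrt(n) table and index arithmetic small
    if (n < 2) return 0;
    uint64_t lim = (uint64_t)std::sqrt((double)n);
    while (lim * lim > n) lim--;
    while ((lim + 1) * (lim + 1) <= n) lim++;
    // Plain sieve for the odd base primes up to sqrt(n).
    std::vector<char> small(lim + 1, 1);
    std::vector<uint64_t> base;
    for (uint64_t i = 3; i <= lim; i += 2)
        if (small[i]) {
            base.push_back(i);
            for (uint64_t j = i * i; j <= lim; j += 2 * i) small[j] = 0;
        }
    const uint64_t S = 1 << 16;  // odd numbers per segment (assumed to fit in cache)
    std::vector<char> seg(S);
    uint64_t count = 1;  // the prime 2
    // Segment covers the odd numbers lo, lo + 2, ..., hi.
    for (uint64_t lo = 3; lo <= n; lo += 2 * S) {
        std::fill(seg.begin(), seg.end(), 1);
        uint64_t hi = std::min(n, lo + 2 * (S - 1));
        for (uint64_t p : base) {
            if (p * p > hi) break;
            // First odd multiple of p that is >= max(p*p, lo).
            uint64_t start = std::max(p * p, (lo + p - 1) / p * p);
            if (start % 2 == 0) start += p;
            for (uint64_t j = start; j <= hi; j += 2 * p) seg[(j - lo) / 2] = 0;
        }
        for (uint64_t j = lo; j <= hi; j += 2) count += seg[(j - lo) / 2];
    }
    return count;
}
```

For example, count_primes(1000000) gives 78498.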

lmn207

20 months ago

nice rick roll =))

SPyofgame

20 months ago

I've dug into some papers and GitHub projects before, and it seems to be at least somewhat possible. You can use a lookup table on a bitwise segmented sieve of Eratosthenes, with several modifications and bit-manipulation optimizations. AFAIK, there is code that can actually count the primes below $$$10^9$$$ in one second this way (but it cannot store them that fast, and it isn't worth it, since there are already better algorithms for counting primes).
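A minimal sketch of the bit-packing part of that idea: one bit per odd number, composites cleared, primes counted per 64-bit word. It assumes GCC/Clang's __builtin_popcountll in place of a lookup table, and it is deliberately not segmented, so memory grows as roughly n/16 bytes; count_primes_bits is a hypothetical name.

```cpp
#include <cstdint>
#include <cassert>
#include <vector>

// Odd-only sieve packed into 64-bit words: bit i represents the value 2*i + 1.
uint64_t count_primes_bits(uint64_t n) {
    assert(n < (1ULL << 40));  // the bit array takes about n/16 bytes
    if (n < 2) return 0;
    uint64_t m = (n + 1) / 2;  // number of odd values 1, 3, ..., <= n
    std::vector<uint64_t> bits((m + 63) / 64, ~0ULL);
    bits[0] &= ~1ULL;                                    // 1 is not prime
    if (m % 64) bits.back() &= (1ULL << (m % 64)) - 1;   // clear padding past n
    for (uint64_t i = 1; (2 * i + 1) * (2 * i + 1) <= n; i++)
        if ((bits[i / 64] >> (i % 64)) & 1) {
            uint64_t p = 2 * i + 1;
            // Index of p*p is (p*p)/2; stepping the value by 2p steps the index by p.
            for (uint64_t j = p * p / 2; j < m; j += p)
                bits[j / 64] &= ~(1ULL << (j % 64));
        }
    uint64_t count = 1;  // the prime 2
    for (uint64_t w : bits) count += __builtin_popcountll(w);
    return count;
}
```

Counting with popcount touches each 64-bit word once, which is why counting is so much cheaper than listing the primes.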
