Question about KACTL ModMul
I have a question about KACTL's ModMul. Its description says it runs around 2x faster than the naive (__int128_t)a * b % M. When I benchmarked both with -O2, the running times were similar. Am I mistaken?
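For reference, here is a minimal sketch of the two versions being compared, as I understand them. The long double version is written from memory of the KACTL snippet, so treat it as an approximation rather than the exact source. The names modmul_kactl, modmul_naive and check_agreement are my own, and I am assuming GCC or Clang on x86-64, where long double is the 80-bit extended type and __int128_t is available.

```cpp
#include <cassert>

typedef unsigned long long ull;
typedef long long ll;

// Long double trick, recalled from KACTL (an approximation of their snippet).
// Assumes 0 <= a, b < M and M below roughly 7.2e18, and that long double
// has a 64-bit mantissa (true for x86-64 GCC/Clang, not for MSVC).
ull modmul_kactl(ull a, ull b, ull M) {
    // Estimate the quotient floor(a * b / M) in floating point, then compute
    // a * b - M * q with wrapping 64-bit arithmetic.
    ll ret = a * b - M * ull(1.L / M * a * b);
    // The estimate can be off by one in either direction, so correct it.
    return ret + M * (ret < 0) - M * (ret >= (ll)M);
}

// Naive version from the question: widen to 128 bits, then reduce.
ull modmul_naive(ull a, ull b, ull M) {
    return (ull)((__int128_t)a * b % M);
}

// Hypothetical helper: checks that both versions agree on one input.
void check_agreement(ull a, ull b, ull M) {
    assert(modmul_kactl(a, b, M) == modmul_naive(a, b, M));
}
```

When timing these, it may be worth checking that the result of every call is actually used (for example, accumulated into a value that gets printed), since otherwise the compiler is free to simplify or drop the loop under -O2 and both versions can appear equally fast.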