Goodbye2022G C++ 64-bit vs 32-bit

I recently tried to implement the solution to 1770G - Koxia and Bracket after reading the editorial, but I kept getting TLE on test case 12. However, after changing the compiler from GNU C++17 to GNU C++17 (64), I got AC in 1185 ms, which is well within the time limit of 5 seconds.

GNU C++17: 188211296

GNU C++17 (64): 188211331

I tried the same test on errorgorn's solution from the editorial, and it had the same problem.

GNU C++17: 188211673

GNU C++17 (64): 188211705

I thought it might be because of the NTT implementation that we both used (KACTL). However, I also tried Radewoosh's submission, which uses a different NTT implementation, and it has the same issue.

GNU C++17: 188203821

GNU C++20 (64): 187349301

I have seen cases before where the 64-bit compiler runs faster than the 32-bit one, but never to such a large extent. Does anyone know why? Does NTT run a lot faster with the 64-bit compiler? Or is it something about our implementation?

If anyone knows the reason, I would appreciate it very much if you could explain it in the comments. I guess it is about time that I switched from GNU C++17 to GNU C++17 (64) 😢

Comments (2)

oversolver

16 months ago

I wrote the editorial's code with my own library and got:

GNU C++17 (64): OK 1419 ms 188256118

GNU C++17: OK 3884 ms 188256187

I guess modular arithmetic is the bottleneck.
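To make the suspected bottleneck concrete, here is a minimal sketch of the kind of modular multiplication that sits in the inner loop of a typical NTT. The helper name `mulmod` and the modulus 998244353 are assumptions for illustration and are not taken from any of the submissions linked above.

```cpp
#include <cstdint>

// Hypothetical helper, not copied from any linked submission.
// 998244353 is a commonly used NTT modulus; it is assumed here for illustration.
constexpr std::uint32_t MOD = 998244353;

// Multiply two residues modulo MOD.
// The product can need up to 60 bits, so it is widened to 64 bits before reducing.
// On a 64-bit target the reduction is typically a handful of native instructions;
// on a 32-bit target the 64-bit '%' generally has to be emulated with 32-bit operations.
constexpr std::uint32_t mulmod(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint32_t>(static_cast<std::uint64_t>(a) * b % MOD);
}

// Compile-time sanity checks.
static_assert(mulmod(2, 3) == 6, "small products are unchanged");
static_assert(mulmod(MOD - 1, MOD - 1) == 1, "(-1) * (-1) == 1 modulo MOD");
```

An NTT of length n performs on the order of n log n such multiplications, so any extra cost per 64-bit reduction is multiplied accordingly.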

digitcrusher

From my experience of solving problems on a 32-bit-only online judge, I can say that 64-bit arithmetic on 32-bit machines is indeed very costly. On such machines, each 64-bit integer is split into two 32-bit halves, and arithmetic on it is emulated using multiple 32-bit operations. That is not always easy, especially when an operation mixes the low and high parts of a number, as is the case with multiplication and division/modulo. Here's MSVC's implementation of 64-bit division using 32 bits, and as you can see, it uses many different processor instructions, including branches, which have their own performance problems, compared to just one "idiv" instruction you'd have on a 64-bit processor. In short, use 64-bit compilers whenever you can to get the most out of your computer's capabilities.
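As a rough illustration of the splitting described above, the following sketch computes the low 64 bits of a 64-bit product using only 32-bit halves. The names `U64Parts`, `split`, `join` and `mul_emulated` are hypothetical, and this is not the code any particular compiler emits; it only shows how the low and high parts get mixed. Division is considerably more involved and is not shown.

```cpp
#include <cstdint>

// A 64-bit value held as two 32-bit halves, roughly how a 32-bit target keeps it in registers.
struct U64Parts {
    std::uint32_t lo;
    std::uint32_t hi;
};

constexpr U64Parts split(std::uint64_t x) {
    return {static_cast<std::uint32_t>(x), static_cast<std::uint32_t>(x >> 32)};
}

constexpr std::uint64_t join(U64Parts p) {
    return (static_cast<std::uint64_t>(p.hi) << 32) | p.lo;
}

// Low 64 bits of a * b, built from three 32-bit multiplications.
constexpr U64Parts mul_emulated(U64Parts a, U64Parts b) {
    // lo * lo needs the full 64-bit result (a widening 32x32 -> 64 multiply).
    std::uint64_t low = static_cast<std::uint64_t>(a.lo) * b.lo;
    // The cross terms only affect the upper half, so they may wrap modulo 2^32.
    std::uint32_t cross = a.lo * b.hi + a.hi * b.lo;
    // Upper half: carry out of the low product plus the cross terms.
    std::uint32_t hi = static_cast<std::uint32_t>(low >> 32) + cross;
    return {static_cast<std::uint32_t>(low), hi};
}

// Compile-time check against the native 64-bit multiplication.
static_assert(join(mul_emulated(split(0x0123456789ABCDEFULL), split(0xFEDCBA9876543210ULL)))
                  == 0x0123456789ABCDEFULL * 0xFEDCBA9876543210ULL,
              "emulated product matches native product");
```

Even this simple case needs three multiplications and two additions where a 64-bit target uses one multiply instruction.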

maomao90's blog