tumhari_mummy's blog

By tumhari_mummy, history, 4 weeks ago, In English

I recently found out about the asm keyword in C++, which allows us to write inline assembly. Is there any way to cheese solutions with it? I googled, but they say "constants don't matter, focus on complexity..." I KNOW, THANK YOU. I want to know: will it make my n^3 pass for n <= 1000? asm reference

  +20

»
4 weeks ago, # |
  +8

The compiler already performs a lot of optimizations, and C++ is compiled to assembly before machine code anyway. Writing assembly by hand would only be viable if you had a lot of time (which you don't in contests) AND were an expert at assembly language.

»
4 weeks ago, # |
  +5

Mostly not.

»
4 weeks ago, # |
  +13

It's only marginally faster, and you can get the same speedup by choosing a better algorithm. Inline asm takes a long time to type, and it's difficult too: imagine doing DP while juggling stack and frame pointers by hand, not fun. Do it because you want to get better at assembly, not for faster CF solutions. And don't do it in contests.

»
4 weeks ago, # |
Rev. 2   +5

No, it won't. Modern compilers optimize code well enough; I sometimes rewrite my code in assembly but only gain a ~1.3x speedup. BTW, if you like such optimizations, take a look at SIMD intrinsics, they may halve the execution time in some cases.

  • »
    »
    4 weeks ago, # ^ |
      +13

    Just for some more info about SIMD parallelism: if you write code in a compiler-friendly way (aligned data and no cross-iteration data dependencies, so the compiler can vectorize), it's quite reasonable to expect the compiler to use SIMD wherever possible, so the OP doesn't even need to write SIMD by hand at all.

    What follows is some general advice for squeezing in solutions within strict time limits, after you've done all possible optimizations like bitsets/replacing data structures/fast io.

    What could possibly lead to a better performance boost in many practical cases is avoiding branches (and thereby reducing checks) as much as possible. One extreme of this is using goto in places where you're guaranteed certain conditions hold, which I wouldn't personally recommend since it can be confusing and harder to debug. A more practical way of avoiding branches is to encode conditions directly into the answer (for a very simple example, if you want ceil(x / 2), you can use (x >> 1) + (x & 1) or (x + 1) >> 1 instead of checking x modulo 2).

    Another, more important thing is the cache-friendliness of your algorithm. Unfortunately (or fortunately), most solutions to the same problem have similar cache-friendliness, so it doesn't make a big difference in CP, but if you're trying to squeeze a solution in, try to minimize the number of cache misses. There was a discussion about the implementation of a sparse table some time ago that showed how significant cache misses really are: most of a program's time is spent waiting for data, and the L1 cache is far faster than the L3 cache, which gives you an idea of the effect.