GCC: deoptimize using -O2

By MaxBuzz, 13 years ago, in English

The following code (source.cpp) produces strange results when compiled with GCC at different optimization levels (N=200):

gcc source.cpp -> 0.440 s
gcc -O2 source.cpp -> 2.750 s (-O, -O1 and -O2 give the same result)
gcc -Os source.cpp -> 0.223 s

For N=500, it is as follows:

gcc source.cpp -> 3.931 s
gcc -Os source.cpp -> 2.704 s
gcc -O2 source.cpp -> 42.142 s

The setup is GCC 4.4.4 on 64-bit Gentoo Linux.

Somehow, optimizing for speed slows the code down significantly, while optimizing for size speeds it up :-)

Could anybody compile and test the code on their machine? Or perhaps explain why this happens?
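The original source.cpp is not reproduced on this page, so anyone wanting to experiment needs a stand-in. Below is a rough, hypothetical sketch (not the author's code): it times many std::string equality checks on long strings that differ only in their last byte, which is the operation the replies below point to as the hot spot. The function name `count_equal_pairs` and its parameters are illustrative assumptions.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the unpublished source.cpp (not the author's code).
// Builds n strings of length len that differ only in their last byte, then
// compares every pair with std::string::operator==.
// Returns the number of equal pairs; the elapsed wall-clock time of the
// comparison loop (in seconds) is written to `seconds`.
std::size_t count_equal_pairs(std::size_t n, std::size_t len, double& seconds) {
    assert(len > 0);  // s.back() below needs a non-empty string
    std::vector<std::string> words;
    words.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        std::string s(len, 'a');
        s.back() = static_cast<char>('a' + i % 2);  // alternate 'a' / 'b'
        words.push_back(s);
    }

    const auto start = std::chrono::steady_clock::now();
    std::size_t equal = 0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            // Sizes are equal and prefixes match, so every comparison has to
            // scan up to the last byte.
            if (words[i] == words[j]) {
                ++equal;
            }
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    seconds = std::chrono::duration<double>(stop - start).count();
    return equal;
}
```

For instance, calling `count_equal_pairs(200, 10000, t)` from a small `main` and building it with `g++`, `g++ -O2` and `g++ -Os` would let you compare timings locally; whether the -O2 slowdown shows up will depend on the GCC version and the CPU.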

c++, compiler options, gcc, optimization


Comments (5)


ZumZoom

13 years ago


GCC 3.4.2, Windows 7 x64

N = 200

gcc source.cpp -> 6.38 s
gcc -O2 source.cpp -> 5.29 s
gcc -Os source.cpp -> 1.09 s

N = 500

gcc source.cpp -> 86.21 s
gcc -O2 source.cpp -> 79.34 s
gcc -Os source.cpp -> 13.51 s


Urbanowicz

13 years ago

My results are as expected:

gcc source.cpp -> 0.545s
gcc -O2 source.cpp -> 0.421s
gcc -Os source.cpp -> 0.466s

For N=500, it is as follows:

gcc source.cpp -> 5.758s
gcc -Os source.cpp -> 5.138s
gcc -O2 source.cpp -> 4.949s

gcc version 4.2.1 (Apple Inc. build 5666) (dot 3)

Target platform appears to be "x86_64", but the CPU itself is 32-bit.


slycelote

13 years ago

Here are my results for N=200 (gcc 4.4.3, 32-bit Ubuntu):

g++ 0.899s
g++ -O2 4.733s
g++ -Os 0.413s
g++ -O2 -fno-tree-ter 0.390s

One might think that the -ftree-ter optimization is broken. However, it seems to be enabled at -Os as well. In fact, on my system the only difference between the optimizations enabled at -O2 and at -Os is -finline-functions. I tried enabling it, but it had no effect.

Here's the relevant part of the man page:
-ftree-ter

Perform temporary expression replacement during the SSA->normal phase.  Single
use/single def temporaries are replaced at their use location with their
defining expression.  This results in non-GIMPLE code, but gives the expanders
much more complex trees to work on resulting in better RTL generation.  This is
enabled by default at -O and higher.
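To make the man-page wording concrete, here is a source-level analogy only: GCC applies TER to its internal SSA representation, not to C++ code, and the names `before_ter` and `after_ter` are made up for illustration. Both functions compute the same value; the second shows the single-definition, single-use temporary `t` replaced by its defining expression.

```cpp
// Source-level analogy of temporary expression replacement (TER).
// GCC performs TER on its internal SSA form, not on C++ source; these
// hypothetical functions only illustrate the substitution it describes.

// Before: `t` has exactly one definition and exactly one use.
int before_ter(int a, int b, int c) {
    int t = a * b;  // single def
    return t + c;   // single use
}

// After: the use of `t` is replaced with its defining expression, so the
// expander sees one larger tree, (a * b) + c, instead of two statements.
int after_ter(int a, int b, int c) {
    return a * b + c;
}
```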


MaxBuzz

13 years ago


In reply to adamax.

This is probably the key. It seems that this results in the strings being copied before comparison. As you can see, the slowdown with plain -O2 seems to be not constant but asymptotic. I will check this when I get home.

[Update] What I said about asymptotics was nonsense.


slycelote

13 years ago


I disassembled string::operator==. It turns out that with -O2 it uses the assembly instruction repz cmpsb, while in the other cases it calls the C library function memcmp. I found a description of this issue here. Quote:
"in the -O0 case, GCC relies on the implementation of memcmp supplied with the C library. In the -O2 case, GCC instead uses its built-in implementation of memcmp. The built-in function uses the special IA-32 instruction repz cmpsb, which is known to be slow on modern hardware."
Apparently, switching off builtins (-fno-builtin) should fix the issue as well.
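As a minimal model of what the inlined sequence computes, assuming only the standard memcmp contract: `repz cmpsb` steps through both buffers one byte per iteration and stops at the first mismatch. The hypothetical function `bytewise_compare` below reproduces that byte-at-a-time behaviour in portable C++, and `agrees_with_memcmp` checks that its sign matches std::memcmp. It models the result only, not the speed, which depends on the CPU and on the C library's memcmp (typically optimized to compare more than one byte per step).

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical model of the byte-at-a-time loop that `repz cmpsb` performs:
// compare one byte per step and stop at the first mismatch.
// Returns -1, 0 or 1, comparing bytes as unsigned char like memcmp does.
int bytewise_compare(const void* a, const void* b, std::size_t n) {
    const unsigned char* p = static_cast<const unsigned char*>(a);
    const unsigned char* q = static_cast<const unsigned char*>(b);
    for (std::size_t i = 0; i < n; ++i) {
        if (p[i] != q[i]) {
            return p[i] < q[i] ? -1 : 1;  // first differing byte decides
        }
    }
    return 0;  // all n bytes are equal
}

// std::memcmp only guarantees the sign of its result, so compare signs.
int sign_of(int x) { return (x > 0) - (x < 0); }

bool agrees_with_memcmp(const void* a, const void* b, std::size_t n) {
    return sign_of(std::memcmp(a, b, n)) == bytewise_compare(a, b, n);
}
```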

And here is the Bugzilla link.
