Discussion: Modern G++ Command Line

3 года назад, # ^ |

+29

if (source.contains("#pragma")) {
  return COMPILATION_ERROR;
}

→ Ответить

SkyDreams

3 года назад, # ^ |

Where do we need to write this ?

→ Ответить

3 года назад, # ^ |

+14

Mike can write this somewhere inside Codeforces code

→ Ответить

arpit0891

3 года назад, # ^ |

-15

Agreed ...

→ Ответить

arpit0891

3 года назад, # ^ |

Different optimizations you can refer to this .

→ Ответить

iensen

3 года назад, # ^ |

+14

I don't think this works. One can use _Pragma ("GCC optimize(3)") instead.

→ Ответить

pshevchuk

3 года назад, # ^ |

Or you can use "??=" as a synonym for # and write ??=pragma

→ Ответить

3 года назад, # ^ |

+68

Then the contestants will just use function attributes, which have exactly the same effect as pragmas, but for individual functions:

Spoiler

#define fast_function \
  __attribute__((__optimize__("O3,unroll-loops"))) \
  __attribute__((__target__("avx,avx2,sse,sse2,sse3,sse4,popcnt,fma")))

int fast_function solve(int n) { ... }

Here's an example: 128071115. The fast_function define can be also obfuscated a little bit to avoid trivial regexp based detection. There may be other loopholes.

Disabling AVX instructions support on the judge machines may be a useful equalizer option (AVX is the biggest contributor when the pragmas help). Some people on the Internet are saying that disabling AVX is possible in Windows: https://newbedev.com/how-can-i-check-whether-intel-s-avx-is-enabled-on-my-computer

→ Ответить

swiftc

3 года назад, # ^ |

← Rev. 2 →

+38

Actually, we can disable #pragmas at compiler level, like this, but I think it's not necessary.

→ Ответить

Wind_Eagle

3 года назад, # |

← Rev. 2 →

+70

I think that -O2 is a fine standard. For example, in Belarusian OI we have -O2 as a standard.

One can use pragmas to get -O3, -funroll-loops and other features.

→ Ответить

Kevin114514

3 года назад, # |

-8

what you need to do is

#pragma GCC optimize(3)
#pragma GCC optimize("inline")
#pragma GCC optimize("-fgcse")
#pragma GCC target("avx","sse2")
#pragma GCC optimize("-fgcse-lm")
#pragma GCC optimize("-fipa-sra")
#pragma GCC optimize("-ftree-pre")
#pragma GCC optimize("-ftree-vrp")
#pragma GCC optimize("-fpeephole2")
#pragma GCC optimize("-ffast-math")
#pragma GCC optimize("-fsched-spec")
#pragma GCC optimize("unroll-loops")

→ Ответить

lis05

3 года назад, # ^ |

could you please explain what each of them does?

→ Ответить

tmaddy

3 года назад, # ^ |

-77

no-one cares

→ Ответить

lis05

3 года назад, # ^ |

-19

fine

→ Ответить

CartesianTree

3 года назад, # |

← Rev. 2 →

-13

Deleted.

→ Ответить

2_3_3

3 года назад, # |

+89

The optimization is really a good news for C++ programmers.

But I have seen that sometimes it can give negative-optimization.

So I suggest that users can decide to use it or not by themselves.

→ Ответить

jeal0uspengu1n

3 года назад, # |

+10

Ahhh!!! c++20 noice noice

→ Ответить

Kyou_mo_kawaii

3 года назад, # |

← Rev. 2 →

+127

Xellos claims that O3 can sometimes be slower than O2 in https://codeforces.com/blog/entry/66279. That thread is also a great resource for pragmas.

I think rather than asking people you can do this more scientifically by rerunning existing submissions under both the old and new flags. It will be interesting to see the stats of how many AC will become MLE or WA(due to some UB being exposed?) and which types of problems benefit the most from O3/unroll and which ones it will actually slow down. It will make an interesting blog post if you get those stats!

→ Ответить

Time_To_Night_Sky

3 года назад, # |

← Rev. 2 →

-38

[Deleted]

→ Ответить

3 года назад, # ^ |

+93

It's wrong constraints, not wrong solution. Who to blame is the author.

→ Ответить

LeoPro

3 года назад, # ^ |

+58

Yeah, you kinda right. We are suggested a problem, we have to implement a solution, which works correctly and fits into the time (and memory) limit. But what in fact do the contest authors want?

The correctness of algorithm is checked using testcases. The time and memory limit check, how good is the solution. Very often, the quality is just the complexity. The editorials show us the intended complexity. They do not show neither intended language, nor intended pragmas.

Apparently, the main thing is not to allow too slow — incorrect — solutions to pass. That is why time limits can be tight. Then it can be difficult to make accepted a solution with correct complexity — particularly if it has big constant factor (say, written not in C++). But tight limits are discouraged in codeforces.

You are suggested a list of languages you can write a solution in. I believe that they are alternatives — we can pick any of them, translate our correct idea and get accepted. I believe that choosing C++ isn't a must (in perfect world). The correct idea must be accepted, whenever it is written in C++ or not. But the languages differ in the speed. The way to check the asymptotics is to set enormous limits — say, $$$n \le 10^9$$$ for $$$O(n\log{n})$$$ algorithm and about 300 seconds of time limit. But it is impossible.

Consider some example. Problemsetters decide that solution must work in $$$O(n\log^2{n})$$$ and $$$O(n^2/w)$$$ should not pass. They implement the best solution for $$$O(n^2/w)$$$, common solution for $$$O(n\log^2{n})$$$ and set time limit somewhere between their ruinning times. The more the gap is, the more other language or non-optimal solutions are allowed. Forbidding to use pragmas is increasing the size of the gap. It simplifies work for problemsetters and increases probability of receiving good problem. Yes, it's their fault, but do you want to help them preparing high quality problems or not?

P. S. Yes, I used pragmas three or four times and all times the program worked slower. So you can read my comment as " I do not know, where (and how) should I use pragmas. Please, forbid all the contestants to use them!". And to add up, putting some random lines in the beginning of the file to make the program faster hardly correlate in my mind with thinking about best solution for the problem.

→ Ответить

Time_To_Night_Sky

3 года назад, # ^ |

← Rev. 2 →

-16

[Deleted]

→ Ответить

Lyrically

3 года назад, # |

Looking forward to the release of the new standard of C++.

→ Ответить

Sparky_Master_WCH1226

3 года назад, # |

Wow, thanks! That is a very useful and meaningful idea.

→ Ответить

FrozenKandy

3 года назад, # |

pypy3 64 bit is coming!! I couldn't be happier :D

→ Ответить

prabhav_

3 года назад, # |

Psyched for C++ 20 :D

→ Ответить

3 года назад, # |

Sometimes (for example, when there are no autovectorized loops), pragmas can be useless and even slow down the program (I had some examples of it). Maybe there are some that never slow down?

→ Ответить

Marslai24

3 года назад, # |

I think the optimization of compiler would be great, but on the other hand, this would give less pleasure in optimizing the code itself. By the way, thanks for the update!

→ Ответить

hbarp

3 года назад, # |

-26

Probably the wrong place to ask this but I dont know where to ask so sorry for it :)

Are there any upcoming lectures for EDU section ?

→ Ответить

christchurch

3 года назад, # |

-39

O2 is enough.A Chinese oj called POJ did not even turn on o2 optimization，and only supports the C98 standard QAQ

→ Ответить

oneninetyfive

3 года назад, # ^ |

+16

POJ might be the single worst online judge that exists today... The website seems like it is stuck in the 1990s.

→ Ответить

purplesyringa

3 года назад, # |

-9

gcc11 std=c++20

What about c++2b? That would be even better.

→ Ответить

3 года назад, # ^ |

As far as I know, c++20 itself isn't fully supported in any compiler as of yet. It'll be a long way to getting it to work on cf anyway, so c++2b doesn't make sense.

If your comment was sarcasm, well...

→ Ответить

purplesyringa

3 года назад, # ^ |

← Rev. 2 →

It wasn't sarcastic in the slightest.

C++20 is not fully supported, right. C++23 neither. That does not mean we can't use features that are supported, however, does it?

Also, I'm not sure why making C++20 work would be difficult. It only requires updating the compiler, as far as I am aware.

→ Ответить

3 года назад, # ^ |

← Rev. 2 →

+26

Ah I see. The rest of the comment will reference this.

My point about C++20 was that if compilers don't fully support all features, it might be a great surprise for some people if some features won't work on CF (but work locally). For instance, this post: https://codeforces.com/blog/entry/94457, where even C++17 features don't work, since the version of GCC is 9.2.0, while most people on Ubuntu use 9.3.0 (and people on rolling-release distros use much more recent versions, like 11.1.0). However, the majority of "good" C++20 features have already been implemented by GCC, so it might be less of an issue for C++20. Some good features (like constexpr std::string and std::vector) still haven't been implemented on GCC though, so I would still prefer waiting for them to be implemented.

The problem would be worse for C++23 (since most C++23 features in the STL aren't implemented in compilers anyway), so people who have updated compilers might face unexpected issues in due course of time as the compiler gets outdated.

As far as updating the compiler is concerned, I suppose it takes some non-trivial amount of effort which might lead to maintenance delays (even for half an hour, that's quite a bit of a delay). I don't think CF would want to follow a rolling-release pattern and update compilers as and when more features are implemented, which would keep us stuck with compilers that don't fully implement the standard (as in the case of 9.2.0). I hope that clears up the reasons behind my concerns.

→ Ответить

Ritwin

3 года назад, # ^ |

-10

I think a lot of people will be happy to use set.contains(x) though (And no, I don't know why it took this long to add a .contains method)

→ Ответить

purplesyringa

3 года назад, # ^ |

+11

GCC is known to support newer language features even when the selected standard is quite old. Structured buildings, for instance, supported even with -std=c++11 option--GCC will still compile them and only raise a warning.

So why don't we do a similar thing here? Make a compiler called 'GCC (C++20, partial C++2b support)', or maybe even 'GCC (C++20)' if you think minor lies are allowable, that uses the -std=c++2b CLI option? In this case, if you use C++20 features only, you're in the clear. And if you use newer features, you'll either get an error because the feature was not implemented yet, or it will compile correctly and you won't lose precious seconds rewriting the code.

It's a win-win situation. Participants that are aware of this little trick can benefit from it. Participants that aren't will get the same results as before, except they get more lucky, so to say, in pushing the boundaries, if they accidentally use too new features.

As far as updating the compiler is concerned, I suppose it takes some non-trivial amount of effort which might lead to maintenance delays (even for half an hour, that's quite a bit of a delay).

Do you think the only problem is the updating difficulty, and that if it was automated, it'd work out?

→ Ответить

3 года назад, # ^ |

+11

Sure, that can be done I guess (not sure about all the technical details about deploying these updates on CF though).

It's just a matter of personal taste to wait for a more polished implementation of C++20 (or maybe even C++23). However, if there is a guarantee that compiler updates can happen frequently enough without breaking stuff very frequently, I would go with the newer compilers too, since that mimics my own update cycle as well. Note that there tend to be bugs in new releases of compilers too, so that might be a minor inconvenience here and there.

→ Ответить

depressed_ontop

3 года назад, # |

+24

pypy3 64-bit ,now python users don't have to switch in python and pypy. Now we are going to use pypy3 always.

→ Ответить

sky123

3 года назад, # |

I think this is good, because it will reduce the impact of reasons other than the complexity of the program on the judge results. :)

→ Ответить

ilyaleshchik

3 года назад, # |

I think this option will be good as a separate compiler option when you submit a task.

→ Ответить

oversolver

3 года назад, # |

only if -O3 is necessary to beat Rust-ers

→ Ответить

nikgaevoy

3 года назад, # ^ |

I understand that it was a joke, but I still want to answer :)

From my experience, Rust standard library is incredibly slow even in comparison with -O2 version of STL. At least BTreeSet, which seems to be the counterpart to std::set in Rust.

Would be happy to hear from Rust experts that I am wrong.

→ Ответить

avm

3 года назад, # ^ |

+10

Let's insert 5000000 32-bit integers into sorted set and compute some polynomial hash.

Rust 128344948 submission runs 1637 ms and used 49200 Kb memory.
GNU C++17 (64) 128344998 submission runs 5256 ms and used 239500 Kb memory.

Why this happen and why BTreeSet is slower than red-black tree for small sets you could read in the man.

→ Ответить

CodingKnight

3 года назад, # |

According to the following blog, Speed up Code executions with help of Pragma in C/C++, the optimization flag -Ofast is superset of -O3, and its performance is slightly better than -O3.

→ Ответить

3 года назад, # ^ |

This blog is misleading about the -Ofast option and doesn't mention the downsides. This is what the official documentation says: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

Disregard strict standards compliance. -Ofast enables all -O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs. It turns on -ffast-math, -fallow-store-data-races and the Fortran-specific -fstack-arrays, unless -fmax-stack-var-size is specified, and -fno-protect-parens.

If people don't mind deviating from strict standards compliance, then they can always use pragmas to enforce -Ofast for their code. But everyone else would be rightfully upset if their valid code doesn't behave as expected, especially if this results in a WA verdict.

→ Ответить

CodingKnight

3 года назад, # ^ |

Thanks for sharing the gcc link. It seems that I was lucky enough to not get WA verdict when I tried this pragma optimization flag.

→ Ответить

sanyacoder

3 года назад, # |

-119

This is a great idea except for one thing. I am strongly against -O2 and -O3 keys in command line. (Also sometimes it is not fair for people not using C++)

I have an example:

test.cpp:

#include <iostream>
using namespace std;
int main() {
    int t = 1;
    while (t > 0) {
        t *= 9;
        cout << t << '\n';
    }
    cout << "finished\n";
}

Then run g++ test.cpp -O1 -o test && ./test in command shell. I got such result:

But if I run g++ test.cpp -O2 -o test && ./test or g++ test.cpp -O3 -o test && ./test, I will get a program that goes into an infinite loop.

A part of it's output:

291559329
-1670933335
2141469169

So, I think -O1 would be better for the command line.

Possible explanation (I'm not an expert):
Unsigned comparison is a bit faster than signed because it is not just a "stupid" comparison. Processor needs to look at first bit, than choose how it will compare numbers (it depends on first bit), еtс. Compiler does not consider the context that it can be a negative number and just uses unsigned comparation to imporove speed.

→ Ответить

webmaster

3 года назад, # ^ |

+79

Overflow of signed integers is undefined behaviour in C++. That's why your program gets stuck in O2/O3.

→ Ответить

3 года назад, # ^ |

+14

Your code is just incorrect. Add -fsanitize=undefined option to the gcc command line when compiling, run your program and it will tell you what's wrong. Yes, C++ is a very difficult programming language and it's often criticized for that. If you want an infinite loop and no unexpected surprises, then you can use Python.

→ Ответить

tfg

3 года назад, # ^ |

+11

I'd say calling a function that doesn't exists somewhere and not having that raising an error until the function is called or having a typo in a variable name and the program not accusing that of being an error is an unexpected surprise by itself.

→ Ответить

3 года назад, # ^ |

Yet I don't see any Python users posting blogs about such errors and being unable to troubleshoot them. The biggest source of actual confusion may be something like a somewhat unpredictable performance of string concatenation in CPython. There's also Rust for those, who prefer statically typed programming languages and catching more errors at the compilation stage.

→ Ответить

nikgaevoy

3 года назад, # |

+11

As far as I understand, -O3 can be faster and also can be slower than -O2, but it is more often faster than slower. The only reason why the standard compilation line for big competitions (such as ICPC) contains -O2 is that -O3 allows the compiler to do some optimizations that are not present in the C++ standard (or maybe even do something contradicting the standard, assuming it does not change the program behaviour, but I am not sure about that).

On the other hand, -O2 is guaranteed to perform almost all optimizations from the standard, so it is usually enough for any reasonable solution, and you always can add pragmas whenever you specifically need -O3.

Anyway, it is impossible to make the best command line to deal with pragmas once and forever, because there always will be something like sse or avx, that is not added by -march=native -mtune=native, but still can be added manually via pragmas and make some difference in running time.

→ Ответить

3 года назад, # ^ |

In my experience it's more often slower than faster since regular CP code isn't bottlenecked by how instructions are generated.

→ Ответить

nikgaevoy

3 года назад, # ^ |

To be very honest, I think it is more often more or less equal. Also, I bet that it depends not only on the problem but also on your preferred code style. E.g. I tend to write very template-heavy code that relies on inline optimizations that are different in -O2 and -O3 (according to godbolt). I can't quite tell which one is better though.

In general, I think that if the optimizations went that far, then you better simply check both -O2 and -O3.

→ Ответить

3 года назад, # ^ |

I do check, it's how I know. However, even if it was 50/50 on which flag gives better results, then O2 is the one to use as the default.

I can't quite tell which one is better though.

That's the thing. You get different code, that doesn't mean you get faster code. It should depend on problem type rather than code style though (unless we're talking about bad code style that needs O3), for example randomly accessing memory is bottlenecked by cache misses and no amount of compiler optimisation can change that.

→ Ответить

3 года назад, # |

+24

Great resource: https://wiki.gentoo.org/wiki/GCC_optimization.

Basically, feel free to use machine-specific options (-march=native is enough) since that's where the code will run, but keep optimisation to -O2.

-ftree-vectorize is an optimization option (default at -O3 and -Ofast), which attempts to vectorize loops using the selected ISA if possible. The reason it isn't enabled at -O2 is that it doesn't always improve code, it can make code slower as well, and usually makes the code larger; it really depends on the loop etc.
On an Intel/AMD64 platform with -march=native -O2 or lower optimization level, the code will likely end up with AVX instructions used but using shorter SSE XMM registers. To take full advantage of AVX YMM registers, the -ftree-vectorize, -O3 or -Ofast options should be used as well
Also available are the -mtune and -mcpu flags. These flags are normally only used when there is no available -march option
Even the GCC manual says that using -funroll-loops and -funroll-all-loops will make code larger and run more slowly.
Stick to the basics: -march, -O, and -pipe.

Note -pipe which just makes compilation faster but that's also useful.

→ Ответить

3 года назад, # ^ |

Here is an alternative opinion from another Linux distro: https://documentation.suse.com/sbp/all/html/SBP-GCC-10/index.html

Usually we recommend using -O2. This is the optimization level we use to build most SUSE and openSUSE packages, because at this level the compiler makes balanced size and speed trade-offs when building a general-purpose operating system. However, we suggest using -O3 if you know that your project is compute-intensive and is either small or an important part of your actual workload. Moreover, if the compiled code contains performance-critical floating-point operations, we strongly advise that you investigate whether -ffast-math or any of the fine-grained options it implies can be safely used.

Looks like you are right and -march=native also implies -mtune=native at least for the x86 target. As for -funroll-loops/-funroll-all-loops, GCC documentation says that:

Spoiler

It's important to note that popular open source software typically has a long history. Years had been used to iron out bugs and improve performance. Many of the performance critical parts had been already identified and optimized using hand written assembly or intrinsics. Many loops that could actually benefit from unrolling, had been unrolled manually. There are relatively few low hanging fruits left for the compilers. This limits the usefulness of -ftree-vectorize and -funroll-loops optimization options when used with such code.

But competitive programming solutions are usually small, somewhat messy and there's not much time available for doing manual loops unrolling due to short contests duration. This provides more optimization opportunities for the compiler. Pre-written library code is the only exception.

→ Ответить

3 года назад, # ^ |

Keep in mind that compute-intensive in OS applications usually means things like video processing, not complex algorithms. The vast majority doesn't have performance-critical parts at all.

and there's not much time available for doing manual loops unrolling due to short contests duration

The tradeoff isn't simply "no time to do this", it's "no time to do this probably useless thing". Copypasting the insides of a loop and renaming variables isn't hard when you know it's useful, but when you don't know, there's a good chance it's not useful.

→ Ответить

3 года назад, # ^ |

← Rev. 2 →

Manual loop unrolling doesn't make any sense anymore, unless it is a non-trivial rearrangement of program logic that the compiler can't detect. At O3, the compiler is smart enough to unroll loops which it deems fit conservatively (for instance, this: https://godbolt.org/z/hzKvnsGfz — try it without -funroll-loops).

-funroll-loops usually leads to a better performance (especially if it's a lightweight tight loop), but again, compiler options don't always do what you want them to do, and this doesn't unroll all loops either, since the unrolling is heuristic-based. Compilers might even reroll your loops if they find that some of your loops are hand-unrolled and are pretty bad due to the absence of such a guarantee (this is just a hypothetical).

Compilers are smart enough at micro-optimizations (more so than most programmers), so I believe we shouldn't rule out the usefulness of loop-rolling that the compiler does.

→ Ответить

3 года назад, # ^ |

+21

This is probably the most sane thing to do.

I would however add that if -march=native isn't set already, then bmi/bmi2 (if not that, then popcnt, lzcnt, tzcnt) should be considered for non-trivial bit operations (like __builtin_popcount, __builtin_ctz, __builtin_clz, __lg and so on). These would be pretty helpful in doing these bitwise operations and also speed up bitsets a bit.

Ofast is unsafe for floating point operations (I remember someone saying that it breaks some solutions to geometry problems, which was the biggest reason I stopped using Ofast), and O3 might make code slower (from my experience on CF as well, but I tend to include it anyway since it happens rarely). One good thing about O3 (as pointed out above) is the larger bit vectors that you get (but you can do that by enabling AVX2 anyway).

The point is, optimized code should not be necessary to pass, although it's nice to have.

-march=native or -mtune=native being the only addition to the compiler options should be fine in my opinion. The reason why it's not used in production is that people want to build for other architectures too, but since all code compilation and execution happens on one system in the case of competitive programming, it makes sense to actually get as much performance out of it as possible. Some might argue it's even the best thing, since it allows you to use instructions from most ISAs that are supported, which makes it fair game for people who don't actually want to use pragmas (and for people who are afraid of using pragmas due to the fear that their submission will get an RE due to SIGILL).

I would actually be interested in the flags that are supported on codeforces (the flags turned on by -march=native can be found by running g++ -march=native -E -v - </dev/null 2>&1 | grep cc1). For instance, my machine gives the following output.

Output

Still, in an ideal scenario, such optimization options shouldn't be involved (and perhaps be banned from code too), but cheesing problems using SIMD is way too fun.

→ Ответить

ggdwbg

3 года назад, # ^ |

There's actually another way to get free performance: write high level idiomatic code and constexpr'ing as much as you can.

Better to show with an example: https://gcc.godbolt.org/z/YocGTro41

High level doesn't mean slow. Example above is overly high-level, it computes some nontrivial thing (sum of all node labels in some implicitly defined graph that is not stored in memory) and it compiles in a single machine instruction that just moves the result into the register.

You can do memory allocations for no cost (one can think about memory for constants/precomputed stuff as registers); nasty cases/whatever can be elegantly expressed in this way.

Not sure if this is common knowledge in CP.

→ Ответить