I was looking for a practical guide on how to actually apply time complexity to predict what will TLE and what won't, and found nothing. Mostly what's out there are charts showing which time complexities are reasonable for a given value of n, but that's a heavy oversimplification. In this guide I will try to explain how to estimate the running time of your code.

First of all, let's understand what time complexity actually means. Formal definitions aside, we can say that if a piece of code is O(f(n)), its time consumption should be something like C*f(n) + S, where C is a constant and S is something small compared to the rest.

So let's say your code consists of reading n integers and then iterating through all pairs of them. Your time consumption should be something like n + n^2: n for the reading and n^2 for going through the pairs. What time complexity tells us is that if your code is too slow, it is probably because of the n^2 part, not the n part. That's why we will mostly focus on the "bigger" part of the running time function.

Now that we have isolated the slow part of your algorithm, we want to know how much time that actually takes. And here comes an important notion: **not all O(1) operations are the same**. A modulo operation, for example, can be four times slower than a sum.

```
unsigned int a = 0; // unsigned and initialized, to avoid undefined behavior
for(int i = 1; i <= 1000000000; i++)
    a += i;
```

This takes 2.73s on my computer (compiled without optimization flags; with -O2 the compiler would remove the loop entirely).

```
unsigned int a = 0; // unsigned and initialized, to avoid undefined behavior
for(int i = 1; i <= 1000000000; i++)
    a %= i;
```

This takes 8.14s.

Having different underlying constants is why, for example, using a static array can be faster than using a vector, and, for more subtle reasons, why certain ways to traverse matrices are faster than others. Not all constant-time operations are the same, and you have to be able to estimate the constants of those O(1) operations. That means either knowing how C++ works under the hood or just trying a bunch of things yourself and getting a feel for it.

Now, for the algorithm itself, you very often can't just ignore the constants within it. Let's say you want to iterate through all possible combinations of 5 elements of an array. The *naive* solution is 5 nested for loops, giving us a total time of n^5 * (whatever you are doing with each combination).

```
unsigned int a = 0; // unsigned and initialized, to avoid undefined behavior
for(int i1 = 0; i1 < 100; i1++){
    for(int i2 = 0; i2 < 100; i2++){
        for(int i3 = 0; i3 < 100; i3++){
            for(int i4 = 0; i4 < 100; i4++){
                for(int i5 = 0; i5 < 100; i5++){
                    a++;
                }
            }
        }
    }
}
```

This runs in 19.96s on my computer.

Now, if the order of those elements does not matter, we can always say that the second element will be greater than or equal to the first one, the third greater than or equal to the second one, and so on. Just by adjusting the for loops to do that, our running time becomes around 100 times faster! That's a constant we can't just ignore.

```
unsigned int a = 0; // unsigned and initialized, to avoid undefined behavior
for(int i1 = 0; i1 < 100; i1++){
    for(int i2 = i1; i2 < 100; i2++){
        for(int i3 = i2; i3 < 100; i3++){
            for(int i4 = i3; i4 < 100; i4++){
                for(int i5 = i4; i5 < 100; i5++){
                    a++;
                }
            }
        }
    }
}
```

The time is now 0.18s.

That's a trick that appears very often in dynamic programming and matrix exponentiation: just by cutting your dimensions in half, you can get code that runs in less than a tenth of the time it did before.

On the other hand, sometimes in lazy segment trees I have huge O(1) push-down ("unlazy") functions that take some 20 operations to finish. We can't just ignore that either.

With all that, we are ready to estimate the running time of our code.

We take our algorithm and isolate the part or parts that could give us trouble. We assign constants both to the C++ operations we use and to how many of them we actually execute. Then we can just plug in the values and get the expected number of operations our code performs.

In C++ we can do approximately 4*10^8 simple operations per second. If your code does not have some huge constant, and the number you get when you plug the values into the complexity is less than 10^7 times the number of seconds you have, I would code it without giving it a second thought. If it is relatively "clean" code, up to 10^8 is fine. More than that and you should probably start doing things more carefully.

I know it is all much more complex than this, but I hope this helps people who are starting out to know what is worth coding. Please comment with any improvements I could make. :)

Actually it's more or less `(what we had before)/120`. Each order of the elements `a<b<c<d<e`, `a<b<c<e<d`, `a<b<d<c<e`, ..., `e<d<c<b<a` is equally likely, and there are 5! = 120 of them.

It's less-than-or-equal in the loops, so ties are counted too and it would be a little bit more than that. Not sure how to calculate the exact value though. Thanks for the correction!

You can find the exact value by using the same argument as "number of ways to sum S with K numbers". Let's say we want a[0] <= a[1] <= a[2] <= ... <= a[N-1] with K possible values. We can see each such sequence as an arrangement of K - 1 stars and N bars, where a[i] is the number of stars to the left of the i-th bar, so the exact value is (N + K - 1 choose N).

For example, N = 3 and K = 10 with the sequence (3, 4, 9) would be * * * | * | * * * * * |

Auto comment: topic has been updated by kobus (previous revision, new revision, compare).

Harshly disliked.

Sure, cool story bruh. Int overflow + use of an uninitialized variable (double UB) + blatantly optimizable code runs in x seconds, what a miracle! (Clang optimizes this to an empty function due to the double UB; initializing `a` yields this: https://godbolt.org/z/F3NQAb). Not even able to produce the full source code to allow people to test the same behaviour on their machines, which vary quite a bit.

How exactly did you come to this conclusion? How are vectors and modulo operations even close to each other? Do you even have any idea how vector works? Yeah. Sure... That's exactly why Floyd-Warshall is the most used thing for contestants to test a jury's machine for speed at onsite contests, starting with a value of 1000 and binary searching upwards.

Not even a single mention of the very existence of optimizations in C++, which are more than relevant when discussing this kind of premature optimization.

The fact that it overflows and that the variable is not initialized does not affect the running time of the operation. I am not looking to actually know the value of a, so there is no problem. I know the code is optimizable, but I didn't use any of the flags that would do it, on purpose: the point is just seeing the running time, not passing any problem.

I say explicitly that that was the running time on my machine. That is the main function and there is basically nothing else in the code other than some defines. Putting the whole thing in would just distract from the main code.

The "that's why" was referring to the fact that not all constants are the same, not to the modulo thing. I understand why that would be confusing, but I would expect a knowledgeable person such as yourself to be able to get the context.

I don't see why your last point is relevant, do you have any actual proof that that's not how it works? I would gladly see it and update the post if that's the case.

What exactly am I supposed to prove? That `-O2` speeds up your program? :DDDDDDDDD

Benchmarking C++ code without optimization is about as close to a real benchmark as benchmarking a different language.

Harshly disliked.

This is a good starting point for this kind of thing, not a post about benchmarks. Yes, there are a few strong statements and maybe even mistakes, but many people just don't think about estimating the constant factor along with the complexity.

By experience. Yeah, I know, if you use vector correctly this is mostly not true. But to know how to use a vector correctly, you have to realize beforehand that using it incorrectly will make it slower. And for first-time readers it may be useful to know that replacing vectors with arrays may dramatically speed up their program.

Some. But why would I? Why can't I write some code with vectors, then write "the same" code with arrays and see that the latter is faster? See above about "correct usage of vectors".

You seem to be completely missing my point.

I wasn't talking to you, was I?

I completely disagree that you can state "static array is faster than using a vector" (don't even mention that it has already been changed, and that the author meant to say a different thing), thus leading most beginners reading this article (as you love to say, I know this from my experience) into writing almost all of their code using static arrays. Not only does this make every inexperienced contestant's code unreadable, it could potentially lead to significant decreases in performance (the case I know right off the top of my head is something of the sort `int dp[2][n]; swap(dp[0], dp[1]);`).

Nor do I agree that you can claim things to be true without providing any reasoning whatsoever, some sort of your "by experience".

Constant optimizations in C++ (since the blog's author explicitly mentioned C++ several times) should always be benchmarked as well as you possibly can. I really don't want to see blogs that teach people to do premature optimizations, which drastically impact code readability.

50026873

There is a "Compare" button; you can see exactly what changes I made to get AC in almost half the time limit, down from TLE.

So yes, I do believe that changing vectors to arrays (even non-static ones) can speed up a solution. From my experience.

That's your point of view, not a widely accepted one. I would say that `vector<vector<vector>>` is unreadable.

Static array is faster than vector, by 0.01% or what?

It depends on how you use it. That may not be the best example. Do you have any better suggestions?

If you use O(N) arrays that take O(N log N) memory in total, the difference is actually quite big.

Actually, it's not because of that, but because of dynamic memory allocations. And if you just process the data in the vector without reallocations, it will be as fast as a static array (at least if we consider using the `-O2` flag).

Not always =) Random access to a 256MB array is more than 200 times slower than random access to a 4KB array.

I'd say we can do about 4*10^9 operations per second.

My "that's why" was confusing, I wasn't referring to that. I will update it.

Your random access point is valid though. I was trying to think of simple examples I could use. Any suggestions?

I guess the last one depends on how you define an operation. I haven't found anywhere that gives me a good answer, and I would consider anything in that range OK.

Sieve of Eratosthenes vs Segmented Sieve.

Or something like

`for (i = 0 .. n) cnt[a[i]]++;`

Is cutting the loop in half really that common in matrix exponentiation? It seems like it would only work in the case where the matrix is symmetric.

It is common for it to be possible to cut the number of states in half, or something like that, even if the matrix is not symmetric. Of course, the problem has to enable you to do so.

I don't regret reading this) Very cool! Especially the 5 nested loops, I didn't know it speeds things up that much. And by the way, in the part where the modulo is taken while i is simply added, you write that it is 4 times slower, but there it is certainly no more than 3, to be exact =)

Sorry for the English, it's just the work of a translator))

I said it can be: in the same code, with long long instead of int for a, adding is 4.54 times faster than the modulo on my computer. I said four because that's normally what I use; it's a good average that I can apply without having to think about it very much. I am glad you liked the post!