### ExplodingFreeze's blog

By ExplodingFreeze, history, 3 years ago, Tl;dr — Assuming negligible cache misses, how many simple operations (reading/setting array values, comparing integers, addition, subtraction, multiplication) can you perform in a second on Codeforces?

So recently I was trying 1303E - Удаляем подпоследовательности and I realized that my code which appeared to be $N^{4}$ ran extremely quickly in custom test for inputs with $N = 400$, so I ended up submitting it and got accepted in just 650ms — 79311246.

While at first glance it appears as if it the loop would run $N^{4}$ times ($2.5 * 10^{10}$ times at worst), you can show it will run much faster than that, actually taking $\sum_{i = 1}^{\left | t \right |} (i + 1)(\left | t \right | - i + 1)(\left | s \right | - \left | t \right | + 1)$ operations. Simplyifying this we get $\frac{1}{6}(\left | s \right | - \left | t \right | + 1)(\left | t \right |^{3} + 6 * \left | t \right |^{2})$.

Maximizing $\left | s \right |$ and throwing this equation into Wolfram Alpha we get a maxima of a bit under $5 * 10^{8}$. A test which approaches this maxima is simply $\left | s \right | = 400$ and $\left | t \right | = 300$ with all characters of both the strings being the same.

However considering the if statements with character comparisons and dp array value assignments in each loop iteration, as well as the zeroing of the array, the number of simple operations performed easily approaches $4 * 10^{9}$.

So my question is — Assuming a fair amount of cache persistence, has anyone tested how many simple operations can run in a second on Codeforces?

The reason I mention cache persistance is it clearly plays a massive role in solutions like this, swapping the 2nd and 4th loops results in the runtime increasing from $750ms$ to nearly $3000ms$ for the max test.

Theoretically the limit on the number of operations would be $\frac{\text{clock speed } * \text{ instructions per clock per core}}{\text{average clock cycles per operation}}$, which assuming Codeforces is running a fairly modern processor capable of around 8 instructions per clock per core, clocked at 4Ghz (unlikely if its a high core count processor), would yield somewhere around $8 * 10 ^ {9}$ instructions per second assuming an average operation requires 4 clock cycles. However in practice, I guess this limit would be a fair bit lower.  Comments (3)
 » Auto comment: topic has been updated by ExplodingFreeze (previous revision, new revision, compare).
 » Assuming negligible cache misses, depends on data types and pipelining.
 » 3 years ago, # | ← Rev. 3 →   There are a lot more factors than just cache misses. For example, there's branch prediction and compiler optimizations. It also depends on your definition of basic operation. Lastly #define int long long can have a huge impact on speed so be careful with it. For example, this takes 1450 ms: int main() { for (volatile int i = 0; i < 1000000000; i++); } While this only takes 920 ms: int main() { volatile int x; for (int i = 0; i < 100000000; i++) { for (x = 0; x < 10; x++); } } With all those factors it really doesn't get more precise than "around $10^9$". I find that checking if the complexity is above $10^9$ and a rough estimate of the constant factor is pretty much always good enough.