Take a look at this C++ submission 199864568:

```
#import <bits/stdc++.h>
using namespace std;
int main()
{
int a;
cin >> a;
cout << ((a%2==0 && a>2) ? "YES" : "NO");
}
```

**Don't see it?**

**Spoilers**

# | User | Rating |
---|---|---|

1 | Benq | 3783 |

2 | jiangly | 3666 |

3 | tourist | 3611 |

4 | Um_nik | 3536 |

5 | inaFSTream | 3477 |

6 | fantasy | 3468 |

7 | maroonrk | 3464 |

8 | QAQAutoMaton | 3428 |

9 | ecnerwala | 3427 |

10 | Ormlis | 3396 |

# | User | Contrib. |
---|---|---|

1 | Um_nik | 185 |

2 | adamant | 178 |

3 | awoo | 177 |

4 | nor | 169 |

5 | maroonrk | 165 |

6 | -is-this-fft- | 164 |

7 | antontrygubO_o | 153 |

8 | ko_osaga | 151 |

9 | dario2994 | 150 |

10 | SecondThread | 149 |

Take a look at this C++ submission 199864568:

```
#import <bits/stdc++.h>
using namespace std;
int main()
{
int a;
cin >> a;
cout << ((a%2==0 && a>2) ? "YES" : "NO");
}
```

Take a closer look at the code =P

Look even closer.

Read the first line.

The first line uses `#import`

instead of the usual `#include`

.

The other day, I was coding some C++. My Pythonic muscle memory caused me to accedently write `#import`

instead of `#include`

, and to my great surprise, the code still compiled!

The only information that I was able to find about `#import`

is from this stackoverflow comment. As I understand it, `#import`

is a really old gcc feature that combines `#include`

and `#pragma once`

into one.

There are arguably some other positives to using `#import`

instead of `#include`

. `#import`

is one character shorter than `#include`

, and as someone that mainly uses Python, it is easier for me to remember import than include.

I recently had a very interesting idea for how to greatly speed up convolution (a.k.a. polynomial multiplication).

```
def convolution(A, B):
C = [0] * (len(A) + len(B) - 1)
for i in range(len(A)):
for j in range(len(B)):
C[i + j] += A[i] * B[j]
return C
```

The standard technique for how to do convolution fast is to make use of cyclic convolution (polynomial mult mod $$$x^n - 1$$$).

```
def cyclic_convolution(A, B):
n = len(A) # A and B needs to have the same size
C = [0] * n
for i in range(n):
for j in range(n):
C[(i + j) % n] += A[i] * B[j]
return C
```

Cyclic convolution can be calculated in $$$O(n \log n)$$$ using FFT, which is really fast. The issue here is that in order to do convolution using cyclic convolution, we need to pad with a lot of 0s to not be affected by the wrap around. All this 0-padding feels very inefficient.

So here is my idea. What if we do polynomial multiplication mod $$$x^n - i$$$ instead of mod $$$x^n - 1$$$? Then when we get wrap around, it will be multiplied by the imaginary unit, so it wont interfere with the real part! I call this the *imaginary cyclic convolution*.

```
def imaginary_cyclic_convolution(A, B):
n = len(A) # A and B needs to have the same size
C = [0] * n
for i in range(n):
for j in range(n):
C[(i + j) % n] += (1 if i + j < n else 1j) * A[i] * B[j]
return C
```

Imaginary cyclic convolution is the perfect algorithm to use for implementing convolution. Using it, we no longer need to do copious amount of 0 padding, since the imaginary unit will take care of the wrap around. In fact, the size (the value of $$$n$$$) required is exactly half of what we would need if we had used cyclic convolution.

One question still remains, how do we implement imaginary cyclic convolution efficiently?

The trick is rather simple. Let $$$\omega = i^{\frac{1}{n}}$$$. Now note that if $$$C(\omega x) = A(\omega x) B(\omega x) \mod x^n - 1$$$ then $$$C(x) = A(x) B(x) \mod x^n - i$$$. So here is the algorithm

```
def imaginary_cyclic_convolution(A, B):
n = len(A) # A and B needs to have the same size
w = (1j)**(1/n) # n-th root of imaginary unit
# Transform the polynomials A(x) -> A(wx) and B(x) -> B(wx)
A = [A[i] * w**i for i in range(n)]
B = [B[i] * w**i for i in range(n)]
C = cyclic_convolution(A, B)
# Transform the polynomial C(wx) -> C(x)
C = [C[i] / w**i for i in range(n)]
return C
```

Hi CF! During this past weekend I was reading up on Montgomery transformation, which is a really interesting and useful technique to do fast modular multiplication. However, all of the explanations I could find online felt very unintuitive for me, so I decided to write my own blog on the subject. A big thanks to kostia244, nor, nskybytskyi and -is-this-fft- for reading this blog and giving me some feedback =).

Let $$$P=10^9+7$$$ and let $$$a$$$ and $$$b$$$ be two numbers in $$$[0,P)$$$. Our goal is to calculate $$$a \cdot b \, \% \, P$$$ without ever actually calling $$$\% \, P$$$. This is because calling $$$\% \, P$$$ is very costly.

*If you haven't noticed that calling $$$\% \, P$$$ is really slow, then the reason you haven't noticed it is likely because the compiler automatically optimizes away the $$$\% \, P$$$ call if $$$P$$$ is known at compile time. But if $$$P$$$ is not known at compile time, then the compiler will have to call $$$\% \, P$$$, which is really really slow.*

It turns out that the trick to calculate $$$a \cdot b \, \% \, P$$$ efficently is to calculate $$$a \cdot b \cdot 2^{-32} \, \% \, P$$$ efficiently. So the goal for this section will be to figure out how to calculate $$$a \cdot b \cdot 2^{-32} \, \% \, P$$$ efficently. $$$a \cdot b \cdot 2^{-32} \, \% \, P$$$ is called the Montgomery reduction of $$$a \cdot b$$$, denoted by $$$\text{m_reduce}(a \cdot b)$$$.

Suppose that $$$a \cdot b$$$ just happens to be divisible by $$$2^{32}$$$. Then $$$(a \cdot b \cdot 2^{-32}) \, \% \, P = (a \cdot b) \gg 32$$$, which runs super fast!

Can we do something similar if $$$a \cdot b$$$ is not divisible by $$$2^{32}$$$? The answer is yes! The trick is to find some integer $$$m$$$ such that $$$(a \cdot b + m \cdot P)$$$ is divisible by $$$2^{32}$$$. Then $$$a \cdot b \cdot 2^{-32} \, \% \, P = (a \cdot b + m \cdot P) \cdot 2^{-32} \, \% \, P = (a \cdot b + m \cdot P) \gg 32$$$.

So how do we find such an integer $$$m$$$? We want $$$ (a \cdot b + m \cdot P) \,\%\, 2^{32} = 0$$$ so $$$m = (-a \cdot b \cdot P^{-1}) \,\%\, 2^{32}$$$. So if we precalculate $$$(-P^{-1}) \,\%\, 2^{32}$$$ then calculating $$$m$$$ can be done blazingly fast.

Since the Montgomery reduction divides $$$a \cdot b$$$ by $$$2^{32}$$$, we would like some some way of multiplying by $$$2^{32}$$$ modulo $$$P$$$. The operation $$$x \cdot 2^{32} \, \% \, P$$$ is called the Montgomery transform of $$$x$$$, denoted by $$$\text{m_transform}(x)$$$.

The trick to implement $$$\text{m_transform}$$$ efficently is to make use of the Montgomery reduction. Note that $$$\text{m_transform}(x) = \text{m_reduce}(x \cdot (2^{64} \, \% \, P))$$$, so if we precalculate $$$2^{64} \, \% \, P$$$, then $$$\text{m_transform}$$$ also runs blazingly fast.

Using $$$\text{m_reduce}$$$ and $$$\text{m_transform}$$$ there are multiple different ways of calculating $$$a \cdot b \, \% \, P$$$ effectively. One way is to run $$$\text{m_transform}(\text{m_reduce}(a \cdot b))$$$. This results in two calls to $$$\text{m_reduce}$$$ per multiplication.

Another common way to do it is to always keep all integers transformed in the so called Montgomery space. If $$$a' = \text{m_transform}(a)$$$ and $$$b' = \text{m_transform}(b)$$$ then $$$\text{m_transform}(a \cdot b \, \% \, P) = \text{m_reduce}(a' \cdot b')$$$. This effectively results in one call to $$$\text{m_reduce}$$$ per multiplication, however you now have to pay to move integers in to and out of the Montgomery space.

Here is a Python 3.8 implementation of Montgomery multiplication. This implementation is just meant to serve as a basic example. Implement it in C++ if you want it to run fast.

```
P = 10**9 + 7
r = 2**32
r2 = r * r % P
Pinv = pow(-P, -1, r) # (-P^-1) % r
def m_reduce(ab):
m = ab * Pinv % r
return (ab + m * P) // r
def m_transform(a):
return m_reduce(a * r2)
# Example of how to use it
a = 123456789
b = 35
a_prim = m_transform(a) # mult a by 2^32
b_prim = m_transform(b) # mult b by 2^32
prod_prim = m_reduce(a_prim * b_prim) # divide a' * b' by 2^32
prod = m_reduce(prod_prim) # divide prod' by 2^32
print('%d * %d %% %d = %d' % (a, b, P, prod)) # prints 123456789 * 35 % 1000000007 = 320987587
```

One important issue that I've so far swept under the rug is that the output of `m_reduce`

is actually in $$$[0, 2 P)$$$ and not $$$[0, P)$$$. I just want end by discussing this issue. I can see two ways of handling this:

- Alternative 1. You can force $$$\text{m_reduce}(a \cdot b)$$$ to be in $$$[0, P)$$$ for $$$a$$$ and $$$b$$$ in $$$[0, P)$$$ by adding an if-stament to the output of
`m_reduce`

. This will work for any odd integer $$$P < 2^{31}$$$.

```
def m_reduce(ab):
m = ab * Pinv % r
y = (ab + m * P) // r
return y if y < P else y - P
```

- Alternative 2. Assuming $$$P$$$ is an odd integer $$$< 2^{30}$$$ then if $$$a$$$ and $$$b$$$ $$$\in [0, 2 P)$$$ you can show that the output of $$$\text{m_reduce}(a \cdot b)$$$ is also in $$$[0,2 P)$$$. So if you are fine working with $$$[0, 2 P) \vphantom]$$$ everywhere then you don't need any if-statements. Nyaan's github has a nice C++ implementation of Montgomery multiplication using this style of implementation.

I've always liked using Python (PyPy) for solving problems in competitive programming. And most problems are very doable, even in Python. What I've found is that the most difficult problems to solve in Python are those requiring 64 bit integers.

The reason why 64 bit integers are problematic is because CF runs Windows, and PyPy only supports 32 bit on Windows. So whenever a problem involves integers that cannot fit inside of a signed 32 bit int, PyPy switches to big integers (which runs insanely slow, sometimes a factor of 20 times slower).

What I usually try is switching from using integers to floats. This works if all integers are $$$\leq 2^{52}$$$:

53441394 (Fat TLE and almost MLE) vs 53456822 (AC by a mile)

`a * b % MOD`

Many problems on CF involve doing multiplication modulo $$$10^9 + 7$$$. This is super slow because of big integers. After a long time of trying around with different methods, I've found that this is the best solution. Today I always use it whenever I do multiplication mod $$$10^9 + 7$$$. But sadly, while it is much faster than using big integers, it is still slow compared to doing `a * b % MOD`

in 64 bit PyPy.

Another trick is to use a deprecated feature that only exists in PyPy2:

114602234 (Fat TLE) vs 114611359 (AC by a mile)

A big warning here. Not only is this feature deprecated, but from my own personal experience using it means opening a can of worms. I will only use this as an absolute last resort.

However with the latest PyPy version (version 7.3.4) PyPy has finally switched to 64 bit on Windows! So upgrading PyPy would mean no more problems with big integers. This would make PyPy far more usable and more beginner friendly. So if possible please update PyPy's version on CF to 7.3.4! MikeMirzayanov

Edit: Reading Results of 2020 [list some changes and improvements] blog I realized that I should probably be tagging geranazavr555, kuviman and cannor147 too.

Let me tell you the story of how I made $2200 from doing competitive programming.

Once many many fortnights ago Hackerrank held one of its regular competitions, "World CodeSprint 9". This was back when Hackerrank actually sent out its prizes. The competition was very unusual in that one of its hardest problems was a scored based approximation problem. This competition was also the first time that I would get placed in the top 100s! Using my beloved Python :)

As I recall the prize for getting placed 4 to 100 was a t-shirt and $75. More precisely these $75 were sent either in Bitcoins or Amazon giftcards depending on where the prize winners lived, and in my case I got Bitcoins. I received the $75 in Bitcoins on 21st of March 2017.

When I got them, I didn't really know how to do anything with them, so I kind of just forgot about them. Turns out that the value of Bitcoin has increased a bit since then:

30 times more to be precise! So today I just sold them for a bit over $2200 (Sold when the price hit 26640€/btc). Not too shabby for a 34th place finish in a regular competition! =D

I'm writing this blog because of the large number of blogs asking about why they get strange floating arithmetic behaviour in C++. For example:

"WA using GNU C++17 (64) and AC using GNU C++17" https://codeforces.com/blog/entry/78094

"The curious case of the pow function" https://codeforces.com/blog/entry/21844

"Why does this happen?" https://codeforces.com/blog/entry/51884

"Why can this code work strangely?" https://codeforces.com/blog/entry/18005

and many many more.

Here is a simple example of the kind of weird behaviour I'm talking about

```
#include <iostream>
using namespace std;
double f(double a, double b) {
return a * a - b;
}
int main() {
cout.precision(60);
// Calculate 10*10 - 1e-15
double ans;
ans = f(atof("10"), atof("1e-15"));
cout << (double) ans << '\n';
cout << (int) ans << "\n\n";
ans = f(atof("10"), atof("1e-15"));
cout << (int) ans << '\n';
cout << (double) ans << "\n\n";
ans = f(atof("10"), atof("1e-15"));
cout << (double) ans << '\n';
cout << (long double) ans << "\n\n";
ans = f(atof("10"), atof("1e-15"));
cout << (long double) ans << '\n';
cout << (double) ans << "\n\n";
return 0;
}
```

```
100
100
99
100
100
100
99.99999999999999900079927783735911361873149871826171875
100
```

```
100
100
100
100
100
100
100
100
```

Looking at this example, the output that one would expect from $$$10 * 10 - 10^{-15}$$$ is exactly $$$100$$$ since $$$100$$$ is the closest representable value of a double. This is exactly what happens in 64 bit g++. However, in 32 bit g++ there seems to be some kind of hidden *excess precision* causing the output to only sometimes(???) be $$$100$$$.

In C and C++ there are different modes (referred to as methods) of how floating point arithmetic is done, see (https://en.wikipedia.org/wiki/C99#IEEE_754_floating-point_support). You can detect which one is being used by the value of `FLT_EVAL_METHOD`

found in `cfloat`

. In mode 2 (which is what 32 bit g++ uses by default) **all** floating point arithmetic is done using long double. Note that in this mode numbers are temporarily stored as long doubles while being operated on, this can / will cause a kind of excess precision. In mode 0 (which is what 64 bit g++ uses by default) the arithmetic is done using each corresponding type, so there is no excess precision.

Here is a simple example of how to detect excess precision (partly taken from https://stackoverflow.com/a/20870774)

```
// #pragma GCC target("fpmath=sse,sse2") // Turns off excess precision
// #pragma GCC target("fpmath=387") // Turns on excess precision
#include <iostream>
#include <cstdlib>
#include <cfloat>
using namespace std;
int main() {
cout << "This is compiled in mode "<< FLT_EVAL_METHOD << '\n';
cout << "0 means no excess precision.\n";
cout << "2 means there is excess precision.\n\n";
cout << "The following test detects excess precision\n";
cout << "0 if no excess precision, or 8e-17 if there is excess precision.\n";
double a = atof("1.2345678");
double b = a*a;
cout << b - 1.52415765279683990130 << '\n';
return 0;
}
```

If b is rounded (as one would "expect" since it is a double), then the result is zero. Otherwise it is something like 8e-17 because of excess precision. I tried running this in custom invocation. MSVC(C++17), Clang and g++17(64bit) all use mode 0 and round b to 0, while g++11, g++14 and g++17 as expected all use mode 2 and b = 8e-17.

The culprit behind all of this misery is the old x87 instruction set, which only supports (80 bit) long double arithmetic. The modern solution is to on top of this use the SSE instruction set (version 2 or later), which supports both float and double arithmetic. On GCC you can turn this on with the flags `-mfpmath=sse -msse2`

. This will not change the value of `FLT_EVAL_METHOD`

, but it will effectively turn off excess precision, see 81993714.

It is also possible to effectively turn on excess precision with `-mfpmath=387`

, see 81993724.

Using your newfound knowledge of excess precision, try to find a compiler + input to "hack" this

```
#include <iostream>
#include <cmath>
using namespace std;
bool f() {
double x;
cin >> x;
double y = x + 1.0;
if (y >= 1.0)
return false;
int w = pow(y, 2);
if (y < 1.0)
return false;
return y == 1.0;
}
int main() {
if (f())
cout << "HACKED\n";
else
cout << "Not hacked\n";
}
```

32 bit g++ by default does all of its floating point arithmetic with (80 bit) long double. This causes a ton of frustrating and weird behaviours. 64 bit g++ does not have this issue.

Codeforces (c) Copyright 2010-2023 Mike Mirzayanov

The only programming contests Web 2.0 platform

Server time: Jun/02/2023 08:11:44 (g1).

Desktop version, switch to mobile version.

Supported by

User lists

Name |
---|