Hello CodeForces)

I was solving a problem, and for a long time I couldn't understand why I was getting WA.

```
// WA code
if (n > 1) {
    if (Check(n)) Update(n, 1);
    else if ((ll)sqrt(n) * (ll)sqrt(n) == n) Update((ll)sqrt(n), 2);
}
```

```
// AC code
double x = n;
x = sqrt(x);
if (n > 1) {
    if (Check(n)) Update(n, 1);
    else if ((ll)x * (ll)x == n) Update((ll)x, 2);
}
```

**In the first code, if n = 302500001100000001, then (ll)sqrt(n) gives 550000000, but the correct value is 550000001.**

**I tried to reproduce it in the Codeforces custom test; the error appears only with my compiler and the OJ's compiler.**

Tell me why???

Also, give me a hint: how do I upload a photo in HQ???

tl;dr: precision, excess precision, and the optimizer.

Rule of thumb: do not assume that the result of a floating-point operation is exact or reproducible; that is not always true.

See this example. Ignore all the `volatile` qualifiers and other hacks before `main`; they are required to stop the optimizer's meddling, for demonstration purposes (see below). I believe the following happens.

A `double` is typically a 64-bit double-precision IEEE 754 number, which has 1 bit for the sign, 11 bits for the exponent and 52 bits for the mantissa. `302500001100000001` is `100 0011 0010 1011 0010 0010 0101 1111 0110 0001 1110 1110 1011 0000 0001` in binary, which is 59 bits long. The leading one is not stored for normalized numbers in IEEE 754, so only the leading 53 bits can be kept, and the trailing 6 bits (`00 0001`) get rounded to `00 0000` for lack of precision. That is, your number becomes `302500001100000000` when converted to `double`; see the first line of the output: `x: 302500001100000000.000000000000`.

However, GCC's `long double` is typically a full 80-bit x86 extended-precision floating-point number. It has 64 bits for the mantissa, which is enough to store your number; see the second line: `y: 302500001100000001.000000000000`.

Now look at `sqrt`'s signature. It takes a `double` as the argument and returns another `double`. So its input is always slightly smaller than you expected. However, it still prints the "correct" answer: `sqrt(x): 550000001.000000000000`, `sqrt(y): 550000001.000000000000`. It is also correct when converted to `long long`: `to_ll(sqrt(x)): 550000001`, `to_ll(sqrt(y)): 550000001`. Why is that? The exact root is just below 550000001 and cannot be represented in a `double`. We are lucky here: it gets rounded up, to the "correct" answer.

`sqrtl`, in contrast, takes a `long double` as an argument and returns a `long double` as a result. We can see that the roots are slightly different: `sqrtl(x): 550000000.999999999069`, `sqrtl(y): 550000001.000000000000`. I have not checked further, but I believe the absolute difference is smaller than `double`'s precision here. Still, after truncation to `long long` (which is basically rounding toward zero), the results become different: `to_ll(sqrtl(x)): 550000000` and `to_ll(sqrtl(y)): 550000001`.

Frankly speaking, some sources claim that C++ should have an overload of `sqrt` for `long double` which returns a `long double`. I believe it depends on the header used (`<math.h>` vs `<cmath>`) and on the compiler's version. In my compiler there is no such overload for `long double`, regardless of whether I use `<math.h>` or `<cmath>`.

OK, so now we see that the result is "correct" if `double` and `sqrt` are used, because the rounding errors compensate each other, and "incorrect" if `long double` and `sqrtl` are used. However, your program does not seem to use either of the latter explicitly, so why does that happen anyway? The answer is, I believe, very typical.

Believe it or not, operating on 80-bit floating-point numbers is cheaper than on 64-bit ones, because the former are implemented in the x87 FPU hardware and the latter are not. So excess precision never hurt anybody, right? Right. So let's perform all computations directly in FPU registers, without transferring values to and from memory where our short 64-bit `double` resides. That results in less rounding and more precise intermediate calculations. The catch is that we have no control over which intermediate values get rounded (when there are not enough FPU registers and a value has to be spilled to memory) and which do not.

And, finally, the icing on the cake: there is an x86 assembly instruction `fsqrt` which takes a square root, and the optimizer knows that this is what `sqrt` is supposed to do. So the optimizer happily replaces the call to `sqrt` (which would involve type conversions and stack frame allocation) with a single `fsqrt` instruction. Voila: now our `double` is effectively "replaced" with `long double`, `sqrt` is "replaced" with `sqrtl`, and we can start blaming excess precision.

**Why do the results differ from platform to platform?**

I believe it's because this "bug" (which is actually not a bug) requires a lot of things to happen simultaneously. If optimizations are fully turned off, or the optimizer does not have this insight about the `sqrt` function, or its heuristics do not decide in favor of keeping everything inside FPU registers, the bug is not there.

**What do we do?**

My personal recipe:

Make sure that the inputs, intermediate values and outputs can all be safely stored in the floating-point type you're using. Otherwise, you're gonna have a bad time. When converting to an integer, add a small `eps` to the number before the conversion, or use the `round` function. Note that `eps` should be big enough to actually make a change: if you add 10^{-15} to 10000.25, it's not going to change anything, even though 10^{-15} itself can be stored in a `double`. Also, `eps` should be small enough to ensure that you don't "over-round" your number.

I'm sure no one read that till the end. Just upvote and think this guy knows well :P

imgur or something