Блог пользователя -is-this-fft-

Автор -is-this-fft-, история, 18 месяцев назад, По-английски

When solving problems in competitive programming, it is almost never a good idea to use the following inbuilt C++ functions. You will be hacked or fail a pretest or worse, a systest.

Why? Because they use floating-point numbers. They are designed to be used with a floating-point input and a floating-point output. The issue is that on a floating-point number, the result may not be exact. Worse, floating-point numbers may not be able to accurately encode the integer.

To calculate $$$\lfloor \frac{a}{b} \rfloor$$$ for positive integers $$$a$$$ and $$$b$$$:
  • Don't use floor((double) a / (double) b) or similar.
  • Do use a / b. It will round down.
  • Warning: be careful with negative numbers. The answer depends on whether we should round down or towards 0.
To calculate $$$\lceil \frac{a}{b} \rceil$$$ for positive integers $$$a$$$ and $$$b$$$:
  • Don't use ceil((double) a / (double) b) or similar.
  • Do use (a + b - 1) / b.
  • Warning: the same caveat goes for negative numbers as before.
To calculate $$$\lfloor \sqrt{a} \rfloor$$$ for a nonnegative integer $$$a$$$:
  • Don't use sqrt(a) or similar.
  • Do use binary search to calculate the square root.
Implementation
To calculate $$$a^b$$$ for nonnegative integers $$$a$$$ and $$$b$$$:
  • Don't use pow(a, b) or similar.
  • Do use the naive algorithm. If you really want to do this and $$$a^b$$$ fits in a long long, then it will take no more than 64 steps, unless $$$a = 0$$$ or $$$a = 1$$$, in which case you can just return $$$a$$$.
To calculate $$$\lfloor \log_2 (a) \rfloor$$$ for a positive integer $$$a$$$:
  • Don't use log2(a) or similar.
  • Do use __lg or an approach based on __builtin_clz or __builtin_clzll. The "clz" stands for "count leading zeroes" and it does exactly what it says — on the binary representation of the number. If you have access to C++20, there is also the bit_width library function.
  • Warning: __builtin_clz(0) and __builtin_clzll(0) are undefined behavior. Also all these functions starting with __ are specific to GCC and you're technically not meant to use them, but it's fine in competitive programming.

Exceptions

The only major exception is if floating-point numbers are an inherent part of your solution. Most commonly, this happens in problems where the output is itself a floating-point number (especially geometry problems, sometimes probability). There are also some algorithms where you need to work with floating-point numbers even if the input and output are integer (FFT, possibly some LP).

But even in these cases, it's always a good idea to do as much as possible with integers and move to floats as late as possible.

I got WA with one compiler, accepted with another

Most likely it has to do with 64-bit versus 32-bit. Also the widths of the various floating-point types vary. It's possible that in some circumstances your solution is actually correct. However, rely on this only if you know what you're doing.

  • Проголосовать: нравится
  • +500
  • Проголосовать: не нравится

»
18 месяцев назад, # |
  Проголосовать: нравится +142 Проголосовать: не нравится

sqrt is fine just check your answer once after doing it

»
18 месяцев назад, # |
Rev. 2   Проголосовать: нравится +3 Проголосовать: не нравится

The code given in spoiler is dirty take a look at it -is-this-fft-

»
18 месяцев назад, # |
  Проголосовать: нравится +1 Проголосовать: не нравится

lol in hindsight me failing pretest 4 on B 3 times and switching from sqrt() to binary search helped me since I didn't fail sys test 11 xd

»
18 месяцев назад, # |
  Проголосовать: нравится +79 Проголосовать: не нравится

I'm glad that my contest helped raise competitors awareness to this matters. Many people are still using built-ins without caution or understanding the imprecision it could bring. Especially floating-point related!

»
18 месяцев назад, # |
Rev. 2   Проголосовать: нравится -30 Проголосовать: не нравится

Very usefull, now

Spoiler
»
18 месяцев назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Thanks for your benefit blog .can you make another blog about precision errors but in python since this blog about c++ ?
Thanks in advance .

»
18 месяцев назад, # |
Rev. 2   Проголосовать: нравится +82 Проголосовать: не нравится

Very helpful blog for the newcomers, thank you! I will definitely link this to my students.

Some thoughts:

  • in the section about $$$a^b$$$, maybe add a special case for $$$-1$$$? I know that you've written about non-negative integers, but with this additional case, it works for negative $$$a$$$ as well.

  • maybe add a section about comparing fractions? Like, if you want to check that $$$\frac{a}{b} = \frac{c}{d}$$$, you shouldn't divide in floating-point numbers; instead either check $$$a*d == b*c$$$ (if it fits in the integer types you use) or normalize the fractions.

  • isn't it possible to estimate $$$sqrt(n)$$$ almost perfectly with the standard function and then search for the answer in something like $$$[sqrt(n)-5, sqrt(n)+5]$$$? I know it's kinda advanced, but so is binary search.

  • »
    »
    18 месяцев назад, # ^ |
    Rev. 2   Проголосовать: нравится +35 Проголосовать: не нравится

    $$$\pm 5$$$ should be a rough guess and might work fine, but you can still just increment/decrement linearly until $$$R^2 \leq n$$$ and $$$(R+1)^2 > n$$$ are both true. this is guaranteed to work, and should still work fast enough as the error should be not very big.

  • »
    »
    18 месяцев назад, # ^ |
      Проголосовать: нравится +2 Проголосовать: не нравится

    if u won't mind can u pls tell where do u teach?And how one can enroll in your courses?

    Thank you.

  • »
    »
    18 месяцев назад, # ^ |
      Проголосовать: нравится -7 Проголосовать: не нравится

    Would you please update the difficulty rating of the last edu round's problems?

»
18 месяцев назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Reading this makes me feel very thankful that I AC'd Dytechlab's contest question B using normal std::sqrt()

»
18 месяцев назад, # |
Rev. 2   Проголосовать: нравится +73 Проголосовать: не нравится

sqrt(x) is fine, just do something like this:

int64_t x;
cin >> x;
int64_t r = sqrtl(x) + 1;
while (r * r > x) r--;

sqrtl() use long double instead of double, and might not be necessary.

+1 should be enough, but if you are paranoid then +small would be better.

Generally this should be better than binary search.

If you for some reason what to avoid having any floating point at all, then try to implement the search as $$$r_{i+1} = (r_{i} + x/r_{i}) / 2$$$ instead, it might be faster than just binary search.

»
18 месяцев назад, # |
  Проголосовать: нравится +105 Проголосовать: не нравится

You can compute $$$\lfloor \sqrt a \rfloor$$$ faster and easier than with binary search:

long long x = sqrt(a) + 2;
while (x * x > a) x--;
  • »
    »
    18 месяцев назад, # ^ |
    Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

    Is there formal proof that $$$R+2 \geq \sqrt{x}$$$ is true given $$$R$$$ as the result given by the standard function? I know that the error will be minimal, but I can't get an exact upper bound for the error.

    • »
      »
      »
      18 месяцев назад, # ^ |
        Проголосовать: нравится +7 Проголосовать: не нравится

      It depends on the compiler version, and nothing is guaranteed. Read this old blog by misof: The curious case of the pow function.

      Fortunately, for most operations (including sqrt), the magnitude of the relative precision error is equal to the type precision. It should be around $$$10^{-15}$$$ for double. The square root of long long usually doesn't exceed $$$10^9$$$ so the absolute error should be around $$$10^{-6}$$$. It means that sqrt(a) + 0.0001 is more than enough already.

      more info
      more issues
      • »
        »
        »
        »
        18 месяцев назад, # ^ |
          Проголосовать: нравится +31 Проголосовать: не нравится

        For sqrt there are some guarantees. https://en.cppreference.com/w/cpp/numeric/math/sqrt

        std::sqrt is required by the IEEE standard to be correctly rounded from the infinitely precise result. In particular, the exact result is produced if it can be represented in the floating-point type. The only other operations which require this are the arithmetic operators and the function std::fma. Other functions, including std::pow, are not so constrained.

      • »
        »
        »
        »
        18 месяцев назад, # ^ |
        Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

        If this is the case, isn't sqrt(a) + 1 ok (using the same code snippet, just changing 2 to 1)? Or am I missing something?

      • »
        »
        »
        »
        18 месяцев назад, # ^ |
        Rev. 2   Проголосовать: нравится +46 Проголосовать: не нравится

        Just a warning about the blog The curious case of the pow function by misof. Not a single person back then realized what was going on. There are so many erroneous theories in that blog.

        The thing no one figured out is that using 32 bit g++ compiler, every float and every double, is actually a 80 bit long double. Only when a floating point variable is stored in memory will it actually be stored as its corresponding type. Otherwise, all floating point variables will act as long doubles.

        For example, run this program with the input 123456789.

        #include <bits/stdc++.h>
        using namespace std;
        int main () {
            double n;
            cin >> n;
            double x = n * n;
            cout << (long long)x;
        }
        

        You will get the output 15241578750190521 (which should be impossible since 15241578750190521 cant be represented as a double). The reason it is possible is that x is infact a long double.

        Now try running this program

        #include <bits/stdc++.h>
        using namespace std;
        int main () {
            double n;
            cin >> n;
            double x = n * n;
            pow(x, 10);
            cout << (long long)x;
        }
        

        This time around the result is 15241578750190520. This is because the pow(x, 10) call effectively forces x to be stored in memory, which rounds x from 15241578750190521 to 15241578750190520.

        • »
          »
          »
          »
          »
          18 месяцев назад, # ^ |
            Проголосовать: нравится 0 Проголосовать: не нравится

          Wow, interesting. So the precision can be better than expected and it might cause a misunderstanding of precision. Is there any source where I can read more about this? If I encountered such a difference during a contest, I would just assume that the compiler optimizes something in the first code.

»
18 месяцев назад, # |
Rev. 2   Проголосовать: нравится -15 Проголосовать: не нравится

For the round-up, I use this.

ans=ceil(a*1.00/b);

I never got an error in this (till now), but is there a possibility that this way can fail?

Thanks!

Why am I being downvoted? (Sorry if I commented something wrong)

  • »
    »
    18 месяцев назад, # ^ |
      Проголосовать: нравится -12 Проголосовать: не нравится

    Yes, it can totally cause an off-by-one error, and even get worse when we are calculating sums of it. I experienced this issue on ARC134A — Bridge and Sheets.

  • »
    »
    14 месяцев назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    Because people either have outdated knowledge or assume you’re using this on 64-bit integers. For 32-bit integers, ceil is fine and likely actually faster than the pure-integer alternative.

»
18 месяцев назад, # |
  Проголосовать: нравится -8 Проголосовать: не нравится

for $$$log_2$$$ this way may be more useful:
f[0]=0;for(int i=1;i<=N;i++) f[i]=f[i>>1]+1;

However,my friends told me "__builtin_clz()" is as fast as the integer operator "+".

  • »
    »
    18 месяцев назад, # ^ |
      Проголосовать: нравится -14 Проголосовать: не нравится

    That's a common method used in implementing Sparse Tables for RMQ. However, in my experience, __lg worked very well also.

  • »
    »
    18 месяцев назад, # ^ |
      Проголосовать: нравится +8 Проголосовать: не нравится

    Memory access is hell of a lot slower than most (series of) arithmetic operations. Use literally anything else: __lg if your compiler supports it, std::bit_width if you're using C++20, even a loop like while(x > 0) will get optimized away.

»
18 месяцев назад, # |
  Проголосовать: нравится +8 Проголосовать: не нравится

THanks for the blog!

»
18 месяцев назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

-is-this-fft-, could you please fix (again?) the comment in the attached code? For me, it is shown as // works for at least up to (2**31 &mdash; 1)**2

Also, one might want to use binary exponentiation instead of pow. The fun fact is that in this case even $$$0$$$ or $$$1$$$ to some enormous power are calculated as fast as $$$2^{62}$$$ is suggested to be calculated. So it is not obligatory to if them, and if it's done, the code is even faster. Probably this part is "advanced" too...

»
18 месяцев назад, # |
  Проголосовать: нравится +53 Проголосовать: не нравится

You can also use the Babylonian method to find the square root. It converges quadratically and is very fast.

int isqrt(int x)
{
    // Integer square root of x.
    // 
    // Use the Babylonian method, adapted to integers. The next estimate will
    // be less than the current one, as long as the latter is greater than √x.
    // In the end, either the estimates will match, or they will trade places.
    // At this point, return the non-greater estimate.
    // 
    // We round the first estimate up to avoid it becoming zero.
    int a = x, b = (x + 1) / 2;
    while (a > b)
    {
        a = b;
        b = (b + x / b) / 2;
    }
    return a;
}

With a good initial guess the method works even faster.

»
18 месяцев назад, # |
  Проголосовать: нравится +9 Проголосовать: не нравится

Several more points about make int, not float

Cross product

Cross product calculates $$$(-1)^n \cdot 2S$$$. The sign depends on rotation of points in triangle. It can be used in tasks related to convex hull. Don't calculate angles.

int c_p(int x1, int y1, int x2, int y2, int x3, int y3) {
    return (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1);
}

Task

Square

If the points of triangle are integers, then the square of it is $$$\frac{int}{2}$$$. We can operate on double square, which are ints. We can get it using vector product, which returns $$$(-1)^n \cdot 2S$$$.

For example, we can safely check if point $$$(x4, y4)$$$ is inside of triangle (including boundaries).

if (abs(c_p(x1, y1, x2, y2, x3, y3) == abs(c_p(x4, y4, x1, y1, x2, y2) + abs(c_p(x4, y4, x1, y1, x3, y3) + abs(c_p(x4, y4, x2, y2, x3, y3)) ...

Task

Dot product

Dot product is needed more rarely. It can be used to check, if the angle is acute, right or obtuse. For example, it can be used to find distance from point to segment on plane. The are two cases: when perpendicular from point to segment's line lies on segment, or not. It can be checked with dot product.

int d_p(int x1, int y1, int x2, int y2, int x3, int y3) {
    return (x2 - x1) * (x3 - x1) + (y2 - y1) * (y3 - y1);
}

Task

Sort by polar angle

Assume, you need to sort point by polar angle relative to vector $$$(1, 0)$$$. You can do $$$a[i] = atan2(y[i], x[i]);$$$ (if you don't know, what is atan2(y, x), discover it, it helpful in reducing ifs). Instead we can (unfortunately, with ifs) sort points in one quadrant by their tan (which is monotone there). But compare not $$$\frac{y[i]}{x[i]} - \frac{y[j]}{x[j]}$$$, but $$$y[i] * x[j] - y[j] * x[i]$$$.

Task Task

Check, if overflow happened

Assume, in problem you have to multiply numbers $$$\le 10^{18}$$$, but you know for sure, that if at some moment the number is $$$\ge 10^{18}$$$, the answer is "NO". How can we check, if multiplication won't give overflow?

Spoiler

Assume, you have to check, if $$$x \cdot y \le C$$$. Instead, check if $$$x \le \frac{C}{y}$$$

»
18 месяцев назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Floating-point numbers can accurately represents $$$2^n$$$, so maybe long double is still useful for some problems that require $$$2^n$$$ of your output. It can represent $$$2^n$$$ for any integer $$$n<15000$$$ (in the circumstances where long double have 15 bits for Exponent).

»
18 месяцев назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Some people are suggesting sqrtl. Can someone write or refer to a convincing analysis to show that sqrtl is always correct?

  • »
    »
    18 месяцев назад, # ^ |
    Rev. 2   Проголосовать: нравится +5 Проголосовать: не нравится

    On GCC, long double is 80 bits, and its mantissa is 63 bits, which is enough to store all long long integers, so the argument of sqrtl can be represented precisely. IEEE 754 requires that the returned value is represented as precisely as possible, which translates to exact integers when the root is whole and something slightly bigger than (or equal to? not sure if this is possible but it doesn't harm the analysis so why not) the floor of the exact root, which translates to $$$\lfloor\sqrt{n}\rfloor$$$ after truncating. Sounds right?

    • »
      »
      »
      18 месяцев назад, # ^ |
        Проголосовать: нравится 0 Проголосовать: не нравится

      Is it possible that the square root is not representable and gets rounded up to the next representable float, which happens to be an integer?

      • »
        »
        »
        »
        18 месяцев назад, # ^ |
        Rev. 12   Проголосовать: нравится +42 Проголосовать: не нравится

        IEEE 754 guarantees that the absolute error of a square root is within 0.5 ulp. Note that GCC documentation does not seem to confirm that as a requirement, so that may fail on non-x86 machines, but sqrtl uses the fsqrt on x86, which seems to adhere to the typical rounding requirements. As the square root of a number less than $$$2^{62}$$$ is less than $$$2^{31}$$$, the maximal possible exponent is 30, and hence 0.5 ulp is at most $$$0.5 \cdot 2^{30} \cdot 2^{-63} = 2^{-34}$$$. This guarantees that

        $$$ \left| \mathrm{sqrtl}(n) - \sqrt{n} \right| \le 2^{-34}, \left| \mathrm{sqrtl}(n+1) - \sqrt{n+1} \right| \le 2^{-34}. $$$

        Suppose $$$\mathrm{sqrtl}(n) = \mathrm{sqrtl}(n + 1)$$$, where $$$n < 2^{62}$$$, then due to the triangle inequality the above implies

        $$$ \sqrt{n+1} - \sqrt{n} \le 2^{-33}, $$$

        but we have

        $$$ \sqrt{n+1} - \sqrt{n} = \frac{1}{\sqrt{n+1} + \sqrt{n}} > \frac{1}{2\sqrt{n}} \ge 2^{-32}, $$$

        a clear contradiction.

        Nota bene: Notice that I used $$$2^{62}$$$ instead of $$$2^{64}$$$ or $$$2^{63}$$$. That is because if we allow integers up to $$$2^{63}$$$, we have a failure at $$$\mathrm{sqrtl}(2^{63}-3) = \mathrm{sqrtl}(2^{63}-4)$$$.

»
18 месяцев назад, # |
Rev. 2   Проголосовать: нравится +8 Проголосовать: не нравится

C++20 also has https://en.cppreference.com/w/cpp/numeric/countr_zero and https://en.cppreference.com/w/cpp/numeric/countl_zero to replace __builtin_ctz and __builtin_clz. They return the correct answer (the number of bits of the corresponding datatype) when passed 0 as input, so there's no undefined behavior.

»
14 месяцев назад, # |
Rev. 2   Проголосовать: нравится +14 Проголосовать: не нравится

Don't use floor((double) a / (double) b) or similar.

For some years now, floating-point division has been faster on x86 than integer division. (On top of that, it is also vectorizable, but it’s faster even when unvectorized.)

And for int32, dividing through double is exact.

So for int32: do use (int) ((double) a / b) or similar; just make sure you pick the rounding you need from (int)/(int)(_+0.5)/floor/ceil. At least as long as you’re on x86, that is, but I strongly suspect Codeforces is.

For int16, divide through float for even more speed.

[1] [2] [3]

  • »
    »
    14 месяцев назад, # ^ |
    Rev. 3   Проголосовать: нравится 0 Проголосовать: не нравится

    And for int32, dividing through double is exact.

    On 32-bit architecture as well?

    In any case, what I really wanted to convey with this post is that these operations can be inexact. If you know what you are doing and you know what is exact and what isn't, of course go ahead and use them.

    But to a lot of people, it doesn't even occur that there might be a problem. In particular, this blog post was written one hour after a contest where a lot of people used one of these and the comments under the announcement were full of "it doesn't work, this is unfair, this is someone else's fault, please restore my ratings".

    • »
      »
      »
      14 месяцев назад, # ^ |
      Rev. 3   Проголосовать: нравится +3 Проголосовать: не нравится

      On 32-bit architecture as well?

      I believe so.

      Floating-point types are the same regardless of whether the architecture is 32-bit or 64-bit. The only difference (which you may be thinking of) is that at least some compilers on 32-bit x86 default to running arithmetic on the x87 coprocessor in 80-bit mode and only round back to 32/64 bits when storing the result into a variable. This extra precision can occasionally make for a slight difference in the floating-point result.

      But here, the key is that int32 (and indeed, all the way up to int53 if I’m not mistaken) fits exactly in double (float64) with additional room to spare, so division is exact after rounding the result to an integer regardless of the original precision: you just can’t get any result that’s fractional but so close to an integer that floating-point rounding could tip it over. And exponentiation (pow) is, of course, completely exact as long as the result fits in the mantissa (int53).

      Your point still stands that these operations are dangerous for larger integers and that the more advanced mathematical functions may not be very accurate in the first place, and integer operations are often faster. But counterintuitively, division isn’t one of them currently.

    • »
      »
      »
      14 месяцев назад, # ^ |
        Проголосовать: нравится 0 Проголосовать: не нравится

      Indeed, this is what makes integer division possible in languages like JavaScript and Lua where all numbers are always doubles.

»
14 месяцев назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Can someone explain to me how pow(a, b) uses floating-point numbers?

  • »
    »
    14 месяцев назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    See here.

    The arguments are cast to floating-point types the moment you call the function. All the work it does, it does on floating-point numbers.

»
14 месяцев назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Could you give an example of functions like ceil and floor calculating so far off that the cast rounds to the wrong mathematical integer?

  • »
    »
    14 месяцев назад, # ^ |
    Rev. 2   Проголосовать: нравится +3 Проголосовать: не нравится

    ceil or floor (or lround or lrint) itself cannot do this, by definition: it rounds the input floating-point number to the very nearest integer, which is guaranteed to be exactly representable. Even if the number is so big that it’s outside the basic continuous range of exactly-representable integers around zero, that just means this number is already an integer (with some trailing zeros) and ceil/floor will leave it unchanged.

    But the division may be inaccurate if the operands are big. For example:

    • The 55-bit integer $$$2^{54}$$$ is accurately represented in double, but dividing this double by 3 gives exactly 6004799503160661.0, whereas the true result is that plus $$$\frac{1}{3}$$$, so ceil(pow(2, 54) / 3) = 6004799503160661 is off by one.
    • Even among 54-bit integers, $$$2^{53} \times 1.5$$$ is accurately represented in double, but dividing this double by 11 gives exactly 1228254443828317.0, whereas the true result is that plus $$$\frac{1}{11}$$$, so ceil(pow(2, 53) * 1.5 / 11) = 1228254443828317 is again off by one.
    • Dividing the same double by 379 gives exactly 35648545863091.0, whereas the true result is that minus $$$\frac{1}{379}$$$, so this time floor is off by one.
    • (For source integers containing even more bits, the result can be off by more than one.)

    Also note that, unlike the library functions, the poor man’s nonnegative lround (int) (x + 0.5) can round badly by itself (unless you explicitly change the floating-point rounding mode from the default round-to-even to round-down or round-to-zero):

    • If x is $$$2^{53} - 1$$$ (an exactly representable integer), adding 0.5 to it causes the sum to be rounded up to $$$2^{53}$$$, so the end result is off by one.
    • If x is $$$0.5 - 2^{-54}$$$, adding 0.5 to it causes the sum to be rounded up to $$$1$$$, so the end result is off by one.
»
14 месяцев назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

So how can I hack the programs which use the functions above? thx

»
10 месяцев назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Can anyone tell me why the sqrt given above does not work in this case but inbuild cpp sqrt works?

Ac 208095241 TLE 208095176

  • »
    »
    10 месяцев назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    The solution with binary search works in $$$O(n^3 \cdot divisors(a_j - a_i) \cdot log(a_i))$$$. It's too slow (numbers up $$$10^9$$$ can have about $$$1000$$$ divisors). Built-in sqrt function works faster (without $$$log(a_i)$$$).