Блог пользователя diego_v1

Автор diego_v1, 11 лет назад, По-английски

I just encountered a weird issue with problem 333D.

Check these two codes:

Accepted: 4201145

TLE: 4201157

The only difference between these two submissions is that in the first one, it says if(A[i][k]>B&&A[j][k]>B) and in the other one it says if(min(A[i][k],A[j][k])>B). How can two statements that are practically identical be the difference between getting accepted with 1122 ms and getting TLE over 3000 ms ?!?!

Another coder suggested that maybe the std::min function is slow, so I changed it to a statement of the form a<b?a:b, and it still gets TLE, so the issue is not the std::min function.

Another TLE: 4201326

I'm puzzled...

  • Проголосовать: нравится
  • +8
  • Проголосовать: не нравится

»
11 лет назад, # |
  Проголосовать: нравится -9 Проголосовать: не нравится

It is true that min/max functions from are slower than just straight check. It is probably due to the template nature of min/max functions. They have to identify variables of which types are given as parameters. I do not know much about this. Hope the more advanced coders can explain this phenomenon to you.

  • »
    »
    11 лет назад, # ^ |
      Проголосовать: нравится +4 Проголосовать: не нравится

    I checked that.

    I replaced the std::min function with a statement of the form a<b?a:b, and it still gets TLE, so it's not the min function what's making the program get TLE.

    For some ghostly reason, if(a<x && b<x) is at least 3x faster than if((a<b?a:b) < x)...

»
11 лет назад, # |
Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

professor bill says right, the code of min function in c++98 is: //// template const T& min (const T& a, const T& b) { return !(b<a)?a:b; // or: return !comp(b,a)?a:b; for version (2) } //// first, this is template function that is slower than the normal. second, you call the min function 10^6 times, and each time you call this "non inline" function, it has overhead. however, anybody knows your code must be accepted first try ;) it's the reason of some people use: define Min(a, b) (a < b ? a : b)

  • »
    »
    11 лет назад, # ^ |
    Rev. 2   Проголосовать: нравится +4 Проголосовать: не нравится

    Just like I said to Professor Bill, I replaced the min function with a statement of the form a<b?a:b and it still gets TLE.

    Check the submission -> 4201326

    For some ghostly reason, if(a<x && b<x) is at least 3x faster than if((a<b?a:b) < x)...

    • »
      »
      »
      11 лет назад, # ^ |
        Проголосовать: нравится 0 Проголосовать: не нравится

      wow its very strange, man :)) like professor Bill said, "Hope the more advanced coders can explain this phenomenon to you" :D

    • »
      »
      »
      11 лет назад, # ^ |
      Rev. 3   Проголосовать: нравится 0 Проголосовать: не нравится

      I don't know the details of the problem but have you asked yourself whether

      if(A[i][k]>B&&A[j][k]>B)

      and

      if(min(A[i][k],A[j][k])>B)

      a̶r̶e̶ ̶s̶e̶m̶a̶n̶t̶i̶c̶a̶l̶l̶y̶ ̶t̶h̶e̶ ̶s̶a̶m̶e̶ ̶t̶h̶i̶n̶g̶?̶ ̶I̶n̶ ̶t̶h̶e̶ ̶f̶i̶r̶s̶t̶ ̶o̶n̶e̶,̶ ̶y̶o̶u̶ ̶a̶r̶e̶ ̶r̶e̶q̶u̶i̶r̶i̶n̶g̶ ̶b̶o̶t̶h̶ ̶v̶a̶l̶u̶e̶s̶ ̶t̶o̶ ̶b̶e̶ ̶h̶i̶g̶h̶e̶r̶ ̶t̶h̶a̶n̶ ̶B̶.̶ ̶I̶n̶ ̶t̶h̶e̶ ̶s̶e̶c̶o̶n̶d̶ ̶o̶n̶e̶,̶ ̶j̶u̶s̶t̶ ̶t̶h̶e̶ ̶s̶m̶a̶l̶l̶e̶r̶ ̶o̶f̶ ̶t̶h̶e̶ ̶t̶w̶o̶.̶

      UPD: I'm sorry, I didn't make any sense here. I meant something like what mukel said below. The if is short-circuited so everytime the left-side statement is false, you save one comparison. In the min (or ternary) version, you always do two comparisons.

      I believe CF uses something like gcc -O2 to compile (level 2 optimizations). I generated the assembly code (with extra flags -Wa,-aln=asmsource.asm) of the following two simpler codes. Turns out the Ternary one does more jumping and more operations. So maybe that's why, you're getting a much bigger constant-factor by using ternary instead of if's .

      int a, b, c;
      a = 33, b = 22, c= 55;
      if ((a < b ? a : b) < c) {
      	b = 5;
      }
      
      int a, b, c;
      a = 33, b = 22, c= 55;
      if (a > c && b > c) {
      	b = 5;
      }
      
      • »
        »
        »
        »
        11 лет назад, # ^ |
          Проголосовать: нравится +3 Проголосовать: не нравится

        if the smaller value is bigger than B then both the smaller and the bigger values are bigger than B

      • »
        »
        »
        »
        11 лет назад, # ^ |
          Проголосовать: нравится +2 Проголосовать: не нравится

        Well, if we have X>Y and X>B, then we have Y>X>B, so yes, if we check if the minimum of the two values is higher than B, we are checking that both values are higher than B.

»
11 лет назад, # |
Rev. 3   Проголосовать: нравится +3 Проголосовать: не нравится

I heard about something called "compiler optimizations" there are many types of these optimizations but I'm not experience in these optimizations, but I guess one of these types is the reason for this

look at this blog

»
11 лет назад, # |
  Проголосовать: нравится +5 Проголосовать: не нравится

I have some possible reasons. First is the that the && operator does not evaluate the whole expression if the first comparison is false, so avoiding a redundant comparison in some cases. Second: the cache unfriendly code, which means that the way you access array A can make the difference since A[i][j] is not the same as A[j][i], but in this case I don't see any problem since k is in the inner-most loop. Third: Compiler sometimes make light fast optimizations, maybe you were just lucky.

  • »
    »
    11 лет назад, # ^ |
      Проголосовать: нравится +15 Проголосовать: не нравится

    Seems like && operator's feature avoided most of cache unfriendly code(A[j][k]) to be run.

    min(A[i][k], A[j][k]) needs to actually access both A[i][k] and A[j][k] to work and this is VERY cache unfriendly and may be significantly slow.

    But with A[i][k] > B && A[j][k] > B, in most case A[i][k] > B does not hold, so we won't access A[j][k] and avoided a large number of cache misses.

    • »
      »
      »
      11 лет назад, # ^ |
        Проголосовать: нравится +1 Проголосовать: не нравится

      Exchanging the checks still produces fast solution 4204040 (1310ms compared to 1028ms):

      f(k,0,M)if(A[j][k]>B && A[i][k]>B)

      But I still believe your explanation is correct. Just sticking to one array line, no matter whether it is i or j allows to stay in cache.

      • »
        »
        »
        »
        11 лет назад, # ^ |
        Rev. 3   Проголосовать: нравится +6 Проголосовать: не нравится

        I don't think that the reason of TLE that accessing both A[i][k] and A[j][k].

        because I tried as BekzhanKassenov said and used fastMin and got AC in 1600ms 4203936

        however, I tried the following code on custom test first using normal min and second using fastMin , this time normal min was faster:

        normal min runs in time about 5550ms

        #include <iostream>
        using namespace std;
        int fastMax(int x, int y) { return (((y-x)>>(31))&(x^y))^y; }
        int fastMin(int x, int y) { return (((y-x)>>(31))&(x^y))^x; }
        
        int main(){
        int h=5;
        for(int i=0;i<1000000000;i++){
        h=min(h,i%544);
        }
        cout<<h<<endl;
        }
        

        fast min run in time about 6146 ms

        #include <iostream>
        using namespace std;
        int fastMax(int x, int y) { return (((y-x)>>(31))&(x^y))^y; }
        int fastMin(int x, int y) { return (((y-x)>>(31))&(x^y))^x; }
        
        int main(){
        int h=5;
        for(int i=0;i<1000000000;i++){
        h=fastMin(h,i%544);
        }
        cout<<h<<endl;
        }
        

        so how fastMin is faster than normal min in diego_v1 's code but it's slower in custom test? very confusing

        • »
          »
          »
          »
          »
          11 лет назад, # ^ |
            Проголосовать: нравится +3 Проголосовать: не нравится

          I recommend reading this: http://en.wikipedia.org/wiki/Branch_predication

          Generally, fastMin got "fast" by avoiding branches.

          In your test case i%544 >= h almost always holds, in this case processor can predict the if-branch perfectly and has nearly no misprediction at all. fastMin did more calculation, so it's slower.

          • »
            »
            »
            »
            »
            »
            11 лет назад, # ^ |
            Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

            although I didn't understand "Branch_predication" ,I changed this line

            h=fastMin(h,i%544);

            into

            h=min((7*i)%3468,(6*i)%3468);

            and

            h=fastMin((7*i)%3468,(6*i)%3468);

            now I think there's no prediction but still normal min faster

            • »
              »
              »
              »
              »
              »
              »
              11 лет назад, # ^ |
                Проголосовать: нравится 0 Проголосовать: не нравится

              Still there are big consecutive chunks of i, where right operand is smaller (same for left), so prediction had good hit rate. Try to use random values

              • »
                »
                »
                »
                »
                »
                »
                »
                11 лет назад, # ^ |
                  Проголосовать: нравится 0 Проголосовать: не нравится

                What about this test

                h=min((1+i)%2,(i)%2);

                still normal min is faster and also with bigger difference this time

                • »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  11 лет назад, # ^ |
                    Проголосовать: нравится 0 Проголосовать: не нравится

                  Alternate values are also GOOD for prediction.

                  Please, take some time to briefly understand how modern processor works.

                • »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  11 лет назад, # ^ |
                  Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

                  Well Alternate values avoid the consecutive chunks where the same side is smaller.

                  how does the algorithm of this prediction works?

                  the the processors turned into humans and have brians for thinking?

                  UPD: I used random numbers this time and this time FastMin was faster but the deference is very small ,but in diego_v1 's code FastMin was very faster with big deference, how?

                  anyway when solving problems in contests the test cases may not be randomly generated, this is the problem

                • »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  11 лет назад, # ^ |
                    Проголосовать: нравится 0 Проголосовать: не нравится
        • »
          »
          »
          »
          »
          11 лет назад, # ^ |
            Проголосовать: нравится 0 Проголосовать: не нравится

          Good test! You ruled out the cache theory. After reading more about branch optimizations I found this. A simple and elegant way for fastMin, any of us could invent it.

          inline int min2(int a, int b)
          {
            bool bCond = a < b;
            return bCond * a + !bCond * b;
          }

          It works! 1778 ms, 4205726

»
11 лет назад, # |
  Проголосовать: нравится -10 Проголосовать: не нравится

Try to use this functions:

int fastMax(int x, int y) { return (((y-x)>>(32-1))&(x^y))^y; }
int fastMin(int x, int y) { return (((y-x)>>(32-1))&(x^y))^x; }
»
11 лет назад, # |
Rev. 2   Проголосовать: нравится +2 Проголосовать: не нравится

Actually current processors are trying to predict which way to go. As worst case scenario for this condition to be false (as otherwise with two such "hits" we will stop searching), in most cases we would not execute if body and processor correctly skip it in its prediction. While if min is used, min condition can go both ways and processor "misses" about half time. Hence the difference

  • »
    »
    11 лет назад, # ^ |
    Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

    Any chance to make it understandable?

    UPD: Thanks, after edit I begin to get it. "if body" caused also some confusion to me but now I see it's the body of "if" statement.

»
11 лет назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

I have a Similar problems in USACO, I got 0.44s using min function in algorithm while 0.22s using "#define MIN(a,b) ((a)<(b)?(a):(b)). In my Opinion, it wastes time in addressing three times and min function problem.