diego_v1's blog

By diego_v1, 11 years ago, In English

I just encountered a weird issue with problem 333D.

Check these two codes:

Accepted: 4201145

TLE: 4201157

The only difference between these two submissions is that in the first one, it says if(A[i][k]>B&&A[j][k]>B) and in the other one it says if(min(A[i][k],A[j][k])>B). How can two statements that are practically identical be the difference between getting accepted with 1122 ms and getting TLE over 3000 ms ?!?!

Another coder suggested that maybe the std::min function is slow, so I changed it to a statement of the form a<b?a:b, and it still gets TLE, so the issue is not the std::min function.

Another TLE: 4201326

I'm puzzled...

  • Vote: I like it
  • +8
  • Vote: I do not like it

»
11 years ago, # |
  Vote: I like it -9 Vote: I do not like it

It is true that min/max functions from are slower than just straight check. It is probably due to the template nature of min/max functions. They have to identify variables of which types are given as parameters. I do not know much about this. Hope the more advanced coders can explain this phenomenon to you.

  • »
    »
    11 years ago, # ^ |
      Vote: I like it +4 Vote: I do not like it

    I checked that.

    I replaced the std::min function with a statement of the form a<b?a:b, and it still gets TLE, so it's not the min function what's making the program get TLE.

    For some ghostly reason, if(a<x && b<x) is at least 3x faster than if((a<b?a:b) < x)...

»
11 years ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it

professor bill says right, the code of min function in c++98 is: //// template const T& min (const T& a, const T& b) { return !(b<a)?a:b; // or: return !comp(b,a)?a:b; for version (2) } //// first, this is template function that is slower than the normal. second, you call the min function 10^6 times, and each time you call this "non inline" function, it has overhead. however, anybody knows your code must be accepted first try ;) it's the reason of some people use: define Min(a, b) (a < b ? a : b)

  • »
    »
    11 years ago, # ^ |
    Rev. 2   Vote: I like it +4 Vote: I do not like it

    Just like I said to Professor Bill, I replaced the min function with a statement of the form a<b?a:b and it still gets TLE.

    Check the submission -> 4201326

    For some ghostly reason, if(a<x && b<x) is at least 3x faster than if((a<b?a:b) < x)...

    • »
      »
      »
      11 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      wow its very strange, man :)) like professor Bill said, "Hope the more advanced coders can explain this phenomenon to you" :D

    • »
      »
      »
      11 years ago, # ^ |
      Rev. 3   Vote: I like it 0 Vote: I do not like it

      I don't know the details of the problem but have you asked yourself whether

      if(A[i][k]>B&&A[j][k]>B)

      and

      if(min(A[i][k],A[j][k])>B)

      a̶r̶e̶ ̶s̶e̶m̶a̶n̶t̶i̶c̶a̶l̶l̶y̶ ̶t̶h̶e̶ ̶s̶a̶m̶e̶ ̶t̶h̶i̶n̶g̶?̶ ̶I̶n̶ ̶t̶h̶e̶ ̶f̶i̶r̶s̶t̶ ̶o̶n̶e̶,̶ ̶y̶o̶u̶ ̶a̶r̶e̶ ̶r̶e̶q̶u̶i̶r̶i̶n̶g̶ ̶b̶o̶t̶h̶ ̶v̶a̶l̶u̶e̶s̶ ̶t̶o̶ ̶b̶e̶ ̶h̶i̶g̶h̶e̶r̶ ̶t̶h̶a̶n̶ ̶B̶.̶ ̶I̶n̶ ̶t̶h̶e̶ ̶s̶e̶c̶o̶n̶d̶ ̶o̶n̶e̶,̶ ̶j̶u̶s̶t̶ ̶t̶h̶e̶ ̶s̶m̶a̶l̶l̶e̶r̶ ̶o̶f̶ ̶t̶h̶e̶ ̶t̶w̶o̶.̶

      UPD: I'm sorry, I didn't make any sense here. I meant something like what mukel said below. The if is short-circuited so everytime the left-side statement is false, you save one comparison. In the min (or ternary) version, you always do two comparisons.

      I believe CF uses something like gcc -O2 to compile (level 2 optimizations). I generated the assembly code (with extra flags -Wa,-aln=asmsource.asm) of the following two simpler codes. Turns out the Ternary one does more jumping and more operations. So maybe that's why, you're getting a much bigger constant-factor by using ternary instead of if's .

      int a, b, c;
      a = 33, b = 22, c= 55;
      if ((a < b ? a : b) < c) {
      	b = 5;
      }
      
      int a, b, c;
      a = 33, b = 22, c= 55;
      if (a > c && b > c) {
      	b = 5;
      }
      
      • »
        »
        »
        »
        11 years ago, # ^ |
          Vote: I like it +3 Vote: I do not like it

        if the smaller value is bigger than B then both the smaller and the bigger values are bigger than B

      • »
        »
        »
        »
        11 years ago, # ^ |
          Vote: I like it +2 Vote: I do not like it

        Well, if we have X>Y and X>B, then we have Y>X>B, so yes, if we check if the minimum of the two values is higher than B, we are checking that both values are higher than B.

»
11 years ago, # |
Rev. 3   Vote: I like it +3 Vote: I do not like it

I heard about something called "compiler optimizations" there are many types of these optimizations but I'm not experience in these optimizations, but I guess one of these types is the reason for this

look at this blog

»
11 years ago, # |
  Vote: I like it +5 Vote: I do not like it

I have some possible reasons. First is the that the && operator does not evaluate the whole expression if the first comparison is false, so avoiding a redundant comparison in some cases. Second: the cache unfriendly code, which means that the way you access array A can make the difference since A[i][j] is not the same as A[j][i], but in this case I don't see any problem since k is in the inner-most loop. Third: Compiler sometimes make light fast optimizations, maybe you were just lucky.

  • »
    »
    11 years ago, # ^ |
      Vote: I like it +15 Vote: I do not like it

    Seems like && operator's feature avoided most of cache unfriendly code(A[j][k]) to be run.

    min(A[i][k], A[j][k]) needs to actually access both A[i][k] and A[j][k] to work and this is VERY cache unfriendly and may be significantly slow.

    But with A[i][k] > B && A[j][k] > B, in most case A[i][k] > B does not hold, so we won't access A[j][k] and avoided a large number of cache misses.

    • »
      »
      »
      11 years ago, # ^ |
        Vote: I like it +1 Vote: I do not like it

      Exchanging the checks still produces fast solution 4204040 (1310ms compared to 1028ms):

      f(k,0,M)if(A[j][k]>B && A[i][k]>B)

      But I still believe your explanation is correct. Just sticking to one array line, no matter whether it is i or j allows to stay in cache.

      • »
        »
        »
        »
        11 years ago, # ^ |
        Rev. 3   Vote: I like it +6 Vote: I do not like it

        I don't think that the reason of TLE that accessing both A[i][k] and A[j][k].

        because I tried as BekzhanKassenov said and used fastMin and got AC in 1600ms 4203936

        however, I tried the following code on custom test first using normal min and second using fastMin , this time normal min was faster:

        normal min runs in time about 5550ms

        #include <iostream>
        using namespace std;
        int fastMax(int x, int y) { return (((y-x)>>(31))&(x^y))^y; }
        int fastMin(int x, int y) { return (((y-x)>>(31))&(x^y))^x; }
        
        int main(){
        int h=5;
        for(int i=0;i<1000000000;i++){
        h=min(h,i%544);
        }
        cout<<h<<endl;
        }
        

        fast min run in time about 6146 ms

        #include <iostream>
        using namespace std;
        int fastMax(int x, int y) { return (((y-x)>>(31))&(x^y))^y; }
        int fastMin(int x, int y) { return (((y-x)>>(31))&(x^y))^x; }
        
        int main(){
        int h=5;
        for(int i=0;i<1000000000;i++){
        h=fastMin(h,i%544);
        }
        cout<<h<<endl;
        }
        

        so how fastMin is faster than normal min in diego_v1 's code but it's slower in custom test? very confusing

        • »
          »
          »
          »
          »
          11 years ago, # ^ |
            Vote: I like it +3 Vote: I do not like it

          I recommend reading this: http://en.wikipedia.org/wiki/Branch_predication

          Generally, fastMin got "fast" by avoiding branches.

          In your test case i%544 >= h almost always holds, in this case processor can predict the if-branch perfectly and has nearly no misprediction at all. fastMin did more calculation, so it's slower.

          • »
            »
            »
            »
            »
            »
            11 years ago, # ^ |
            Rev. 2   Vote: I like it 0 Vote: I do not like it

            although I didn't understand "Branch_predication" ,I changed this line

            h=fastMin(h,i%544);

            into

            h=min((7*i)%3468,(6*i)%3468);

            and

            h=fastMin((7*i)%3468,(6*i)%3468);

            now I think there's no prediction but still normal min faster

            • »
              »
              »
              »
              »
              »
              »
              11 years ago, # ^ |
                Vote: I like it 0 Vote: I do not like it

              Still there are big consecutive chunks of i, where right operand is smaller (same for left), so prediction had good hit rate. Try to use random values

              • »
                »
                »
                »
                »
                »
                »
                »
                11 years ago, # ^ |
                  Vote: I like it 0 Vote: I do not like it

                What about this test

                h=min((1+i)%2,(i)%2);

                still normal min is faster and also with bigger difference this time

                • »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  11 years ago, # ^ |
                    Vote: I like it 0 Vote: I do not like it

                  Alternate values are also GOOD for prediction.

                  Please, take some time to briefly understand how modern processor works.

                • »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  11 years ago, # ^ |
                  Rev. 2   Vote: I like it 0 Vote: I do not like it

                  Well Alternate values avoid the consecutive chunks where the same side is smaller.

                  how does the algorithm of this prediction works?

                  the the processors turned into humans and have brians for thinking?

                  UPD: I used random numbers this time and this time FastMin was faster but the deference is very small ,but in diego_v1 's code FastMin was very faster with big deference, how?

                  anyway when solving problems in contests the test cases may not be randomly generated, this is the problem

                • »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  11 years ago, # ^ |
                    Vote: I like it 0 Vote: I do not like it
        • »
          »
          »
          »
          »
          11 years ago, # ^ |
            Vote: I like it 0 Vote: I do not like it

          Good test! You ruled out the cache theory. After reading more about branch optimizations I found this. A simple and elegant way for fastMin, any of us could invent it.

          inline int min2(int a, int b)
          {
            bool bCond = a < b;
            return bCond * a + !bCond * b;
          }

          It works! 1778 ms, 4205726

          • »
            »
            »
            »
            »
            »
            11 years ago, # ^ |
              Vote: I like it 0 Vote: I do not like it

            your function is faster than normal min in diego_v1 's code but in my test it's slower than both normal min and FastMin , how?

»
11 years ago, # |
  Vote: I like it -10 Vote: I do not like it

Try to use this functions:

int fastMax(int x, int y) { return (((y-x)>>(32-1))&(x^y))^y; }
int fastMin(int x, int y) { return (((y-x)>>(32-1))&(x^y))^x; }
  • »
    »
    11 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    WAT???

    • »
      »
      »
      11 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      So many negative votes :) Here's the explanation.

      • »
        »
        »
        »
        11 years ago, # ^ |
          Vote: I like it +4 Vote: I do not like it

        The result of right-shifting a negative value is implementation defined. Furthermore, if y - x is less than INT_MIN, then a signed overflow occurs and causes undefined behavior.

        I would not rely on such a function.

        • »
          »
          »
          »
          »
          11 years ago, # ^ |
            Vote: I like it 0 Vote: I do not like it

          Oh, understand. And what about this function:

          int fastAbs(int n) {
          	return (n ^ (n >> 31)) - (n >> 31);
          }
          
»
11 years ago, # |
Rev. 2   Vote: I like it +2 Vote: I do not like it

Actually current processors are trying to predict which way to go. As worst case scenario for this condition to be false (as otherwise with two such "hits" we will stop searching), in most cases we would not execute if body and processor correctly skip it in its prediction. While if min is used, min condition can go both ways and processor "misses" about half time. Hence the difference

  • »
    »
    11 years ago, # ^ |
    Rev. 2   Vote: I like it 0 Vote: I do not like it

    Any chance to make it understandable?

    UPD: Thanks, after edit I begin to get it. "if body" caused also some confusion to me but now I see it's the body of "if" statement.

»
11 years ago, # |
  Vote: I like it 0 Vote: I do not like it

I have a Similar problems in USACO, I got 0.44s using min function in algorithm while 0.22s using "#define MIN(a,b) ((a)<(b)?(a):(b)). In my Opinion, it wastes time in addressing three times and min function problem.