Блог пользователя Hassan_Fathi

Автор Hassan_Fathi, история, 13 дней назад, По-английски

Hello guys!

I was solving some problems on dp on subsets and I found this strange one

this got AC, but that gets TLE, and the only difference between them is swaping the inner 2 loops (mask and j).

I'm really curious to know why :D

  • Проголосовать: нравится
  • +13
  • Проголосовать: не нравится

»
13 дней назад, # |
  Проголосовать: нравится +11 Проголосовать: не нравится

I guess it's a compiler optimization

In your code if $$$g[i][j]$$$ is false, it won't do anything in the loop.

I guess GCC adds if (!g[i][j]) break; to the end of line 19

  • »
    »
    13 дней назад, # ^ |
      Проголосовать: нравится +14 Проголосовать: не нравится

    Great, but this condition will never happen if the entire grid = 1

    • »
      »
      »
      13 дней назад, # ^ |
        Проголосовать: нравится +19 Проголосовать: не нравится

      You're right, it might be because of random-access and you are defining vectors inside the function (i.e. heap/stack thing).

      But I don't think so because I resubmitted with defining in global and no difference.

»
13 дней назад, # |
  Проголосовать: нравится +36 Проголосовать: не нравится

Second code gets TLE'd because of some cache shenanigans.

My thoughts that CPU in first submit prefetches full cache line, so access to $$$newdp[mask]$$$ is almost free, but in the other submit, it can be washed out by accessing $$$g[i][j]$$$.

And AFAIK, arrays with len that are powers of two are quite bad for performance reasons on some CPUs, see this link for reference.

  • »
    »
    12 дней назад, # ^ |
      Проголосовать: нравится +23 Проголосовать: не нравится

    I thought that cache stuff don't affect complexity that much.

    Now I got it thanks :D

    • »
      »
      »
      12 дней назад, # ^ |
        Проголосовать: нравится +6 Проголосовать: не нравится

      Technically you're right. It doesn't affect the complexity at all. It does affect the constant factor though, which is what's causing the 2.5x speedup here.

»
13 дней назад, # |
  Проголосовать: нравится +18 Проголосовать: не нравится

similar thing has happened with me once on cf. i think it is due to more cache miss rate in second one but can't guess why it's happening