zoomer's blog

By zoomer, history, 9 years ago, In English

I just got AC for this dynamic programming problem 401D - Roman and Numbers after many TLEs, WAs and trials using recursion then loops..

I have faced some behaviour, which -in my opinion- seems to be strange, and I can't figure out why it behaves like that:

  1. Declaring the dp array like this : dp[100][1<<18] gives a TLE. However, dp[1<<18][100] doesn't.

  2. using a long long for bitmask is slower than int !!! .... I know that 32bit integer would be sufficient for this problem, but I thought it would be safer to use long long.

I hope someone can explain it for me, thanks a lot :D

  • Vote: I like it
  • 0
  • Vote: I do not like it

»
9 years ago, # |
  Vote: I like it +9 Vote: I do not like it

I dont know about your first question but using long long is slower than integer because long long has got 64bits and integer has got 32bits.

For example to sum 2 number in long long you have to check 64bits but in integer you have to check 32bits.So that means integer is x2 faster than long long. Its better use long long just when necessary.

  • »
    »
    9 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    But I thought that operations on both 32bits integer and 64bits long long are made in the same time, not bit by bit!!

»
9 years ago, # |
Rev. 2   Vote: I like it +7 Vote: I do not like it

The first issue most likely is due to cache hits/misses. The second version will often run faster if you access multiple values dp[x][y] with the same x roughly at the same time -- which is indeed the case if you iterate over all subsets in the outer cycle of your DP. When you access some dp[x][y1], a whole chunk of consecutive bytes of memory (most likely including the entire dp[x]) can be brought into the cache, which then speeds up access to dp[x][y2].

The big picture here is that code that makes consecutive memory accesses will run faster than code that jumps all over memory. For the same reason, iterating over a vector can be much faster than iterating over a linked list with the same number of elements, if those elements happen to be scattered all across the process's memory.

  • »
    »
    9 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Thanks a lot .. it's quite clear now ..

    I don't want to look like a dumb, but I don't know much about how memory works internally.

    so I have one little question: I have learnt that accessing an element in an array made by calculating its location using a little equation.. and this makes all elements in array are accessible in a constant time.. why do "consecutive memory accesses will run faster"? although elements here are an array in the same block in the memory! Thanks,

    • »
      »
      »
      9 years ago, # ^ |
      Rev. 2   Vote: I like it +7 Vote: I do not like it

      If you have an array A[3][3], the elements are stored in the memory in the order A[0][0], A[0][1], A[0][2], A[1][0], A[1][1], and so on.

      If you have an array dp[100][1<<18], there about 16 megabytes of data between the elements dp[31][12345] and dp[47][12345]. If you access these two memory locations one after another, your program will be forced to access each of them separately.

      On the other hand, if you have an array dp[1<<18][100], there are only a few bytes of data between dp[12345][31] and dp[12345][47]. These memory accesses happen locally, in the same short chunk of memory, and that may often make them faster.

      Look at these two pieces of code:

      They both do the exact same thing: they fill a matrix with data and then sum the data a few times. The first one is cache-friendly, accesses the physical memory in consecutive order, and terminates in 0.18 seconds. The second one jumps all over the place, which makes it run 4.89 seconds.