Why you may want to use two-layer dp not only to decrease the memory usage

Revision en2, by nskybytskyi, 2021-06-29 11:53:42

I was upsolving AtCoder Beginner Contest 195 problem F, aka "Coprime Present" where you need to count subsets (of a special set of size < 72) with pairwise coprime elements.

My solution is a bitmask dp over prime factors that were used and a prefix of the set. My first submission.

Fairly straightforward, nothing smart, $$$O(N \cdot 2^n)$$$ time and memory. However, I was disappointed by the memory usage of 600+ Mb. Therefore, I implemented a two-layer version of the same dp. Here is my second submission.

The difference is minimal, I just take layer_number & 1, and don't forget to clear the previous layer before proceeding to the next one

Obviously, it uses way less memory, only 30 Mb. What surprised me is that the running time has also decreased, from 472 ms to just 147 ms, for an improvement by a factor of 3!

I have two hypotheses on why this happens:

  • two layers fit into the better cache level, facilitating memory lookups;

  • memory allocation itself takes some time (there must be a reason why people write custom allocators, right?).

Can someone more experienced please tell me what actually happens there?


  Rev. Lang. By When Δ Comment
en3 English nskybytskyi 2021-06-29 11:58:54 0 (published)
en2 English nskybytskyi 2021-06-29 11:53:42 8 Tiny change: 'ts (of a set of si' -> 'ts (of a special set of si' (saved to drafts)
en1 English nskybytskyi 2021-06-29 11:50:44 1383 Initial revision (published)