Why you may want to use two-layer dp for more than just lower memory usage

I was upsolving AtCoder Beginner Contest 195 problem F, aka "Coprime Present" where you need to count subsets (of a special set of size < 72) with pairwise coprime elements.

My solution is a bitmask dp whose state is a prefix of the set plus the mask of prime factors already used. Here is my first submission.

Fairly straightforward, nothing smart: $$$O(N \cdot 2^n)$$$ time and memory, where $$$N$$$ is the size of the set and $$$n$$$ is the number of primes tracked in the mask. However, I was disappointed by the memory usage of 600+ MB. Therefore, I implemented a two-layer version of the same dp. Here is my second submission.

The difference is minimal: I index layers by layer_number & 1 and make sure to clear the stale layer before writing the next one into it.
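For readers without the submissions at hand, the two-layer idea can be sketched roughly as follows. This is not my actual submission, just a minimal illustration: the function name `count_coprime_subsets` is made up for this post, and it assumes the set is a range of consecutive integers `a..b` with `b - a <= 72`, so any prime shared by two elements is at most 71.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical helper (name invented for this sketch): counts subsets of
// {a, ..., b}, including the empty one, whose elements are pairwise coprime.
// Assumes b - a <= 72, so a prime dividing two elements divides their
// difference and therefore is one of the 20 primes below 72.
std::int64_t count_coprime_subsets(std::int64_t a, std::int64_t b) {
    static const std::array<int, 20> primes = {2,  3,  5,  7,  11, 13, 17, 19, 23, 29,
                                               31, 37, 41, 43, 47, 53, 59, 61, 67, 71};
    const int n = static_cast<int>(primes.size());
    const int full = 1 << n;

    // Only two layers are kept; layer i lives in dp[i & 1].
    std::array<std::vector<std::int64_t>, 2> dp;
    dp[0].assign(full, 0);
    dp[1].assign(full, 0);
    dp[0][0] = 1;  // empty prefix, no primes used

    int layer = 0;
    for (std::int64_t x = a; x <= b; ++x, ++layer) {
        // Mask of the small primes dividing x.
        int factors = 0;
        for (int j = 0; j < n; ++j) {
            if (x % primes[j] == 0) factors |= 1 << j;
        }

        const auto& cur = dp[layer & 1];
        auto& next = dp[(layer + 1) & 1];
        // The buffer for the next layer still holds stale values: clear it.
        std::fill(next.begin(), next.end(), 0);

        for (int mask = 0; mask < full; ++mask) {
            if (cur[mask] == 0) continue;
            next[mask] += cur[mask];  // skip x
            if ((mask & factors) == 0) {
                next[mask | factors] += cur[mask];  // take x
            }
        }
    }

    // Sum over all masks of the last computed layer.
    std::int64_t total = 0;
    for (std::int64_t ways : dp[layer & 1]) total += ways;
    return total;
}
```

For example, `count_coprime_subsets(2, 4)` counts the six subsets {}, {2}, {3}, {4}, {2, 3}, {3, 4}. The full-table version differs only in allocating one vector per layer instead of reusing two.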

Obviously, it uses way less memory, only 30 MB. What surprised me is that the running time has also decreased, from 472 ms to just 147 ms, an improvement by a factor of 3!
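A rough back-of-the-envelope check of the memory numbers, assuming about 72 layers, $$$n = 20$$$ primes in the mask, and 8-byte counters (all three are assumptions, not taken from the submissions):

$$$72 \cdot 2^{20} \cdot 8 \approx 6.0 \cdot 10^8 \text{ bytes for the full table}, \qquad 2 \cdot 2^{20} \cdot 8 \approx 1.7 \cdot 10^7 \text{ bytes for two layers}.$$$

The first figure matches the 600+ MB I observed. The second is the same order of magnitude as the reported 30 MB; whatever accounts for the rest depends on details of the submission and the judge.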

I have two hypotheses on why this happens:

- two layers fit into a faster cache level, which speeds up memory lookups;
- memory allocation itself takes some time (there must be a reason why people write custom allocators, right?).

Can someone more experienced please tell me what actually happens there?

Rev.	By	When	Δ	Comment
en3	nskybytskyi	2021-06-29 11:58:54	0	(published)
en2	nskybytskyi	2021-06-29 11:53:42	8	Tiny change: 'ts (of a set of si' -> 'ts (of a special set of si' (saved to drafts)
en1	nskybytskyi	2021-06-29 11:50:44	1383	Initial revision (published)
