In bottom up dynamic programing, do the order of for loop matter? - Codeforces

Rating changes for last rounds are temporarily rolled back. They will be returned soon. ×

→ Pay attention

Before contest
Codeforces Round 941 (Div. 1)
3 days
Register now »

*has extra registration

Before contest
Codeforces Round 941 (Div. 2)
3 days
Register now »

*has extra registration

→ Streams

CodeChef Starters 131 Solution Discussion

By aryanc403

Before stream 10:32:09

View all →

→ Top rated

#	User	Rating
1	jiangly	3640
2	Benq	3593
3	tourist	3572
4	orzdevinwang	3561
5	cnnfls_csy	3539
6	ecnerwala	3534
7	Radewoosh	3532
8	gyh20	3447
9	Rebelz	3409
10	Geothermal	3408

Countries | Cities | Organizations

→ Top contributors

#	User	Contrib.
1	maomao90	174
2	awoo	164
3	adamant	163
4	TheScrasse	159
5	nor	158
6	maroonrk	156
7	-is-this-fft-	151
8	SecondThread	147
9	orz	146
10	pajenegod	145

View all →

→ Find user

→ Recent actions

Detailed →

insider_pants's blog

In bottom up dynamic programing, do the order of for loop matter?

By insider_pants, history, 4 years ago, In English

In English

I'm doing dynamic programming problems and I have found a weird thing that if I change the order of the for loops in the tabulation DP, one of my solution gives TLE whereas by changing the order of loops, it got accepted.

Here's the problem im doing E. Tree Queries from recent div 3 contest. In this question I'm pre calculating LCA using binary lifting but during precalculation, the change in order of loops gives tle and other got accepted.

here's the accepted version 74428570 and here's the TLE solution 74926625. The only difference is in find_ancestor function where I changes the order of the for loop.

If changing order of for loop do matter then why I got TLE instead of WA?

Thanks

Tags

#dynamic programing, #binary-lifting, #question

-3

insider_pants
4 years ago
4

Comments

Comments (4)

Write comment?

»

4 years ago, # |

← Rev. 2 →

Vote: I like it

+3

Vote: I do not like it

Hey, the matter is that the TLE solution makes a lot of cache misses, while the AC one is very cache-efficient.

The CPU copies frequently used data from RAM to cache — faster memory located closer to the processor. It is much, much faster than RAM, but you can only fit so much data into the cache, on the order of megabytes.

Now if you access the memory at a location that is not far away from the previous access, then chances are you will find this data in the CPU cache. But if you make large jumps between memory accesses (in the TLE solution the accessed elements are at least $$$0.8$$$ megabytes apart), then the data saved in the cache likely doesn't contain what you're looking for, so you must peek into RAM which takes a lot of time.

Sometimes, swapping the dimensions reduces cache misses to such an extent that a solution gets 30x times faster — this is how important it is.

→ Reply

»

»

4 years ago, # ^ |

Vote: I like it

0

Vote: I do not like it

thanks mate, didn’t know this. So i can deduce that changing dimension do cause tle but it doesn’t give WA.

→ Reply

»

»

»

4 years ago, # ^ |

← Rev. 2 →

Vote: I like it

+3

Vote: I do not like it

Well, everything that I've mentioned in my original comment is certainly true, but as I'm looking at the code once more, I'm thinking that your code goes into an infinite loop somewhere, and the TLE order of loops is actually not correct.

With this order, you can be accessing values of DP that haven't been counted: it is not guaranteed that in dp[i][j] = dp[i-1][dp[i-1][j]]; dp[i-1][j] is actually less or equal to j. This is only true if every node's parent has a smaller index than the node itself.

Sorry, I should've read more carefully :) Edit: with assert(par < a); in the DFS function, your code gives RE100.

→ Reply

»

»

»

»

4 years ago, # ^ |

Vote: I like it

+5

Vote: I do not like it

Thanks for pointing this out. Appreciate it.

→ Reply