Why's my Convex Hull Trick so Slow? - Codeforces

Rating changes for last rounds are temporarily rolled back. They will be returned soon. ×

→ Pay attention

Before contest
Codeforces Round 941 (Div. 1)
3 days
Register now »

*has extra registration

Before contest
Codeforces Round 941 (Div. 2)
3 days
Register now »

*has extra registration

→ Streams

CodeChef Starters 131 Solution Discussion

By aryanc403

Before stream 08:07:33

View all →

→ Top rated

#	User	Rating
1	Radewoosh	3611
2	jiangly	3597
3	Benq	3593
4	ecnerwala	3584
5	orzdevinwang	3570
6	cnnfls_csy	3569
7	tourist	3565
8	maroonrk	3487
9	gyh20	3447
10	Rebelz	3409

Countries | Cities | Organizations

→ Top contributors

#	User	Contrib.
1	maomao90	174
2	awoo	164
3	adamant	162
4	TheScrasse	159
5	nor	158
6	maroonrk	156
7	-is-this-fft-	151
8	SecondThread	147
9	orz	146
10	pajenegod	145

View all →

→ Find user

→ Recent actions

Detailed →

minimario's blog

Why's my Convex Hull Trick so Slow?

By minimario, history, 7 years ago, In English

In English

Hi all,

I was solving this problem: Link. It's fairly easy, so if you don't really want to solve it, I'll go ahead and write the dp recurrence here:

dp[i][k] = max(dp[j][k - 1] + p[j] * (p[i] - p[j]))

A pretty obvious Convex Hull Trick (CHT) problem. But I'm getting TLE on the last test case. I opened some AC codes, and I didn't see much difference between our codes, so I'm wondering what makes mine so slow and the other one so fast!

If anyone has some insights, please comment (or PM me) with details!

(P.S.: If anyone has a fairly fast implementation of CHT that would like to challenge the 436 ms, go ahead and submit it :))

Thanks so much,

-minimario

Tags

cht, optimization, apio

+25

minimario
7 years ago
3

Comments

Comments (3)

Write comment?

»

7 years ago, # |

Vote: I like it

+65

Vote: I do not like it

It seems you have fallen victim to locality of reference (wiki).

I switched the indices of the array last in your code, and now it passes in 513 ms.

→ Reply

»

»

7 years ago, # ^ |

Vote: I like it

0

Vote: I do not like it

orz, thanks so much! Needless to say, you have my upvote :)

So from my understanding, since I am accessing last[k][1], last[k][2], ..., last[k][k], the computer "mindlessly" expects me to access last[k][stuff] every time, whereas last[1][k], last[2][k], ..., will cause a lot of cache miss penalty.

So then, if I'm right, why didn't much change when I tried the same thing for the "dp" array? It seems to be accessing dp[1][k%2], dp[2][k%2], ..., and I hope that based on what I wrote before, dp[k%2][1], dp[k%2][2], ..., would be a faster option.

→ Reply

»

»

»

7 years ago, # ^ |

← Rev. 5 →

Vote: I like it

+15

Vote: I do not like it

Perhaps "mindlessly" is not the right word.

My (admittedly limited) understanding is that the processor loads contiguous chunk of stuff from RAM into cache (which is uber fast); so when you access last[k][2] after last[k][1], it is expectedly already in cache, so the access is fast.

But your question still remains valid, and to resolve that, note that sizeof(dp) = 2 * 100005 * 4 bytes < 1 KB, whereas modern processors have L1 cache (the super fast kind) on the order of 32 KB, so my guess is that the processor loads the entire array into cache, so it doesn't miss at all.

EDIT: I can't multiply for shit, it's < 1 MB, which is larger than size of L1 cache, but still in the right order of magnitude for total cache size.

→ Reply