Google Hashcode 2021 Practice Round Discussion

Please subscribe to the official Codeforces channel in Telegram via the link https://t.me/codeforces_official. ×

→ Pay attention

Before contest
Codeforces Round 961 (Div. 2)
20:18:30
Register now »

*has extra registration

→ Streams

Codeforces Round 1995 (Div 2) - Official Solution Discussion

By Shayan

Before stream 22:23:29

Codeforces Round 960 Solution Discussion

By aryanc403

Before stream 22:28:29

CodeChef Starters 144 Solution Discussion

By aryanc403

Before stream 46:43:29

View all →

→ Top rated

#	User	Rating
1	tourist	3880
2	jiangly	3669
3	ecnerwala	3654
4	Benq	3627
5	orzdevinwang	3612
6	Geothermal	3569
6	cnnfls_csy	3569
8	jqdai0815	3532
9	Radewoosh	3522
10	gyh20	3447

Countries | Cities | Organizations

View all →

→ Top contributors

#	User	Contrib.
1	awoo	161
2	maomao90	160
3	adamant	156
4	maroonrk	153
5	-is-this-fft-	148
5	SecondThread	148
5	atcoder_official	148
8	Petr	147
9	nor	144
9	TheScrasse	144

View all →

→ Find user

→ Recent actions

Detailed →

TheOneYouWant's blog

Google Hashcode 2021 Practice Round Discussion

By TheOneYouWant, history, 3 years ago, In English

Hello everyone,

Since the practice round doesn't seem to provide a scoreboard, I'd like to open this thread to discuss scores and possible approaches for the problem. We got the following scores after some mostly trivial ideas:

A: 65
B: 13,328
C: 702,974,812
D: 7,602,227
E: 10,477,632
Total: 721,068,064

We mostly did some greedies, followed by randomly taking a small subset and computing best answer for that subset. I tried to use max weight bipartite matching but failed to make it work well; I don't have fast codes for max weight general matching, which could have been used to compute "good" pairs of pizzas. Did anyone manage to make this approach work or have a better idea which gave significantly better score?

+134

TheOneYouWant
3 years ago
28

Comments (24)

Show archived | Write comment?

tob123

3 years ago, # |

← Rev. 2 →

+39

We also did some greedy construction (mostly taking good matching pizzas with many ingredients first) and then used Simulated Annealing to improve it.

A: 49
B: 13,261
C: 714,252,603
D: 7,845,044
E: 10,773,367
Total: 732,884,324

→ Reply

Priyansh31dec

3 years ago, # ^ |

+17

Can you throw some light on what is Simulated Annealing?

→ Reply

tob123

3 years ago, # ^ |

+13

Actually, the greedy part is way more important. In the end, we take the most important groups (e.g. the 50 ones having the pizzas with many ingredients) and just switch two random pizzas of that group as a step. But this Annealing is actually not that much better for us than greedily taking steps that improve our current solution.

→ Reply

7pablof

3 years ago, # ^ |

We also used a greedy algorithm and randomized fine tuning step (simple local search in our case). By considering wasted ingredients in the greedy, we managed to further increase the score on D and E.

A: 74.
B: 13,773.
C: 712,312,932.
D: 8,061,713.
E: 10,988,069.
Total: 731,376,561.

→ Reply

sharath101

3 years ago, # |

We greedily tried picking the next "best pizza" (with most new ingredients) for each team from team of 4 to 2. We later also tried some optimizations but could increase the score by only a million at max. Randomized greedy solutions didn't seem to give a better score either.

A: 49
B: 11,145
C: 706,523,791
D: 7,832,329
E: 10,758,832
Total: 725,126,146

→ Reply

Shahraaz

3 years ago, # ^ |

+16

Disclaimer: this is not my work; I just happened to stumble upon it and felt like it is worth sharing

I found this repository by someone named Sagi Shporer , to add some context his team has been participating in hashcode since 2018, you can see all his repositories here

This year his team's practice round Score was 731,455,475 (soure)

A – Example — 74 points (74 before optimization)
B – A little bit of everything — 13,533 points (12922 before optimization)
C – Many ingredients 712,692,751 points (706,624,572 before optimization)
D – Many pizzas — 7,911,296 points (7,863,102 before optimization)
E – Many teams — 10,837,821 points (10,789,627 before optimization)

Algorithm description:

Phase 1: Build a solution
Phase 2: Optimizations

Phase 1 — Build a solution

Sort pizzas by the number of ingredients.
Build deliveries, first teams of 4, after that 3, after that 2:
2.1 Select the pizza with the most ingredients.
2.2 Select the pizza that will give the best improvement in delivery (most new ingredients, with the least overlapping ingredients).
2.3 Repeat 2.2 until the delivery is ready.

Phase 2 — Optimization

Try to swap 2 pizzas between any 2 deliveries — if it improves the score, make the swap.
Try to swap 1 pizza between any 2 deliveries — if it improves the score, make the swap.
Try to swap a pizza from any delivery with unused pizza — if it improves the score, make the swap.
Try to move 1 pizza between 2 deliveries (# of pizza in the 2 deliveries must be -+1) — if it improves the score, make the swap.
If any improvement performed in 1-4 — go to 1

Notes

Phase 1 takes about 5 seconds to run.
Phase 2 takes about 50 minutes to run with the current restrictions (implemented for D & E which are huge). About 1% score improvement.

Full source code

I hope you found this helpful :D

→ Reply

Modi_bhagwan

3 years ago, # ^ |

How did Phase 1 run so fast for him? Wouldn't the time complexity be quadratic in terms of the number of pizzas (M) ?

→ Reply

akshitm16

3 years ago, # ^ |

← Rev. 2 →

Breaking as soon as you get a pizza with almost new ingredients runs fast in D and E. You can't afford to traverse the whole array for choosing a pizza every-time in D and E.

→ Reply

ilovecheapthrills

3 years ago, # ^ |

Can you explain the reason for 4 to 2. Why starting with teams of size 4 is beneficial?

→ Reply

sharath101

3 years ago, # ^ |

← Rev. 2 →

Since we're greedily picking the pizzas with most ingrediants first, if we start with team of 2, then all of the "good pizzas" will be used up in teams of 2. We dont want that because the score depends on the sum of union of squares of ingredients in each team. And obviously the the distinct ingredients would be more if we use the good pizzas on a larger team size.

We also tried all 6 combinations just to be sure, but 4-3-2 expectedly gave a much better result.

→ Reply

Priyansh31dec

3 years ago, # |

← Rev. 4 →

A — 74 points — This one can be done by hand
B – 13,750 points — Greedy with some obvious brute force in picking.
C – 706,619,049 points — Picking greedily maximum ingredients pizzas first
D – 7,345,043 points — Same with some randomization
E – 10,369,792 points
Total — 724,347,708 points

Edit: Picked up small subsets and took the best possible combination from the sorted list (decreasing order of pizzas). Tried to swap pizzas between 2 deliveries randomly (swap only if it increases the score)... More the size of subsets better the answer... More the number of iterations of swapping pizzas better the answer.

A — 74 points — This one can be done by hand
B – 13,750 points — Greedy with some obvious brute force in picking.
C – 711,922,487 points — Picking greedily maximum ingredients pizzas first
D – 7,783,549 points — Same with some randomization
E – 10,636,376 points
Total — 730,356,236 points

→ Reply

Priyansh31dec

3 years ago, # |

Does having some knowledge of Machine Learning help in Google Hash Code?

→ Reply

akshitm16

3 years ago, # |

← Rev. 3 →

+12

First test set is small. So, you can try everything. For the rest, mostly picking the pizzas with largest ingredients and matching the pizzas such that increase of ingredients is maximum and/or intersection is less. Also, tried partial randomization of the sorted (based on count of ingredients) pizzas array and ran 10-2000 times depending upon the size of test set.

$$$A: 74$$$
$$$B: 12,676$$$
$$$C: 706,624,573$$$
$$$D: 8,061,427$$$
$$$E: 10,988,056$$$
$$$Total: 725,686,806$$$

→ Reply

codertonk

3 years ago, # |

A – 74 points
B – 13,862 points
C – 712,218,689 points
D – 7,923,609 points
E – 10,831,175 points
Total score — 730,987,409 points

→ Reply

Errichto

3 years ago, # |

+61

2.5-hour live stream with final score of around 730.7 million — https://youtu.be/BD57-3Zt5r4

Final code: https://github.com/Errichto/youtube/blob/master/hashcode/2021_practice_pro.cpp

→ Reply

fsshakkhor

3 years ago, # |

+19

Total: 732,415,324 points

A : None

B : 13,439 points

C : 713,419,440 points

D : 8,027,794 points

E : 10,954,651 points

Our Approach:

In our approach we tried to ensure two things mainly.

We tried to make some big deliveries with as many as ingredients possible. The reason is — the score is calculated based on square of the number of unique ingredients. So one big delivery can be far better than a few average deliveries.
Sometimes the pizzas in a delivery can have so many intersecting ingredients. So we substracted some cost for each intersecting elements.

Then we greedily tried to make deliveries starting with pizzas with most ingredients. Also we tried to make the deliveries in 4-3-2 team order which produced the best result. Lastly we tried to replace some of the pizzas with unused ones to check if it produces better result.

→ Reply

BitSane

3 years ago, # |

← Rev. 2 →

I mostly used some greedy code (picking the best every time), along with some randomization in the bigger files.
- A: 74
- B: 12,015
- C: 705,472,464
- D: 7,748,466
- E: 10,674,444
- Total: 723,907,463

→ Reply

kinhosz

3 years ago, # |

← Rev. 2 →

A. 74
B. 13,692
C. 703,144,369
D. 7,199,154
E. 9,771,415
Total: 720,128,704

→ Reply

7pablof

3 years ago, # |

← Rev. 2 →

A: 74.
B: 13,880.
C: 714,128,859.
D: 8,062,943.
E: 10,988,318.
Total: 733,194,074.

Our final approach, as most of the solutions discussed in this blog, consists in a two step process:

Greedily building a solution (based on maximizing score and also minimizing wasted ingredients).
Performing randomized improvements (i.e., simulated annealing).

In our case, improvements achieved by the second setp were negligible on most datasets (<1k), except for dataset C, which jumped from 706.6 to 714.1 million.

We would like to share some highlights on running times for dataset C:

712 million in 10 minutes (also achievable with hill climbing).
712.5 million in 30 minutes.
713.3 million in 2 hours.
714.1 million in 18 hours.

→ Reply

ilovecheapthrills

3 years ago, # ^ |

Great Score!

On which machine did you run your codes for 18 hours?

→ Reply

7pablof

3 years ago, # ^ |

Thanks! All the tests were executed my regular laptop (i7 @ 2.5GHz).

→ Reply

PanicStation

3 years ago, # |

Simulated Annealing without any greedy initialization got me to 719,012,213.

→ Reply

PanicStation

3 years ago, # ^ |

Pure Simulated Annealing gets 704,123,568 in C. Seems like greedy is the way to go.

→ Reply

bully....maguire

3 years ago, # |

I don't know if i should be ashamed of myself , but i think just implementing input and output was more difficult than solving any heavy implementation problem on CF.

→ Reply