Hi, fellow programmer :)

I want to describe ideas that I used during ICPC Challenge 2020: Marathon.

#### First, how to get more than 600k score?

For a 600k score we can just implement what this article says. Let's start with each cluster containing a single vertex. **(1)** Then repeat the following until convergence: pick a random vertex and move it to the cluster of one of its adjacent vertices if that improves the result. **(2)** Now compress each cluster into one vertex and build a new graph with weighted edges, then do everything as in **(1)**, but now you are making clusters of clusters. I think the image in the article above explains it quite well. So, do **(1)** once and **(2)** until convergence. Basically, that's it.
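For concreteness, here is a minimal Python sketch of **(1)** and **(2)** (all names are mine). It greedily optimizes plain modularity; the actual contest score also includes a regularization term, which would enter the gain formula in the same way:

```python
import random
from collections import defaultdict

def local_move(edges, n, seed=0):
    # Step (1): start from singleton clusters and repeatedly move vertices
    # to adjacent clusters while that improves (plain) modularity.
    rng = random.Random(seed)
    adj = defaultdict(dict)                      # adj[u][v] = edge weight
    for u, v, w in edges:
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w
    m = float(sum(w for _, _, w in edges))       # total edge weight
    deg = {u: sum(adj[u].values()) for u in range(n)}
    club = list(range(n))                        # cluster id of each vertex
    tot = dict(deg)                              # tot[c] = sum of degrees in c
    improved = True
    while improved:
        improved = False
        order = list(range(n))
        rng.shuffle(order)
        for u in order:
            a = club[u]
            to = defaultdict(float)              # weight from u to each cluster
            for v, w in adj[u].items():
                if v != u:
                    to[club[v]] += w
            tot[a] -= deg[u]                     # take u out of its cluster
            best, best_gain = a, to[a] - deg[u] * tot[a] / (2.0 * m)
            for c, w in to.items():
                gain = w - deg[u] * tot[c] / (2.0 * m)
                if gain > best_gain + 1e-12:
                    best, best_gain = c, gain
            tot[best] += deg[u]
            if best != a:
                club[u] = best
                improved = True
    return club

def aggregate(edges, club):
    # Step (2): compress each cluster into one vertex; parallel edges and
    # intra-cluster edges become weighted edges and self-loops.
    ids = {c: i for i, c in enumerate(sorted(set(club)))}
    agg = defaultdict(float)
    for u, v, w in edges:
        a, b = ids[club[u]], ids[club[v]]
        agg[(min(a, b), max(a, b))] += w
    return [(a, b, w) for (a, b), w in agg.items()], len(ids)

def modularity(edges, club):
    # Plain Newman modularity, used here only to check progress.
    m = float(sum(w for _, _, w in edges))
    deg, inside = defaultdict(float), 0.0
    for u, v, w in edges:
        deg[u] += w
        deg[v] += w
        if club[u] == club[v]:
            inside += w
    tot = defaultdict(float)
    for u, d in deg.items():
        tot[club[u]] += d
    return inside / m - sum(t * t for t in tot.values()) / (4.0 * m * m)
```

Running `local_move` and then `aggregate` in a loop, until the number of clusters stops shrinking, gives the basic pipeline described above.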

**features**

#### Second, let's analyze the resulting clusters

As we are now relatively close to the optimal result, we can analyze the structure of our partition. Here is a random example for A3 in the form {cluster size : number of clusters of that size}:

**example**

We mostly have 10-50 big clusters and 1000-3000 single-vertex clusters.
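Such a table is easy to produce from a cluster assignment (a sketch; `club[v]` is assumed to be the cluster id of vertex `v`):

```python
from collections import Counter

def size_histogram(club):
    # {cluster size : number of clusters of that size}
    cluster_sizes = Counter(club).values()
    return dict(Counter(cluster_sizes))
```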

**why?**

So we can improve the result by finding a better partition of the big clusters or by choosing better singleton vertices.

#### Third, a kind of genetic algorithm

Notice that **(1)** and **(2)** use randomness, so every run gives us a new partition. The main idea is this: different partitions are better than others in some local regions, so we can combine them to get a better result.

In particular, what I did:

1. Generate 50-200 partitions.
2. Choose 5-15 of them at random and build a new partition according to the following principle: two vertices are in the same cluster if and only if they are in the same cluster in every one of the chosen partitions. Take a look at the image.
3. Run **(2)** 5-10 times on the new partition (do step 3 on the same new partition from step 2 eight times, in parallel on processor cores, and then take the best one).
4. Replace the worst of the 5-15 partitions chosen in step 2 with the best of the 8 partitions generated in step 3.
5. Do **2. -> 3. -> 4.** 500-5000 times (it takes 5-20 hours on my laptop depending on A1/A2/A3).
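The combining rule from step 2 has a neat implementation: label every vertex with the tuple of its cluster ids across the chosen partitions; vertices then share a cluster in the result if and only if they share one everywhere (a sketch, names are mine):

```python
def combine(partitions):
    # partitions: list of cluster-id lists, one per partition, same vertex order.
    # Two vertices end up together iff they are together in every partition.
    labels, out = {}, []
    for signature in zip(*partitions):       # per-vertex tuple of cluster ids
        out.append(labels.setdefault(signature, len(labels)))
    return out
```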

You can also take the 5-100 best partitions, generate 195-100 new partitions, and do **5.** again. This way you will probably get stuck at a different local maximum.

#### One more small improvement

So far, in **(1)** and **(2)**, we were only moving vertices to adjacent clusters. We can get an extra 100-200 points if we also try moving vertices to a new singleton cluster.
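In the local-moving step, a fresh singleton is just one more candidate cluster, with zero edge weight toward it and zero total degree. The sketch below (my names, the usual modularity-gain form) shows that it can beat every adjacent cluster for a high-degree vertex:

```python
def move_gains(deg_u, m, options):
    # Modularity gain (up to a constant factor) of moving a vertex with
    # degree deg_u into each candidate cluster; options maps cluster ->
    # (edge weight from the vertex to it, cluster's total degree without
    # the vertex).  A new singleton cluster is simply the candidate (0, 0).
    return {c: w - deg_u * tot / (2.0 * m) for c, (w, tot) in options.items()}
```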

That's all. I hope my English is not too bad and that you liked this post <3

Thanks a lot for sharing!

It's always interesting to read marathon solutions that have proven to be the best after a long-running challenge. They are usually very close to optimal and very diverse in terms of the techniques and ideas used.

Thanks for sharing your cool insights! The genetic algorithm is very nice.

One question: do you run the Louvain algorithm with Modularity only (ignoring Density at this step and only later creating lots of singletons), or do you somehow take Density into account at this step already?

I run the Louvain algorithm with both Modularity and Regularization. Both terms are updated in a similar way.

Can you also post how much the scores for each test case improved after each idea you used (Louvain, big vs. singleton clusters, genetic)?

Scores during the contest were a little different from what I show now because of some small features that I don't remember anymore, but the main trends remain.

I ran the following steps consecutively (10 times) for each given graph:

And here is what I got:

A1 + A2 + A3 = 220356.694 + 186346.688 + 199290.700 = 605994.082

The genetic algorithm (as tested during the contest with 500+ iterations and 50+ partitions) gives 220700+ for A1, 189680+ for A2, and 200480+ for A3.

Also, I got about 609k-610k (depending on the partitions) when I ran only one iteration of the genetic algorithm.