blyat's blog

By blyat, 3 years ago, In English

hello codeforces

i want to suggest a research question

let's call the depth of an english wikipedia page the smallest number of clicks on links (in the article text) needed to reach this page.

what is (approximately) the mathematical expectation of this value, if the page is chosen uniformly at random? I'm waiting for your approaches (with descriptions)!!!

upd: sorry, didn't know such services exist; thought it'd be interesting to use some heuristics to estimate the needed values. the thread can be closed now. thanks for watching, subscribe on youtube and o***f***


»
3 years ago, # |

lmao why that page

It may be more interesting to get the average depth to an arbitrary page, or maybe the average distance between two pages. Also, some pages might not have any links at all, so they may not have a well-defined "depth". Wikipedia can be represented as a directed graph.

»
3 years ago, # |

This project page describes a mostly pre-parsed database of Wikipedia as a directed graph: https://github.com/jwngr/sdow/blob/master/docs/data-source.md

It is then a matter of writing the correct code and running your graph algorithm on it (e.g. Dijkstra's algorithm).

Time estimation (upper bound): there are about $$$10^7$$$ articles, and we can probably safely say there are no more than $$$100$$$ links per article on average (only big articles have that many), so the number of edges is $$$E \sim 10^7 \cdot 100 = 10^9$$$.

Running Dijkstra (or plain BFS, since the edges are unweighted) would be on the order of $$$10^9$$$ operations; one hour of computation is probably enough. Of course, you will have to do the coding yourself.
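
For reference, a minimal sketch of the search step, assuming the link data has already been loaded into an in-memory adjacency list mapping each page id to the ids it links to (the sdow database above stores the link lists in roughly this form). With unweighted edges, plain BFS gives the same distances as Dijkstra:

```python
from collections import deque

def bfs_distances(adj, source):
    """Unweighted single-source shortest distances (number of link clicks).

    adj: dict mapping page id -> list of page ids it links to
    source: id of the starting page
    Returns a dict page id -> distance; unreachable pages are absent.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        page = queue.popleft()
        for nxt in adj.get(page, ()):
            if nxt not in dist:
                dist[nxt] = dist[page] + 1
                queue.append(nxt)
    return dist

# toy usage with made-up page ids
adj = {1: [2, 3], 2: [4], 3: [4], 4: []}
print(bfs_distances(adj, 1))  # {1: 0, 2: 1, 3: 1, 4: 2}
```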

Alternatively, if you only want an estimate by simulation: just use https://www.sixdegreesofwikipedia.com, provided you have a way to generate a uniformly random Wikipedia article. You can go through the statistics, but about 200 samples should be enough for precision up to 0.1.
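
A rough sketch of that estimation loop is below. Drawing a uniformly random article through the MediaWiki API's `list=random` query is a real endpoint; `depth_of` is a hypothetical placeholder for however you actually measure the distance (a precomputed BFS table, manual queries to sixdegreesofwikipedia, etc.):

```python
import statistics
import requests  # third-party: pip install requests

API = "https://en.wikipedia.org/w/api.php"

def random_article_title():
    """Draw one random main-namespace article title via the MediaWiki API."""
    params = {
        "action": "query",
        "list": "random",
        "rnnamespace": 0,  # article namespace only
        "rnlimit": 1,
        "format": "json",
    }
    data = requests.get(API, params=params, timeout=10).json()
    return data["query"]["random"][0]["title"]

def depth_of(title):
    """Hypothetical placeholder: return the measured link distance for `title`
    (e.g. looked up in a precomputed BFS table)."""
    raise NotImplementedError

def estimate_mean_depth(samples=200):
    """Sample mean of the depth plus its standard error."""
    depths = [depth_of(random_article_title()) for _ in range(samples)]
    mean = statistics.mean(depths)
    stderr = statistics.stdev(depths) / samples ** 0.5
    return mean, stderr
```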

  • »
    »
    3 years ago, # ^ |

    thx, codeforces works better than google

»
3 years ago, # |

So are you looking for the shortest path from the home page, or the expected number of random clicks? If it is random clicks, then the answer is approximately $$$\frac n 2$$$, where $$$n$$$ is the total number of Wikipedia pages. This is under the assumption that the given page is a somewhat average Wikipedia page: not extremely popular, nor extremely unpopular. The answer is the same if you start from a random page, not necessarily the home page.
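
One way to see where $$$\frac n 2$$$ comes from: assume, as a simplification, that blind random clicking effectively visits pages in a uniformly random order without repeats. Then the target is equally likely to be the $$$k$$$-th distinct page visited for every $$$k$$$, so $$$\mathbb{E}[\text{clicks}] = \sum_{k=1}^{n} k \cdot \frac{1}{n} = \frac{n+1}{2} \approx \frac{n}{2}$$$.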

If you start from a random page and then search for the shortest path, it should be pretty short; the idea is similar to this.

Another way to think about it: you can first get to some very general page in a couple of clicks (any country, for example), then go to culture, subculture, a list of subcultures. This way the number of clicks is at most around 10.

»
3 years ago, # |

Lol, I suppose researching this page will bring you more pleasure; not everything there is obvious.

  • »
    »
    3 years ago, # ^ |

    bruh... i thought that would be offensive, tried to choose something neutral but memetic

»
3 years ago, # |

Just download the entire Wikipedia as one large XML dump, extract all the links, run BFS on the reversed link graph, and compute the average of all distances. I remember running the PageRank algorithm on the entire English Wikipedia a few years ago as a university assignment, so it's certainly possible.
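
A rough sketch of that pipeline, under the simplifying assumption that wikitext links can be picked out with a plain `[[...]]` regex (a real dump needs extra care with redirects, namespaces and templates):

```python
import re
from collections import defaultdict, deque

# crude wikitext link pattern: captures the target before any '|' or '#'
LINK_RE = re.compile(r"\[\[([^\]|#]+)")

def build_reversed_graph(pages):
    """pages: dict title -> wikitext. Returns reversed adjacency: target title -> list of linking titles."""
    rev = defaultdict(list)
    for title, text in pages.items():
        for target in LINK_RE.findall(text):
            rev[target.strip()].append(title)
    return rev

def depths_to(target, rev):
    """BFS on the reversed graph: link distance from every page to `target`."""
    dist = {target: 0}
    queue = deque([target])
    while queue:
        page = queue.popleft()
        for src in rev.get(page, ()):
            if src not in dist:
                dist[src] = dist[page] + 1
                queue.append(src)
    return dist

def average_depth(target, pages):
    """Average depth over the pages that can reach `target` at all."""
    dist = depths_to(target, build_reversed_graph(pages))
    reachable = [d for title, d in dist.items() if title in pages and title != target]
    return sum(reachable) / len(reachable)
```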