Amirkasra's blog

By Amirkasra, 10 years ago, In English

Hi everybody! Recently I got really excited about building a search engine myself. I learned about an algorithm known as "document distance"; it's very interesting in my view! If you don't know it, it's an algorithm that computes the angle between the vector representations of two documents: the more the documents differ in subject, the larger the angle between them. Google knows more! I wonder if anyone knows other basic algorithms like "docDist" that may be useful for building a search engine. If you have any idea of what kind of knowledge I should gain in order to build a search engine, please let me (and maybe others!) know.
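To make the idea concrete, here is a rough Python sketch of how I understand document distance: count word frequencies in each document and compute the angle between the count vectors. The tokenization is deliberately crude and the names are just illustrative.

    import math
    from collections import Counter

    def word_vector(text):
        # Count word frequencies with a very crude whitespace tokenizer.
        return Counter(text.lower().split())

    def angle_between(doc_a, doc_b):
        # Angle (in radians) between the word-frequency vectors of two documents.
        va, vb = word_vector(doc_a), word_vector(doc_b)
        dot = sum(va[w] * vb[w] for w in va if w in vb)
        norm_a = math.sqrt(sum(c * c for c in va.values()))
        norm_b = math.sqrt(sum(c * c for c in vb.values()))
        # Clamp to guard against floating-point rounding just above 1.0.
        return math.acos(min(1.0, dot / (norm_a * norm_b)))

    print(angle_between("the cat sat on the mat", "the dog sat on the log"))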


»
10 years ago, # |

I think they can help you: Google, Yandex :)

»
10 years ago, # |

The best book I've ever seen about search engines and their structure is Introduction to Information Retrieval.

For example, you can find your "angle between documents" measure there, in the chapter on ranking factors and algorithms.

The book's overview of the fundamental components and algorithms used in search engines is quite accessible, yet I haven't seen such a precise and broad overview anywhere else, so I strongly recommend it to you. The knowledge from this book is sufficient to build a simple working prototype of a web search engine.

»
10 years ago, # |

Your search engine will naturally consist of two big parts: the crawler (which will traverse the web and parse pages) and the ranker (which, given a query and a set of documents, decides which documents are more relevant to the query and which are less relevant).

The crawler will need to store its data somewhere. Search for "inverted indexes" -- this is how most search engines represent their data internally. Parsing itself is rather straightforward, but it will have its own challenges (for instance, people will try to trick your crawler by introducing hidden elements on the page that contain SEO-optimized content, and you will want to implement your crawler so that it can recognize such hidden elements and ignore them).
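To make "inverted index" concrete, here is a toy Python sketch (just an illustration, not how real engines store their postings): map each term to the ids of the documents that contain it, so an AND-query can simply intersect the postings lists.

    from collections import defaultdict

    def build_inverted_index(docs):
        # docs: dict of doc_id -> text; returns term -> sorted list of doc_ids.
        index = defaultdict(set)
        for doc_id, text in docs.items():
            for term in text.lower().split():
                index[term].add(doc_id)
        return {term: sorted(ids) for term, ids in index.items()}

    def search(index, query):
        # AND-query: intersect the postings lists of all query terms.
        postings = [set(index.get(term, [])) for term in query.lower().split()]
        return sorted(set.intersection(*postings)) if postings else []

    docs = {1: "the quick brown fox", 2: "the lazy dog", 3: "quick brown dogs"}
    index = build_inverted_index(docs)
    print(search(index, "quick brown"))  # -> [1, 3]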

For the ranker, you will need to collect lots of different signals and use some model to rank documents based on them. It might be a hand-tuned model or some machine learning model. Some signals will be based on the document body; in particular, read about BM25:

http://en.wikipedia.org/wiki/Okapi_BM25
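Here is a rough Python sketch of the BM25 scoring formula from that article; k1 and b below are the commonly used defaults, and the toy documents are just an example:

    import math

    def bm25_score(query_terms, doc_terms, all_docs, k1=1.5, b=0.75):
        # all_docs: list of tokenized documents (lists of terms).
        N = len(all_docs)
        avgdl = sum(len(d) for d in all_docs) / N
        score = 0.0
        for term in query_terms:
            n_t = sum(1 for d in all_docs if term in d)        # documents containing the term
            idf = math.log((N - n_t + 0.5) / (n_t + 0.5) + 1)  # smoothed inverse document frequency
            tf = doc_terms.count(term)                         # term frequency in the scored document
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc_terms) / avgdl))
        return score

    docs = [d.split() for d in ["the quick brown fox", "the lazy brown dog", "quick quick fox"]]
    print(bm25_score("quick fox".split(), docs[2], docs))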

Some will be based on other pages linking to the current document. In particular, for a long time many search engines used techniques similar to PageRank:

http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf

as the basis of their rankers. These days search engines tend to lower the weight of links in favor of other signals, since buying links was one of the most common ways of tricking search engines into ranking a web site higher.
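For reference, a tiny power-iteration sketch of the PageRank idea from the linked paper; the link graph and damping factor below are just example values:

    def pagerank(links, damping=0.85, iterations=50):
        # links: dict mapping each page to the list of pages it links to.
        pages = list(links)
        rank = {p: 1.0 / len(pages) for p in pages}
        for _ in range(iterations):
            new_rank = {p: (1 - damping) / len(pages) for p in pages}
            for page, outgoing in links.items():
                if not outgoing:
                    # Dangling page: spread its rank evenly over all pages.
                    for p in pages:
                        new_rank[p] += damping * rank[page] / len(pages)
                else:
                    for target in outgoing:
                        new_rank[target] += damping * rank[page] / len(outgoing)
            rank = new_rank
        return rank

    links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    print(pagerank(links))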

Once you are done with the crawler and the ranker, you will have a somewhat working search engine. Then you will want to concentrate on other aspects of it, such as correcting spelling, filtering out spam results, and maybe introducing some relevant ads :)
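For the spelling-correction part, one simple starting point (a toy sketch, not what production engines actually do) is to suggest the dictionary word with the smallest edit distance to the query term:

    def edit_distance(a, b):
        # Classic Levenshtein distance with a single rolling row.
        dp = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            prev, dp[0] = dp[0], i
            for j, cb in enumerate(b, 1):
                prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
        return dp[-1]

    def correct(word, dictionary):
        # Suggest the dictionary word closest to the (possibly misspelled) query term.
        return min(dictionary, key=lambda w: edit_distance(word, w))

    print(correct("serch", ["search", "ranker", "crawler"]))  # -> "search"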