whynesspower's blog

By whynesspower, history, 7 months ago, In English

The primary concern of this post:

Prevent bots from scraping data from Codeforces, since that data will make AI tools more powerful and harm Codeforces in the long term.

OpenAI just started offering a service to customise Large Language Models on your own data.

Why is this a problem now?

Codeforces has a very rich database of community-driven questions (approx. 10,000). Now you can easily feed a lot of Codeforces data to ChatGPT and make it permanently learn the material. This will enhance its existing ability to solve algorithmic questions. Codeforces has a large set of both questions and their respective tutorials.

(My opinion) Chances are that when ChatGPT was being created, it was already fed the Codeforces data once, which is why the model can write code that solves Codeforces questions. But it was not custom-trained SPECIFICALLY for this, which is now possible.

Anyone in the world can now scrap the entire Codeforces site (a relatively easy task) and needs just $200 in GPT credits to custom-train a model and build a new service or product that can solve even the most difficult Codeforces problems in no time.

How is LeetCode fighting this?

  1. (Just last Saturday) They enabled Cloudflare's anti-scraping protection on their website, which makes it very difficult to scrape data with automated tools like Selenium or Beautiful Soup.

I propose:

  1. Adding a service to avoid data scraping.
  2. Adding CAPTCHAs wherever possible.

  Vote: I like it -29 Vote: I do not like it

»
7 months ago, # |
  Vote: I like it +9 Vote: I do not like it

NO!!!

I don't wanna get a CAPTCHA when submitting my solution.

I still wanna use the Codeforces script that works by scraping the site.

»
7 months ago, # |
  Vote: I like it +1 Vote: I do not like it

AI is weak. We're much better hehe.

»
7 months ago, # |
  Vote: I like it +7 Vote: I do not like it

Those who don't bow down to the AI overlords may receive harsher treatment after their rise to power...

»
7 months ago, # |
  Vote: I like it +3 Vote: I do not like it

If you add CAPTCHAs too intrusively, tools like cf-tool or Competitive Companion might break.

»
7 months ago, # |
  Vote: I like it +24 Vote: I do not like it

Are you sure that 10,000 problems with editorials will help GPT solve hard problems? AI is not just a magic black box that can do anything you want.

»
7 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Teaching AI to solve problems will result in more cheaters in contests, who will ask ChatGPT for answers.

»
7 months ago, # |
  Vote: I like it +16 Vote: I do not like it

10k (and their 1500 or so editorials) is not that many. No matter what system you design, someone will eventually be able to get all of them.

As for the submissions, however, have you actually tried scraping them? Codeforces is covered by Cloudflare, and even before that, spamming the site with requests got you a "403 Forbidden" pretty fast.

Also, the word is "scrape" ("scraping"). "Scrap" ("scrapping") means discard.

Also also, Selenium or whatever is totally overkill. Codeforces is rendered server-side almost entirely. Do a simple HTTP request and you will get the entirety of a problem statement in almost plain text.
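
To illustrate the point about server-side rendering, here is a minimal sketch of pulling the visible text out of a server-rendered page with nothing but Python's standard library. The sample HTML, the `statement` class name, and the helper names (`TextExtractor`, `html_to_text`) are made up for illustration and are not Codeforces' actual markup; in practice the HTML string would come from a single plain HTTP GET rather than a browser automation tool.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from an HTML string, skipping scripts and styles."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a <script> or <style> element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html):
    """Return the visible text of an HTML document, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Hypothetical server-rendered page: the statement is already in the HTML,
# so no JavaScript has to run to see it.
SAMPLE_PAGE = """
<html><head><style>p { color: red; }</style></head>
<body>
  <div class="statement">
    <h1>A. Sum of Two</h1>
    <p>Given two integers, print their sum.</p>
  </div>
  <script>console.log("ignored");</script>
</body></html>
"""

print(html_to_text(SAMPLE_PAGE))
```

Because the text is already present in the response body, a basic HTML parser is enough; a headless browser only becomes necessary when the content is built client-side.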

  • »
    »
    7 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Thanks a lot for this mature reply. I will still try to create a system (on my own private data) to see if I can custom train the model on it as a white hat tester.