ItsNear's blog

By ItsNear, history, 2 years ago, In English,

Hi,

Together with a small team of researchers, I am trying to teach machines to solve competitive programming problems.

For that we need a large dataset of problems and solutions.

We have already crawled practically all websites that have public solutions, and are now trying to crawl solutions that are not public.

If you have solved problems on Timus, UVa, or any other platform with private submissions, and are open to giving us access to your account to crawl the solutions, please send your credentials to me via a personal message. It will help us a lot with our research. The language in which you solved the problems doesn't matter.

We won't publish your code anywhere, and won't use the credentials in any way except for crawling the solutions.

Besides that, a reminder that we have a labeling platform where we are trying to rewrite competitive programming problem statements in a short, concise way. We pay for this work, and many of the people currently helping us are making $12/hour. The link to the platform is

https://r-nn.com

It is a very nice way to earn extra income for people who can't hold a full-time job because they are studying or practicing for upcoming competitions.

  • Vote: I like it
  • +96
  • Vote: I do not like it

»
2 years ago, # |
  Vote: I like it +74 Vote: I do not like it

That's a very bold thing to do. I wish you luck, but I (and many other people) think that science is not ready for such a thing. But fortune favours the brave.

  • »
    »
    2 years ago, # ^ |
      Vote: I like it +46 Vote: I do not like it

    Yes, people are divided into those who think it's a decade away and those who think it is around the corner. I belong to the latter group :)

    Both a friend of mine and I have left our day jobs to work on this full time, so we are quite invested in our belief :)

»
2 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Does UVa even store solutions?

  • »
    »
    2 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    You are right, apparently it doesn't.

    We haven't found any UVa accounts yet, so it didn't come up before.

»
2 years ago, # |
  Vote: I like it +1 Vote: I do not like it

I just registered on the platform and I see that we have to earn points on it. So do more points mean more payment, or is it just a sort of recruitment test?

  • »
    »
    2 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    It's directly proportional to the payment, 20K points = $20
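
    A minimal sketch of that conversion, assuming the rate is strictly linear (1,000 points per dollar, inferred from the "20K points = $20" figure). The constant and function names are hypothetical and not part of the platform:

```python
# Assumption: payment is strictly proportional to points,
# at the rate implied by "20K points = $20".
POINTS_PER_DOLLAR = 20_000 / 20  # 1,000 points per dollar


def points_to_dollars(points):
    """Convert platform points to US dollars under the linear-rate assumption."""
    if points < 0:
        raise ValueError("points must be non-negative")
    return points / POINTS_PER_DOLLAR
```

    For example, points_to_dollars(20_000) gives 20.0.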

»
2 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Do you crawl only the AC solutions? Have you crawled platforms like SPOJ? They don't seem to have public solutions.

  • »
    »
    2 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Yes, SPOJ doesn't have public solutions, so we need access to individual people's accounts to crawl them.

    • »
      »
      »
      2 years ago, # ^ |
      Rev. 2   Vote: I like it +3 Vote: I do not like it

      You could ask the SPOJ folks whether they would give you access to solutions for research purposes. They might allow it. I have set around 40 problems on SPOJ. I will try to ask them whether I am allowed to crawl/store those submissions.

»
2 years ago, # |
  Vote: I like it +138 Vote: I do not like it

I am working on a messenger bot. Please send me your Facebook account's details (e-mail, password, etc.), I won't publish them.

»
2 years ago, # |
  Vote: I like it -7 Vote: I do not like it

Many solutions to UVa problems are available on GitHub, so you could crawl that for solutions, though I don't know how you could verify them for ACness.

Anyway, good luck with your research.

»
2 years ago, # |
  Vote: I like it +29 Vote: I do not like it

Why not share a script that allows you to scrape your AC solutions from the respective judges and upload them to your website instead of asking users to share their passwords, something that most people likely won't do?

  • »
    »
    2 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    I originally wanted to publish the crawlers and do it the way you described, but in 2017 it's very hard to distribute code that is supposed to be run locally. People have very different setups.

    Out of curiosity, what would stop you from sharing your Timus account?

    • »
      »
      »
      2 years ago, # ^ |
        Vote: I like it +23 Vote: I do not like it

      A possible way that this could harm a user is this:
      Let's assume that somebody has similar passwords for e.g. VK and Timus. If they don't change their password on Timus before sharing it, they are opening up a potential attack vector for their other accounts if your database gets hacked.

      • »
        »
        »
        »
        2 years ago, # ^ |
          Vote: I like it +10 Vote: I do not like it

        This is a valid point.

        However, anyone who would spend effort to download a crawler and run it on their machine would probably also be willing to spend time to change their password.

        Would you mind changing your password on Timus and sharing your account with me? :)

        • »
          »
          »
          »
          »
          2 years ago, # ^ |
            Vote: I like it 0 Vote: I do not like it

          curl | sh is a really easy distribution method (though still unsafe) for pretty much any programmer running a Unix-like OS, and iex (New-Object System.Net.WebClient).DownloadString('http://domain/script.ps1') is a similar alternative for Windows users, so you don't actually have to spend much effort to download a crawler.

          Assuming that your crawler doesn't have many dependencies, this should work immediately.

»
2 years ago, # |
  Vote: I like it +48 Vote: I do not like it

I have a small suggestion. If at some point during your research you come to the conclusion that the task at hand is intractable, you could consider the relatively easier problem of predicting tags for a problem. Tags could be "Segment trees", "DP", "Math", etc., and the features could be the constraints along with any other information you could extract using NLP.
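
To make the tag-prediction idea a bit more concrete, here is a rough sketch in Python. It is only an illustration under stated assumptions: the features are a bag of words plus a coarse bucket for the largest number in the statement (standing in for "the constraints"), the classifier is a plain multinomial Naive Bayes with add-one smoothing, and the training statements in TOY_TRAIN are made up for the example. None of the names come from the original post.

```python
import math
import re
from collections import Counter, defaultdict


def extract_features(statement):
    """Lowercase words plus a bucket for the largest number (a crude constraint feature)."""
    features = re.findall(r"[a-z]+", statement.lower())
    numbers = [int(n) for n in re.findall(r"\d+", statement)]
    if numbers:
        # Bucket by order of magnitude, e.g. 100000 -> "max_n_1e5".
        features.append(f"max_n_1e{len(str(max(numbers))) - 1}")
    return features


class NaiveBayesTagger:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, statements, tags):
        self.tag_counts = Counter(tags)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, tag in zip(statements, tags):
            feats = extract_features(text)
            self.feature_counts[tag].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, statement):
        feats = extract_features(statement)
        total_docs = sum(self.tag_counts.values())
        best_tag, best_score = None, -math.inf
        for tag, doc_count in self.tag_counts.items():
            # Log prior plus smoothed log likelihood of each feature.
            score = math.log(doc_count / total_docs)
            denom = sum(self.feature_counts[tag].values()) + len(self.vocab)
            for f in feats:
                score += math.log((self.feature_counts[tag][f] + 1) / denom)
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag


# Made-up toy data, purely for illustration.
TOY_TRAIN = [
    ("Answer range sum queries and point updates on an array of 100000 elements", "Segment trees"),
    ("Process queries for the minimum on a range after updates, n up to 200000", "Segment trees"),
    ("Count the number of ways to climb 1000 stairs taking one or two steps", "DP"),
    ("Find the maximum value of items that fit in a knapsack of capacity 5000", "DP"),
    ("Compute the greatest common divisor of two integers up to 1000000000000", "Math"),
    ("Find the number of divisors of a prime power modulo 1000000007", "Math"),
]

tagger = NaiveBayesTagger().fit(
    [text for text, _ in TOY_TRAIN], [tag for _, tag in TOY_TRAIN]
)
```

Calling tagger.predict(...) on a new statement returns one of the tags seen in training. A real attempt would need far more data, multi-label output (problems usually carry several tags), and better constraint parsing than "largest number in the text".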

»
2 years ago, # |
  Vote: I like it -16 Vote: I do not like it

Are you using NLP to solve the CP problems? Either way, that would be a great breakthrough, man! Keep working on it! :)

»
2 years ago, # |
  Vote: I like it -8 Vote: I do not like it

Is the website still working? I keep getting 504 Gateway Time-out.