ItsNear's blog

By ItsNear, history, 21 month(s) ago, In English,

Hi,

A small team of researchers and I are trying to teach machines to solve competitive programming problems.

For that we need a large dataset of problems and solutions.

We have already crawled practically all websites that have public solutions, and are now trying to crawl solutions that are not public.

If you have solved problems on Timus, UVa, or any other platform with private submissions, and are open to giving us access to your account to crawl the solutions, please send your credentials to me via a personal message. It will help us a lot with our research. The language in which you solved the problems doesn't matter.

We won't publish your code anywhere, and won't use the credentials in any way except for crawling the solutions.

Besides that, a reminder that we have a labeling platform where we are rewriting competitive programming problem statements in a short, concise way. We pay for this work, and many of the people currently helping us are making $12/hour. The link to the platform is

https://r-nn.com

It is a very nice way to get extra income for people who can't have a full-time job due to practicing for the upcoming competitions or studying.

  • Vote: I like it  
  • +96
  • Vote: I do not like it  

»
21 month(s) ago, # |
  Vote: I like it +74 Vote: I do not like it

That's a very bold thing to do, and I wish you luck, but I (and many other people) think that science is not ready for such a thing. But fortune favours the brave.

  • »
    »
    21 month(s) ago, # ^ |
      Vote: I like it +46 Vote: I do not like it

    Yes, people are divided into those who think it's a decade away and those who think it is around the corner. I belong to the latter group :)

    Both a friend of mine and I have left our day jobs to work on this full time, so we are quite invested in our belief :)

»
21 month(s) ago, # |
  Vote: I like it 0 Vote: I do not like it

Does UVa even store solutions?

  • »
    »
    21 month(s) ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    You are right, apparently it doesn't.

    We haven't found any UVa accounts yet, so it didn't come up before.

»
21 month(s) ago, # |
  Vote: I like it +1 Vote: I do not like it

I just registered on the platform and I see that we have to earn points on it. Do more points mean more payment, or is it just a sort of recruitment test?

  • »
    »
    21 month(s) ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    It's directly proportional to the payment, 20K points = $20
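
    As a rough illustration of that rate, here is a minimal Python sketch. The constant is derived only from the "20K points = $20" figure above; the function names are made up for this example and are not part of the platform.

```python
# Assumed from the thread: 20K points = $20, i.e. 1000 points per dollar.
POINTS_PER_DOLLAR = 20_000 // 20

def points_to_dollars(points: int) -> float:
    """Convert earned points to a payment in dollars."""
    return points / POINTS_PER_DOLLAR

def points_needed_per_hour(dollars_per_hour: int) -> int:
    """Points one would have to earn per hour to reach a given hourly rate."""
    return dollars_per_hour * POINTS_PER_DOLLAR

print(points_to_dollars(20_000))     # 20.0
print(points_needed_per_hour(12))    # 12000
```

    So the $12/hour mentioned in the post would correspond to about 12,000 points per hour, assuming the rate is strictly linear.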

»
21 month(s) ago, # |
  Vote: I like it 0 Vote: I do not like it

Do you crawl only the AC solutions? Have you crawled on platforms like SPOJ? They don't seem to have public solutions.

  • »
    »
    21 month(s) ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Yes, SPOJ doesn't have public solutions, so we need access to individual people's accounts to crawl them.

    • »
      »
      »
      21 month(s) ago, # ^ |
      Rev. 2   Vote: I like it +3 Vote: I do not like it

      You could ask the SPOJ folks whether they would give you access to solutions for research purposes. They might allow it. I have set around 40 problems on SPOJ. I will try to ask them whether I am allowed to crawl/store those submissions.

»
21 month(s) ago, # |
  Vote: I like it +138 Vote: I do not like it

I am working on a messenger bot. Please send me your facebook account's details (e-mail, password, etc), I won't publish them.

»
21 month(s) ago, # |
  Vote: I like it -7 Vote: I do not like it

Many solutions to UVa problems are available on GitHub, so you could crawl that for solutions, though I don't know how you could verify that they were accepted.

Anyway, good luck with your research.

»
21 month(s) ago, # |
  Vote: I like it +29 Vote: I do not like it

Why not share a script that allows you to scrape your AC solutions from the respective judges and upload them to your website instead of asking users to share their passwords, something that most people likely won't do?

  • »
    »
    21 month(s) ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    I originally wanted to publish the crawlers and do it the way you described, but in 2017 it's very hard to distribute code that is supposed to be run locally. People have very different setups.

    Out of curiosity, what would stop you from sharing your Timus account?

    • »
      »
      »
      21 month(s) ago, # ^ |
        Vote: I like it +23 Vote: I do not like it

      A possible way that this could harm a user is this:
      Let's assume that somebody has similar passwords for e.g. VK and Timus. If they don't change their password on Timus before sharing it, they are opening up a potential attack vector for their other accounts if your database gets hacked.

      • »
        »
        »
        »
        21 month(s) ago, # ^ |
          Vote: I like it +10 Vote: I do not like it

        This is a valid point.

        However, anyone who would spend effort to download a crawler and run it on their machine would probably also be willing to spend time to change their password.

        Would you mind changing your password on Timus and sharing your account with me? :)

        • »
          »
          »
          »
          »
          20 months ago, # ^ |
            Vote: I like it 0 Vote: I do not like it

          curl | sh is a really easy distribution method (even though still unsafe) for pretty much any programmer running a Unix-like OS, and iex (New-Object System.Net.WebClient).DownloadString('http://domain/script.ps1') is a similar alternative for users using Windows, so you actually don't have to spend much effort to download a crawler.

          Assuming that your crawler doesn't have many dependencies, this should work immediately.
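
          Since piping a download straight into a shell is unsafe, one possible mitigation is to check the downloaded script against a published SHA-256 checksum before running it. Below is a minimal Python sketch of that check. The script contents are a placeholder, and the idea that a checksum would be published alongside the crawler is an assumption, not something the author has offered.

```python
import hashlib
import hmac

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()

def is_trusted(script_bytes: bytes, expected_sha256: str) -> bool:
    """Check downloaded bytes against a checksum obtained from a trusted channel."""
    actual = sha256_hex(script_bytes)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(actual, expected_sha256.strip().lower())

# Placeholder script body; a real crawler would be downloaded instead.
script = b"#!/bin/sh\necho 'crawling...'\n"
published = sha256_hex(script)  # stands in for a checksum published by the author

print(is_trusted(script, published))                  # True
print(is_trusted(script + b"rm -rf ~\n", published))  # False
```

          This only helps if the checksum is obtained over a channel the user trusts more than the download itself.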

»
21 month(s) ago, # |
  Vote: I like it +48 Vote: I do not like it

I have a small suggestion. If at some point during your research you conclude that the task at hand is intractable, you could consider the relatively easier problem of predicting tags for a problem. Tags could be things like "Segment trees", "DP", "Math", etc., and the features could be the constraints along with any other information you could extract using NLP.
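
To make the "constraints as features" idea a bit more concrete, here is a minimal Python sketch that pulls numeric upper bounds out of a statement. The regular expression, the statement formats it handles, and the feature names are all assumptions made for illustration; real statements are far messier, and the resulting features would still have to be fed into some classifier.

```python
import math
import re

# Matches upper bounds such as "n <= 100000", "n <= 10^5" or "q ≤ 2*10^5"
# (assumed formats; single-letter variable names only).
BOUND = re.compile(
    r"([A-Za-z])\s*(?:<=|≤)\s*(\d+)(?:\s*[*·]\s*10\^(\d+)|\^(\d+))?"
)

def parse_bounds(statement: str) -> dict:
    """Map each variable name to the largest upper bound found for it."""
    bounds = {}
    for name, base, mul_exp, pow_exp in BOUND.findall(statement):
        value = int(base)
        if mul_exp:        # form "2*10^5"
            value *= 10 ** int(mul_exp)
        elif pow_exp:      # form "10^5"
            value **= int(pow_exp)
        bounds[name] = max(value, bounds.get(name, 0))
    return bounds

def features(statement: str) -> dict:
    """Turn the parsed bounds into a tiny numeric feature dictionary."""
    bounds = parse_bounds(statement)
    largest = max(bounds.values(), default=1)
    return {
        "num_bounded_vars": len(bounds),
        "max_bound": largest,
        "log10_max_bound": math.log10(largest),
    }

statement = (
    "Given an array of n integers (1 <= n <= 10^5) "
    "and q queries (1 ≤ q ≤ 2*10^5), answer each query."
)
print(parse_bounds(statement))  # {'n': 100000, 'q': 200000}
```

The intuition is that the size of the bounds hints at the intended complexity, which in turn correlates with some tags, though how well this works in practice is an open question.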

»
21 month(s) ago, # |
  Vote: I like it -16 Vote: I do not like it

Are you using NLP to solve the CP problems?? However, that's a great breakthrough man!! Keep working on it! :)

»
19 months ago, # |
  Vote: I like it -8 Vote: I do not like it

Is the website still working? I keep getting 504 Gateway Time-out.