Please subscribe to the official Codeforces channel in Telegram via the link https://t.me/codeforces_official. ×

hieu_2004's blog

By hieu_2004, history, 11 months ago, In English

Recently, I have seen a lot of blogs talking about the issues of cheaters. Therefore, I am currently thinking about using automatic system to catch them.

Currently, the most well-known automatic system for assisting of detecting plagiarism is MOSS(from Stanford). At first, I asked myself, why did not Codeforces use them? However, I look at the number of participants of each contests; it turns out that the count is approximately under 30000. So, we have to compare $$$4.5*10^8$$$ pairs of source code!

Assuming that the system can check $$$10^4$$$ pairs per second, we will need $$$45000$$$ seconds, which is just more than half a day, the same length as hacking procedure of Educational Rounds. But I believe that limit is much lower (I have not used it).

Is there any assistance like that could run that fast, if not MOSS? Is there any solutions that can drop the complexity of $$$O(n^2 * t)$$$? (assuming $$$t$$$ is the time for comparing a pair of code)

 
 
 
 
  • Vote: I like it
  • +16
  • Vote: I do not like it

»
11 months ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it

we can do the same thing on any random contest from any two month period, where we will decrease cutoff of similarity, so that more persons could get caught.

technically we can make relation tree of variables, like now you can do some automaton or suffix sorting kind of thing to make smaller groups, by neglecting those pairs which will definitely differ in code perspective.

also, i guess moss is system only for text based comparison, does it also compares machine level code??

»
11 months ago, # |
  Vote: I like it 0 Vote: I do not like it

However, I look at the number of participants of each contests

I think there should be a difference between this and the actual number of checks needed, because:

  • Some participants may not send a single submission.
  • There are more submissions than the number of actual participants (because one can submit multiple times).
  • Your statement assumes that we have to check for every 2 submissions of different problems, while this is unnecessary.

That's why I think there should be less comparisons, much less than $$$4.5 * 10^8$$$ pairs.