Hi Codeforces! In Round 507 earlier today, a large number of "mostly correct" randomized solutions on 1039B - Subway Pursuit were hacked. I wanted to write a quick explanation of how it's possible to hack these, as well as a guide on how to write your code so that you aren't vulnerable to getting hacked.
First to go over the solution quickly (skip down a few paragraphs if you already know the solution): one can use binary search to reduce the range of possible locations for the train to a constant multiple of K, such as 6K. Just make sure to expand out both ends of the range by K after every iteration. Once the range is at most 6K, one can repeat the following:
1) Make a random guess out of the remaining numbers. The probability of success is at least .
2) Binary search again until the range is at most 6K again. It turns out this only takes one query, since after our previous guess our 6K range will become an 8K range, and a single query will reduce this to again.
Since K ≤ 10 and we have 4500 queries, this works because our probability of failure on any test case is at most , extremely low.
However, this makes a key assumption that the train locations are independent of your queries. The main problem that most hacked solutions had was being completely deterministic; i.e., the programs query the same sequence of numbers on every run, given the same output. Most people had this issue due to using
rand() without ever calling
srand(). There were a few other causes as well:
random_device is oddly deterministic on Codeforces, as is
mt19937 without a seed. Calling
srand() with a fixed value is no good either, as the program is still totally predictable.
Because of the predictability, these programs are quite easy to hack: simply generate the same sequence of queries that the program would make, and set up the train to always be at a different location from each query. To make this even easier for yourself when hacking, you can choose N = 2 and K = 1, which skips the initial binary search phase, and then literally move the train to the non-queried option between 1 and 2 every time.
To get around this, many competitors seeded their generators with the current time by calling
srand(time(NULL)); This stops the code from being deterministic and makes it less likely you will get hacked, but it turns out this is still very possible to hack. How? The main problem here is that
time(NULL) is only precise to one second. So if I'm able to guess the second that your program gets run, your program is effectively deterministic.
It turns out I don't even need to guess. If I set up N = 11 and K = 10, I can produce all the different query sequences you might generate in the next 10 seconds, since your code will almost certainly be run a few seconds after my generator. Then for every query, at least one of the 11 positions will not be chosen by any of the 10 sequences, and I can move the train to that position each time. Unfortunately -- or fortunately for the people in my room ;) -- I didn't have time to do this during the contest.
The fix is quite easy.
time(NULL) has the right idea, but we should use a much more high-precision clock:
Then generate random numbers by calling
rng(). Remember to include
<random> for this.
- Why didn't I use
srand()? See this post: Don't use rand()
Another option is to use the address of a newly allocated heap variable:
mt19937 rng((uint64_t) new char);
This should be different every run. Note this creates a tiny memory leak for the life of the program since we never call
delete, but since it's just one variable it's not a big deal.
I personally like to combine both of these methods, but it isn't necessary. Either one will give your program much more unpredictability, making it effectively impossible to hack.
Thanks for reading! This will likely lead to me getting fewer hacks in the future, but I thought the community should have a guide explaining this unusual situation.