#	User	Rating
1	tourist	3880
2	jiangly	3669
3	ecnerwala	3654
4	Benq	3627
5	orzdevinwang	3612
6	Geothermal	3569
6	cnnfls_csy	3569
8	jqdai0815	3532
9	Radewoosh	3522
10	gyh20	3447

#	User	Contrib.
1	awoo	161
2	maomao90	160
3	adamant	156
4	maroonrk	153
5	-is-this-fft-	148
5	atcoder_official	148
5	SecondThread	148
8	Petr	147
9	nor	144
9	TheScrasse	144

TL;DR

If you code on python, use set or dict, and want to avoid hacks, you can read only the last section.

Introduction

Most people interested in hacking know about the method of creating tests targeting unordered containers in C++. This method is described in more detail in post. However, creation of similar tests for set and dict in python is covered much less well, which I decided to fix with this post.

A little theory about hash tables

I will not go into detailed descriptions and proofs, because there are plenty of them in the open sources. Let us consider only what interests us in this case.

An open-addressing hash table is a data structure that stores a set of elements in an array of knownly larger size, defining the position of the element as its hash taken modulo the size of the array. If hashes of several elements give the same cell in the array, the hash table tries to take another cell according to some rule, which depends on the particular implementation of the table. Usually this rule boils down to checking the cells $$$f(x), f(f(x)), f(f(f(x)))$$$ and so on, until an empty one is found. The function $$$f$$$ is often a linear transformation $$$(a * x + b) \% size$$$, where $$$size$$$ is the size of the array, and $$$a$$$ and $$$b$$$ are relatively prime with it.

Python

Python uses a hash table with an array size equal to a power of two to implement dict, and the transformation is slightly more complex than a simple linear one -- $$$f(x) = (5x + 1 + pertrub) \% size$$$, where $$$pertrub$$$ is initially equal to hash, but is devided by 32 in each step. For a detailed implementation, see repository.

Also, the hash function for numbers in python is very predictable, it's just the number itself.

Thus, to make a countertest it is enough to find a sequence of indexes that will be searched for a particular number and preoccupy them, and then provoke a search for that number in the dictionary, which is quite easy to do for most problems.

The set implementation in Python3 is a bit more complicated, it does not test a single cell, but 10 consecutive cells, which can be observed here. However, building a test in this case is not very difficult either, just add $$$x, x + 1, x + 2, ..., x + 9$$$ instead of one number $$$x$$$.

Tests

I used 153032429 (Thanks to turkids for it) and 153408991 from Codeforces Round 781 (Div. 2) to check. The result can be found as hacks number 796358 and 796362. Judging by the "unknown verdict" result, it went too well and I broke the author's solution at the same time. More details may know shishyando and Kirill22.

How to protect yourself?

Unlike C++, Python does not provide a way to define its own hash function for an existing type (or I just don't know about it), but nobody prevents you from defining your type with a different hash function:

from random import getrandbits

RANDOM = getrandbits(32)

class Wrapper(int):
    def __init__(self, x):
        int.__init__(x)
    def __hash__(self):
        return super(Wrapper, self).__hash__() ^ RANDOM

An example of using this type can be found in 153409562. Unfortunately, while the author's solution breaks on my tests I will not be able to test the robustness of my solution with the Wrapper class. However, locally it works fast enough.

Full text and comments »

Diego's blog

begin of translation