Spheniscine's blog

By Spheniscine, history, 6 weeks ago, In English

Forgive the quirkiness of the title; I am not aware of a name for this concept, therefore I just made a few up.

This blogpost is about a particular way to construct what I call a "pseudorandom permuter". It is distinct from a regular pseudorandom number generator (PRNG) in that it is constructed such that the sequence will never repeat until all $$$M = 2^k$$$ integers in its machine-integer state space (32- or 64-bit) has been used. In other words, it's a way to produce a permutation of $$$[0, M)$$$ that's (hopefully) statistically indistinguishable from a truly random permutation.

The generator is made up of two parts, a "discrete Weyl sequence", and an "avalanching perfect hash function".

A "discrete Weyl sequence" is defined by a parameterized function $$$W_{s, \gamma}: [0, M) \rightarrow [0, M)$$$ where the $$$i$$$-th member of the sequence is $$$W_{s, \gamma}(i) = (s + \gamma \cdot i) \mod M$$$. The two parameters can be chosen using a standard random number generator, with the caveat that $$$\gamma$$$ must be odd (this can be easily ensured by a | 1 instruction). This ensures it is coprime with the machine integer size $$$M$$$.

The advantage of the discrete Weyl sequence is that as it is parameterized, it is hard to predict by an adversarial test (whether by the problemsetter or by the "hacking" mechanic). The disadvantage however, is that it is extremely regular, and this regularity may be exploited by adversarial tests.

In comes the second part. An "avalanching perfect hash function" is a function that is "perfect" in the sense that it maps $$$[0, M) \rightarrow [0, M)$$$ without collisions, i.e. $$$a \neq b \Leftrightarrow f(a) \neq f(b)$$$, and exhibits the avalanche property, which is that flipping one bit in the input should flip roughly half the bits of the output without a regularly predictable pattern. Note that our discrete Weyl sequence fits the definition of a perfect hash function, however it doesn't exhibit the avalanche property.

Constructing a good APHF is more difficult, however since it doesn't have to be parameterized, we can simply use a fixed function. One option is to use the "multiply-xorshift" construction from the "splitmix64" generator:

function splitmix64_aphf(z: uint64): uint64 {
	z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9;
	z = (z ^ (z >> 27)) * 0x94d049bb133111eb;
	return z ^ (z >> 31);

where ^ denotes the binary xor operator, and >> the unsigned right-shift operator. I also use another option for 32-bit integers, taken from this offsite blogpost by Chris Wellons:

function int32_aphf(z: uint32): uint32 {
	z = (z ^ (z >> 16)) * 0x7feb352d;
	z = (z ^ (z >> 15)) * 0x846ca68b;
	return z ^ (z >> 16);

We thus define our final hash function $$$f_{s, \gamma}(x) = A(W_{s, \gamma}(x))$$$, where $$$A$$$ is our chosen APHF. This combines the advantages of both parts; easy parametrization ("seedability") and good avalanche properties that make it hard to predict, as long as the adversary doesn't have access to the parameters or the generated values. (However, it is possible for an adversary with access to the generated values to reverse the function to find the parameters and thus predict the function; thus this construction is unsuitable for cryptographic purposes.)

Note that the actual splitmix64 generator is actually a construction of this type with a fixed $$$\gamma$$$ value. Hence my alternate name, "Splitmixes", as this construction allows easy generation of an unpredictable hash function with similar properties.


This generator should not be used like a regular random-number generator, as its "no-collision until period" property will fail statistical tests that test the "birthday paradox" properties of a PRNG, i.e. generating a large number of values and checking if the number of collisions conforms to the statistically expected value.

However, it is this very property that makes it more useful than regular PRNGs for, e.g.,

  • hash maps with integer keys (use $$$f_{s, \gamma}(x)$$$ for the hash-key of $$$x$$$)
  • priority values in treaps (simply have a counter variable and use $$$f_{s, \gamma}(0), f_{s, \gamma}(1),$$$ etc, and you're guaranteed never to produce the same priority unless you generate more than 4-billion-plus values)
  • Vote: I like it
  • +38
  • Vote: I do not like it