Блог пользователя humbertoyusta

Автор humbertoyusta, история, 3 года назад, По-английски

This is a well known technique, but I couldn't find almost any detailed tutorial, so I'll try to help with this one.

Motivational Problem:

You have an array $$$A$$$ and $$$Q$$$ queries, which are, say if the subarray from $$$l$$$ to $$$r$$$, ($$$A[l]$$$, $$$A[l+1]$$$, $$$...$$$, $$$A[r]$$$) forms a permutation.

Some basic observations:

Only the set of elements matters, the order is irrelevant.

You want to know if each element is exactly once.

Solution:

Assign a random number $$$randval[i]$$$ to each number $$$i$$$, now we can say that if the XOR of the subarray ( $$$randval[A[l]]$$$ xor $$$randval[A[l+1]]$$$ xor $$$...$$$ xor $$$randval[A[r]]$$$ ) is equal to the xor of any permutation of size $$$r - l + 1$$$ (for example $$$1$$$, $$$2$$$, $$$3$$$, ... ,$$$r-l+1$$$, which hash is $$$randval[1]$$$ xor $$$randval[2]$$$ xor $$$...$$$ xor $$$randval[r-l+1]$$$ ) the subarray is a permutation.

Some Insights:

With the XOR we have some issues, we are counting if the number of occurrences of each element is even or odd, not the real number of occurrences, however in this problem if we had repeated elements the XOR-hash will never be the same than the XOR-hash of $$$1$$$, $$$2$$$, $$$3$$$, ... ,$$$r-l+1$$$, because if we have some numbers more than once, since there are only $$$r - l + 1$$$ numbers in the subarray, and also $$$r - l + 1$$$ numbers are needed to form the permutation, some of them cannot appear.

As an alternative to this technique we can use polynomial hash over a binary string that represents the occurrences of each element modulo $$$2$$$ ($$$x$$$-character of this string represents the number of occurrences of $$$x$$$ modulo $$$2$$$), but with XOR hash we can do it faster, with less code and case handling, and with less care about collisions and hacks.

With bitwise-OR we count the number of occurrences of each element of the set modulo $$$2$$$, if we use $$$k$$$-nary xor we will count the number of occurrences modulo $$$k$$$.

Collisions:

Now it's time to talk about collisions, as the numbers are random, the probability of collision in each query if we had only $$$1$$$ bit will be about $$$1/2$$$. And since XOR is independent on each bit, if we use k bits, the probability of collision in each query is $$$2^{-k}$$$. We can use 32-bit random integers or 64-bit random integers depending on the number of queries required. Since the numbers are generated randomly, I think this hashing can't be hacked, but I am not sure, if someone knows more about this, please send some feedback.

More problems using this technique (in no particular order of difficulty):

F. The Number of Subpermutations from Educational Codeforces Round 66 Here editorial doesn't use it, but I have one solution with this technique.

Quick explanation (spoiler)

E. The Untended Antiquity from Codeforces Round #439 Div 2

G. Three Occurrences from Educational Codeforces Round 95

C. Square Subsets from Codeforces Round #448 Div 2 but if we had just 40 numbers up to 10^6.

Quick explanation (spoiler)

Feel free to send any feedback.

  • Проголосовать: нравится
  • +277
  • Проголосовать: не нравится

»
3 года назад, # |
Rev. 2   Проголосовать: нравится +10 Проголосовать: не нравится

Thank you humbertoyusta

»
3 года назад, # |
  Проголосовать: нравится -7 Проголосовать: не нравится

We'll assign a random number to each number i? Should these random numbers be distinct? Can we assign i itself to i?.

I'm not able to understand what's the point of assigning random numbers? Simply assigning i=i won't work?

  • »
    »
    3 года назад, # ^ |
      Проголосовать: нравится -6 Проголосовать: не нравится

    Do you realize how stupid it is to assign $$$ranval[i] = i$$$?

    Read the "Collisions" section carefully.

    • »
      »
      »
      3 года назад, # ^ |
        Проголосовать: нравится 0 Проголосовать: не нравится

      Oh thanks. I got really confused on that random number part and didn't read futher. Sorry. Now it makes sense.

»
3 года назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Good stuff. Learned recently how useful this technique can be from this problem

»
3 года назад, # |
  Проголосовать: нравится +43 Проголосовать: не нравится

Just a quick note, if you want to hash multisets instead of sets, just replace xor with addition modulo INT_MAX or LONGLONG_MAX (essentially, addition without caring about overflow). It is easy to see that the addition of two uniformly distributed random remainders modulo $$$M$$$ remains uniformly distributed modulo $$$M$$$.

One good example of the usefulness of this is the AHU algorithm for tree isomorphism. It's a lot easier and faster than keeping ugly maps from std::vector and what not. Also, if you use 64-bit hashes (which is very easy in this case, because you have no overflows to take care of), the probability of collision is about one in 4 billion.

  • »
    »
    3 года назад, # ^ |
      Проголосовать: нравится +8 Проголосовать: не нравится

    Very interesting, can you explain a bit more about collisions or recommend a link to some discussion or explanation. I thought about this, But I was scared by the collisions of sum.

    • »
      »
      »
      3 года назад, # ^ |
      Rev. 2   Проголосовать: нравится +19 Проголосовать: не нравится

      I think it's more easy if you modulo a big prime $$$p$$$. For any different multiset $$$A$$$ and $$$B$$$, consider the intersection of them is $$$C$$$, then $$$h(A)\equiv h(B)$$$ equal to $$$h(A-C)\equiv h(B-C)$$$. let $$$x$$$ be any element only contained in $$$A-C$$$, and $$$y$$$ is the times of occurence of $$$x$$$($$$y$$$ always smaller than $$$p$$$ and greater than $$$0$$$, so you can inverse it), then $$$h(x)\equiv \frac{1}{y}[h(B-C)-h(A-C-\left\{x\right\}\times y)]$$$, since $$$h(x)$$$ can be any number in $$$[1,p)$$$, so their collsion probability is $$$\frac{1}{p-1}$$$.

»
3 года назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Auto comment: topic has been updated by humbertoyusta (previous revision, new revision, compare).

»
3 года назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Auto comment: topic has been updated by humbertoyusta (previous revision, new revision, compare).

»
3 года назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

This problem can also be solved using this tricks.

»
23 месяца назад, # |
  Проголосовать: нравится +2 Проголосовать: не нравится

Another problem that can be solved with this technique : ABC-250 E

»
23 месяца назад, # |
  Проголосовать: нравится +9 Проголосовать: не нравится

I saw this technique only once, at 1622F - Quadratic Set, one of my favorite problems

»
17 месяцев назад, # |
  Проголосовать: нравится +10 Проголосовать: не нравится

1746F - Kazaee has a very similar technique in its core.

Except for it uses sum of random values instead of xor. A very nice feature of this kind of hashing is that it allows to check whether the number of occurrences of each element is divisible by $$$d$$$, by simply checking of the sum itself is divisible by $$$d$$$. Of course, you'd need to compute $$$O(\log C)$$$ sums with different random numbers to make an error probability is at most $$$C^{-1}$$$.

»
17 месяцев назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Another problem using this trick (it's quite straightforward ): jolteon

»
17 месяцев назад, # |
  Проголосовать: нравится +8 Проголосовать: не нравится

actually you can do it with sum operation

»
17 месяцев назад, # |
  Проголосовать: нравится +2 Проголосовать: не нравится

This blog deserves a position on catalog section.

»
3 месяца назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Another interesting problem Light Bulbs.