Блог пользователя Kinderr

Автор Kinderr, история, 7 лет назад, По-английски

The classical hashing way of hashing a number is keeping it as the reminder of the division with two distinct prime numbers i.e. hash(X) = (X mod prime1, X mod prime2), same thing with strings (considering them as polynomials). But modulo is a very heavy operation, and using a power of 2 (e.g. 2^64, using unsigned long long overflow) instead of a prime number fails when you hash a Thue-Morse sequence, so... Do you know any way of replacing the modulo with a smart bitwise operation or something else? I'd find it extremely useful :)

  • Проголосовать: нравится
  • +30
  • Проголосовать: не нравится

»
7 лет назад, # |
  Проголосовать: нравится +6 Проголосовать: не нравится

You can use Mersenne primes https://en.wikipedia.org/wiki/Mersenne_prime for such purpose.

  • »
    »
    7 лет назад, # ^ |
    Rev. 2   Проголосовать: нравится +3 Проголосовать: не нравится

    Could you please explain how to find module fast for mersenne primes? OR operation on 2^31-1 is not right I guess because it gives mod on 2^31 not 2^31-1.

»
7 лет назад, # |
  Проголосовать: нравится +2 Проголосовать: не нравится

The problem comes from 'I know your polynomial hash arguments' not from 'your mod is not prime' so you need randomization. I haven't made a research for hashing like mod (2^64 — 1) with random a.

  • »
    »
    7 лет назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    No, you can shuffle the alphabet and they won't know enough. The mod shouldn't necessarily be prime, but if it's a power of two, when given a Thue-Sequence it will converge to something specific, no matter the character codes :<

»
7 лет назад, # |
  Проголосовать: нравится +3 Проголосовать: не нравится

In my opinion, when the intended solution is to use hash, then the time limit will consider the "heaviness" of the mod operation so no need to worry.

  • »
    »
    7 лет назад, # ^ |
      Проголосовать: нравится +3 Проголосовать: не нравится

    Indeed. But mod operations are generally problematic sometimes when the intended solution does not rely on hashing. Nonetheless, I too, for instance would like to know more about this if there is a way around.