When submitting a solution in C++, please select either C++14 (GCC 6-32) or C++17 (GCC 7-32) as your compiler. ×

STORMGATE078's blog

By STORMGATE078, history, 4 years ago, In English

After reading this, this and this, I think I understand the advantages and disadvantages of hash-based solutions to problems that involve matching strings or objects. But from what I've seen so far, almost all methods that are presented make use of the fact that the characters from the alphabet are allocated linearly (I.e.: val=ch-'a'+1;, where 'val' is the equivalent of the character in decimal format — used in calculating the hash values — , and ch is the char).

What happens if instead of using this linear method of assigning values I create a function that does that randomly (and that still makes use of a prime number and a base)? This function has to be injective and should assign an integer from the range $$$[1, base-1]$$$ (where $$$base$$$ is chosen at the beginning of the program) to each character from the given alphabet. For example, if α={'a', 'b', 'c'} and base=7, is the following distribution : f('a')=5, f('b')=1, f('c')=6, harder to hack, assuming that it has been chosen randomly at the beginning and the attacker can't reverse engineer it?

Presuming that the generated numbers are truly random (they make use of the chrono clock, the processor cycles or other external source of enthropy), is this approach actually reliable and harder to hack in general?

Thank you in advance!

  • Vote: I like it
  • +11
  • Vote: I do not like it

»
4 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by STORMGATE078 (previous revision, new revision, compare).

»
4 years ago, # |
  Vote: I like it +11 Vote: I do not like it

If you're randomizing properly I think you'll be safe with or without a special mapping function. If you're not randomizing properly then the birthday attack will still work since all it does is generate random strings and compare their hashes. This will probably make the other attacks harder unless your base is not much bigger than your alphabet size. In that case people can simply invert your function.

  • »
    »
    4 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Thank you very much! That means that this method works better when the base is way larger than the size of the alphabet(for example 5-6 times larger). In that case the birthday attack should take a lot more time to generate the hashes, right? Probably to the point that it is not usable anymore?

    • »
      »
      »
      4 years ago, # ^ |
      Rev. 2   Vote: I like it +3 Vote: I do not like it

      No, it won't affect the birthday attack at all other than a marginal slowdown from the time it takes to compute your permutation function. As explained in the article you linked to, the birthday attack simply generates a bunch of random strings and checks if the hashes of any two strings match. If you have $$$H$$$ possible hashes, the the average number of strings you need to generate in order to have two strings with the same hash is approximately $$$1.25\sqrt{H}$$$ regardless of how you generate the hash. This means that if your hash function isn't randomized and your hash values aren't more than 64 bits, you're for sure vulnerable to the birthday attack.

      Just randomize your hash function properly and you should be safe.

»
4 years ago, # |
  Vote: I like it +32 Vote: I do not like it

Just double hash and randomize base and you are safe