supersonic11's blog

By supersonic11, 4 weeks ago, In English

Suppose we are given $$$n$$$ integers (anywhere from $$$1$$$ to $$$1e5$$$). We are then asked query weather if a number $$$x$$$ was given in input. Fastest way to solve this is using frequency array say $$$a$$$. We can do $$$++a[x]$$$ during input and check if $$$a[x]!=0$$$ while answering the query. Each query here can be answered in $$$O(1)$$$.

Now suppose I am given $$$n$$$ pairs of integers instead, let's say $$$n=5$$$ (their value can be from $$$1$$$ to $$$1e5$$$) and input are $$$(1,2),(10,11),(14,14),(1000,1320),(456,321)$$$. Now I am asked if $$$(10,11)$$$ was in the input, what is the fastest way to do it ? We can use map of pairs and we can answer each query in $$$O(logn)$$$. I wonder if there is any $$$O(1)$$$ method ?

 
 
 
 
  • Vote: I like it
  • +15
  • Vote: I do not like it

»
4 weeks ago, # |
Rev. 3   Vote: I like it 0 Vote: I do not like it

You can "lock" one of the dimensions and the problem becomes the same as the 1d version.

This means, you can use a table whose index represents the value of the left value in your pair. Then for each index in the table, you keep a hashmap which stores the right value in your pair.

For example, for (1, 2), (1, 3), (1, 4), (2, 3), (4, 5)

Our table will look like the following:

0 -> ()
1 -> (2, 3, 4)
2 -> (3)
3 -> ()
4 -> (5)
5 -> ()

So, in order to perform lookup for a query $$$(L, R)$$$, we just need to access index $$$L$$$ of the table and query the hashmap in there for value $$$R$$$. Since both data structures support $$$\mathcal{O}(1)$$$ amortized lookup, you have an $$$\mathcal{O}(1)$$$ lookup operation.

  • »
    »
    4 weeks ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    In worst case it can lead to $$$O(n)$$$ and unordered_map won't work for sure since it can be hacked i think.

    • »
      »
      »
      4 weeks ago, # ^ |
      Rev. 2   Vote: I like it +3 Vote: I do not like it

      There are ways to avoid being hacked when you are using HashMap/unordered_map. Also, it is not that easy to make HashMap/unordered_map exhibit $$$\mathcal{O}(n)$$$ behaviour (you really have to try to make a adversarial input). In practice, these data structures (more often than not) stay true to their advertisement of amortized $$$\mathcal{O}(1)$$$ lookup.

    • »
      »
      »
      4 weeks ago, # ^ |
        Vote: I like it +3 Vote: I do not like it

      If you dislike amortized guarantees, I have another solution that will fulfill your requirement of $$$\mathcal{O}(1)$$$ answer to queries albeit incurring a slight increase in your overall time complexity.

      If you are able to, sort both queries and input pairs by increasing order of left value then by increasing order of right value.

      Thereafter place a pointer to the queries and the input pairs. Let's call the pointer the the queries pointerQ and the pointer to the input pairs pointerI.

      Let the left and right values of the current query be valueQL and valueQR respectively.

      Let the left and right values of the current input pair be valueIL and valueIR respectively.

      If valueQL < valueIL, the answer for this query is NO (and pointerQ is incremented). Else if valueQL > valueIL, increment pointerI.

      Else valueQL == valueIL holds true, so we compare by right value as follows:

      If valueQR < valueIR, the answer for this query is NO (and pointerQ is incremented). Else if valueQR > valueIR, increment pointerI.

      The only case possible now is that valueQR == valueIR. So the answer to this query is YES (and pointerQ is incremented).

  • »
    »
    4 weeks ago, # ^ |
      Vote: I like it +1 Vote: I do not like it

    You can write your own hash for hashmap to store pairs.

»
4 weeks ago, # |
Rev. 5   Vote: I like it +19 Vote: I do not like it

Assuming the biggest number is $$$x$$$, you can represent all pairs of numbers in base $$$x+1$$$ e.g $$$(5,6) = 5*7+6$$$(in base 7 since biggest number is 6) etc. Then the problem degenerates to the 1 integer case. This works for triples and so on (e.g $$$(5,4,6) = 5*(x+1)*(x+1)+4*(x+1)+6$$$ ($$$x=6$$$ in this case)).

Also, $$$x$$$ has to be small enough to fit in array limits if you want to store the converted number in an array) otherwise unordered map(with custom hashing, refer here) has to be used instead of an array to get that $$$O(1)$$$ time complexity.

Not sure if this works directly for negative numbers tho(too noob at base conversion lol). For negative numbers, say the minimum negative number is $$$-y$$$, just add offset of $$$y$$$ to all numbers then apply base conversion then hash. To look for a pair, add the offset to the pair, convert to the right base, then search.

In the problem given, the max number is $$$1e5$$$, so its enough to represent every pair of integers in base $$$1e5+1$$$ where the first number is the tens digit and the second one is the units digit

  • »
    »
    4 weeks ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    In the problem given, the max number is $$$1e5$$$, so its enough to represent every pair of integers in base $$$1e5+1$$$ where the first number is the tens digit and the second one is the units digit

    counterexample: If input is $$$(1e5,1)$$$, then in base $$$1e5+1$$$ it would be $$$1e5*(1e5+1)+1$$$ i.e greater than $$$1e10$$$ and we can't store it in array. So I think your solution won't work for given constraints.

    • »
      »
      »
      4 weeks ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Also, x has to be small enough to fit in array limits if you want to store the converted number in an array) otherwise unordered map(with custom hashing, refer here) has to be used instead of an array to get that O(1) time complexity.

»
4 weeks ago, # |
  Vote: I like it +13 Vote: I do not like it

I'm not sure about the performance of hashmap, if you're going to test it on all possible combinations within the given constraints i.e. $$$[1, 10^{5}]$$$. I'm actually being serious about how to solve this in $$$O(1)$$$ time complexity. Use std :: bitset of size $$$10^{10}$$$.

Use the the following indexing idx = first.value * 10^5 + second.value

Also declare your bitset using this way, as many compilers don't allow direct declaration due to large memory allocation.

std::vector<std::bitset<1000000000UL>> wrapper(1); auto &hash = wrapper[0];

»
4 weeks ago, # |
  Vote: I like it -8 Vote: I do not like it

The easy method is parsing the pair into a string (like transforming {1,3} into "1///3") and using unordered_map on top of that. Hacks are a problem but you can actually scale that for bigger constraints unlike bitset.

The best way to do this if you dont mind being hacked is by hashing the number like this: multiply the first number by whatever the maximum bound is + 1 and then sum it with the second number. It will give you a unique number for each of them. Then, take that modulo a big prime number, but smaller than what you can allocate in memory. Then do a marcation array for that hash, and when you want to see if a pair is in the array, you check for the hash instead. If youre worried it may not pass the test cases because of collisions, do like 2 or 3 primes and/or use a random prime and it would be comically difficult to fail randomly

»
4 weeks ago, # |
  Vote: I like it +1 Vote: I do not like it

if hashmap works in $$$O(1)$$$ for you, you can hash pair $$$a,b$$$ as $$$2e5*a+b$$$.

»
4 weeks ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it

A way to get O(1) worst case with a low constant, but with a certain error probability is to use a bloomfilter.

A bloomfilter is essentially a bitset and multiple (randomly chosen) hash functions. If you have a pair, you can calculate (for example) 3 hash functions, and set the bits at those positions to true. When checking if a pair is in the bitset, calculate all 3 hash functions of this pair, and check if all bits at these positions are set to true.

If you have an appropriate family of hash functions that hashes pairs of integers to integers from 0 to M-1 (Where M is the size of bitset), you can achieve a really low false positive error probability.

This approach is O(n) preprocessing and O(1) querying.