Hello everyone, I thought of this problem yesterday: Given a matrix of length N and width M, answer Q queries of this form:

- *x*_{1} *y*_{1} *x*_{2} *y*_{2}: Count the number of distinct elements in the submatrix with corners (*x*_{1}, *y*_{1}) and (*x*_{2}, *y*_{2}).

There is a version of this problem on Codechef, however the bounds for *A*_{i, j} are very small. Is there a way to solve this for e.g. *A*_{i, j} ≤ 10^{5}? I don't have any particular time complexity in mind.

UPD: It turns out that this problem is named "colored orthogonal range counting". It has been solved in "Further Results on Generalized Intersection Searching Problems: Counting, Reporting, and Dynamization" and "Efficient Colored Orthogonal Range Counting".

Thanks for sharing this interesting problem.

A modified version of the Summed-area table may be used to find the answer to each query rapidly.

The value of each element

`(x,y)`

in the modified version should be the set of distinct elements in the sub-matrix with corners`(1,1)`

and`(x,y)`

. The addition and subtraction operations in the expression for the sum of sub-regions in the table should be transformed to computing the number of distinct elements between the corners given in the query using the set union and set difference operations, respectively. Bitwise operators may be used to perform these operators if`bitset`

is used to represent the set of distinct elements.Best wishes

Your solution is wrong on several levels. First, even if we could use bitset to do what you suggested, it'd still be very slow. Second, the addition operation is doable, indeed, using bitsets, but the subtraction one is not. We could use a 2d-rmq with that bitset of yours so that we don't have subtraction anymore, but the precomputation complexity increases and the query complexity is still too heavy

Thanks for your comments.

RE: using bitset is still very slow.

I have extensively used c++ bitset STL to implement set operations before, and the approach proved to be much faster than the corresponding c++ set STL operations. The main reason for the improved performance is that no set insertion operations are needed when performing set operations such as union and intersection. Alternatively, these operations are performed in the bitset implementation using bitwise operators.

RE: The subtraction operation is not doable using bitsets

This is incorrect. This subtraction operation A — B in the original Summed-area table corresponds to the set difference between A and B. Using Boolean algebra, it is simple to show that the result can be computed using bitwise AND operation between the bitset representation of A and the bitset representation of the complement of B, which can be computed using a one's complement bitwise operation as well.

The complexity of the pre-computing the bitset matrix increases with the matrix size and the bitset size. However, the complexity of computing the answer of each query is independent of the matrix size. It only depends on the bitset size.

Best wishes

I don't know much about Boolean algebra. What I do know is that when trying to do the equivalent of A / B there is no way to achieve that using bitsets because A and B are actually multisets (not just sets) and when storing them as bitsets you lose information. If a certain element x is part of A and is also part of B, it could be either part of A / B or not (depending on whether it appeared strictly more times in A than in B). Take, for example, the vector matrices (1, 1) and (1, 2) and query on (1, 2), (1, 2) (that is the matrix formed from the last element). A = {1} in both cases and B is once {1} and once {1, 2}. A/B should contain {1} in the former and should not contain {1} in the latter. I wrote this message, even considering this approach is not going to lead us to a fast solution, just so that you and others that might think your solution works understand why I claim it doesn't (so with an informative purpose). If you're still not convinced I'm right just try to implement it and stress-test it

The aforementioned problem considers only distinct elements in the sub-matrix. Therefore, the multiset case is beyond the scope of this problem and the proposed bitset implementation. No information is lost when storing distinct elements of a set in its corresponding bitset. A bitset simply stores the set membership function of any given set, and returns whether an element (k) belongs to a set X as the Boolean value x[k].

For example, suppose a 5-element universal set U = {0,1,2,3,4}. The corresponding bitset representation of any subset of U is an instance of type bitset<5>. The bitset<5> x representation of the X = {1,2} corresponds to the bit-string "00110", where the right-most bit is the least-significant bit. In this example, x[ 0 ] = x[ 3 ] = x[ 4 ] = 0, and x[ 1 ] = x[ 2 ] = 1. Suppose also that Y = {1,4} whose bitset representation is "10010". Then, X — Y = {1,2} — {1,4} = {2}. Using bitwise operators, NOT Y = "01101", and X AND NOT Y = "00110" AND "01101" = "00100", which is equivalent to {2}.

More information about the bitwise operators of the C++ bitset STL are available here. bitset operators

Best wishes

Try coding the solution for the Codechef problem using your approach.

Thanks for the suggestion.

I've added it to my to-do list.

Best wishes

Thanks for clarifying why extending the Summed-area table approach does not work for this problem.

It remains true that bitset can be used to represent the set membership function efficiently, and that the set difference operation is doable using bitset operators.

Best wishes.

Interestingly enough, the CodeChef tutorial of the problem does use an extended version of the Summed-table approach to compute the number of distinct elements in a sub-matrix to answer each query.

CodeChef

I proposed the same approach without reading the tutorial.

Best wishes

Well, he didn't ask about solution of that problem...

That's right.

The only extension in the approach presented here, which turned out to be the same as the one presented in the tutorial, is using bitset to represent distinct elements in sub-matrix.

Best wishes.

Haven't you still figured that the bitset solution is wrong? Just make the difference between set and multiset and look again at the comment where I tried to explain you why it doesn't work. It's about frequencies of elements in a multiset (that you wrongly store as a bitset, which is a set, not multiset)

The last comments are about the Summed-table approach used in the problem tutorial, not about whether it is correct or wrong to use bitsets for counting distinct elements in the sub-matrix.

Thank you.

Auto comment: topic has been updated by f2lk6wf90d (previous revision, new revision, compare).