OrangeFFT's blog

By OrangeFFT, history, 3 years ago, In English

Hey everyone!

I am working on a project where I need to implement some operations efficiently. Unfortunately, I couldn't find any problem similar to this one, so I'll describe it.

The problem

Assume we are given an $$$N \times N$$$ matrix containing only zeroes and ones. I'm interested in a way, possibly a data structure built over the matrix, that can answer queries and perform updates of these types:

  1. Q j: find the number of the first (bottom-most) row that has a $$$1$$$ on the $$$j$$$-th column of the matrix, or report that there are no 1s on that column ($$$1 \le j \le N$$$).
  2. U i: clear the $$$i$$$-th row, i.e. completely fill it with zeroes ($$$1 \le i \le N$$$).

Although I'd also be happy to know a more general way to do so, in case that helps in my particular scenario it is guaranteed that all the queries are given in a non-decreasing order of columns (i.e. when a query is made about the first position of a $$$1$$$ in the $$$j$$$-th column, no query will be made later referring to columns $$$1..j-1$$$, even though multiple consecutive queries can be made for the same column). Also, no updates will try to clear the same row twice.

Example

So, here's an example of a possible sequence of queries/updates over the following $$$4 \times 4$$$ matrix (rows are numbered from $$$1$$$ to $$$4$$$ bottom-up and columns are numbered from $$$1$$$ to $$$4$$$ from left to right):

\begin{array}{|c|c|c|c|} \hline 1 & 1 & 0 & 0 \cr \hline 0 & 1 & 0 & 0 \cr \hline 0 & 0 & 1 & 0 \cr \hline 0 & 1 & 1 & 1 \cr \hline \end{array}

  1. Q 1: the first query asks us to find the bottom-most row containing a $$$1$$$ in the first column, so we report $$$4$$$.
  2. U 4: the fourth row is cleared and so the new updated matrix is this:

\begin{array}{|c|c|c|c|} \hline 0 & 0 & 0 & 0 \cr \hline 0 & 1 & 0 & 0 \cr \hline 0 & 0 & 1 & 0 \cr \hline 0 & 1 & 1 & 1 \cr \hline \end{array}

  1. Q 2: the bottom-most $$$1$$$ on column $$$2$$$ is on row $$$1$$$, so we report $$$1$$$.
  2. U 1: row $$$1$$$ is cleared and the new updated matrix becomes:

\begin{array}{|c|c|c|c|} \hline 0 & 0 & 0 & 0 \cr \hline 0 & 1 & 0 & 0 \cr \hline 0 & 0 & 1 & 0 \cr \hline 0 & 0 & 0 & 0 \cr \hline \end{array}

  1. Q 2: the bottom-most $$$1$$$ on column $$$2$$$ is on row $$$3$$$ now, so we report $$$3$$$.
  2. U 3: row $$$3$$$ gets erased:

\begin{array}{|c|c|c|c|} \hline 0 & 0 & 0 & 0 \cr \hline 0 & 0 & 0 & 0 \cr \hline 0 & 0 & 1 & 0 \cr \hline 0 & 0 & 0 & 0 \cr \hline \end{array}

  1. Q 3: we report $$$2$$$.
  2. U 2: the entire matrix becomes empty:

\begin{array}{|c|c|c|c|} \hline 0 & 0 & 0 & 0 \cr \hline 0 & 0 & 0 & 0 \cr \hline 0 & 0 & 0 & 0 \cr \hline 0 & 0 & 0 & 0 \cr \hline \end{array}

  1. Q 3: since there are no ones in column $$$3$$$, we report that fact.

Discussion

Can anyone think of a data structure that can be built in a time nearly proportional to the matrix size (something like between $$$\mathcal{O}(N^2)$$$ and $$$\mathcal{O}(N^2 \log^2 N)$$$ or so) that can support these operations online in $$$\mathcal{O}(\log N)$$$ or $$$\mathcal{O}(\log^2 N)$$$ per query/update?

I first thought of 2D segment trees/Fenwick trees, but the notion of 1D lazy propagation doesn't seem to extend to higher dimensions (although it is possible to perform range sum queries + range add updates or even support operations like $$$(+,+)$$$, $$$(\times, \times)$$$, $$$(\wedge, \wedge)$$$, $$$(\vee, \vee)$$$ in $$$\mathcal{O}(\log^2 N)$$$ per query/update). I thought of somehow implementing the notion of clearing a row using range sum updates and binary searching the first $$$1$$$ in a column with range sum queries, possibly maintaining multiple data structures at the same time for crossing results about ones and zeroes, but it seems a bit hard to model the problem using only range sums (but perhaps it is possible nonetheless).

Also, as far as I know a quad-tree takes $$$\mathcal{O}(N)$$$ time to clear an entire row (which is precisely the kind of operation required in this case).

Now, even though these operations may seem difficult to implement for the general case, this problem is very specific and has some interesting properties we could possibly take advantage of:

  • an update will simply clear an entire row (it seems just like "removing" it), we don't need to support range updates to arbitrary intervals.
  • given the presented assumptions, the number of updates is guaranteed to be $$$\mathcal{O}(N)$$$ given that no row will get cleared twice. Because of that, after all the updates are done we theoretically will have "visited" only $$$\mathcal{O}(N)$$$ ones in the matrix, one per each row, regardless of the density/sparsity of the original matrix (even if it started as completely filled with ones), so it seems that it may be possible to have something from $$$\mathcal{O}(N)$$$ to $$$\mathcal{O}(N \log^2 N)$$$ time for $$$N$$$ updates in total (instead of $$$\mathcal{O}(N^2)$$$ for $$$N$$$ updates in total).
  • a query will only refer to a specific column (and not to arbitrary subrectangles in the matrix) and, as said, the queries are made in non-decreasing order of columns.
  • we're only dealing with zeroes and ones, not with arbitrary integers, so for instance these queries/updates are all equivalent:

    • queries (enabling a binary search): range sum in column, range OR in column, range max in column, ...
    • updates: set entire row to $$$0$$$, multiply every element in row by $$$0$$$, perform an AND between each element in the row and $$$0$$$, update each element $$$x$$$ in the row to $$$\min(0,x)$$$, ...

Does anyone have an idea on how to solve this online in something close to $$$\mathcal{O}(\log N)$$$ or $$$\mathcal{O}(\log^2 N)$$$ per query/update (possibly amortized)? :)

Thank you!

Full text and comments »

  • Vote: I like it
  • +22
  • Vote: I do not like it