### diavolobia's blog

By diavolobia, 6 weeks ago,

So I had a regional contest yesterday, and it had a problem like this:

You are given N binary strings with the length of M ($N * M \leq 10^5$) and Q ($Q \leq 2*10^5$) queries. There are 2 types of query:

1 i j — Flip the $j_{th}$ bit at the $i_{th}$ string.
2 x y — Count the number of positions $i$ where the $i_{th}$ bit in the $x_{th}$ string and the $y_{th}$ string differ.

Sample input:

3 3
110
011
000
3
2 1 2
1 3 1
2 1 3

Sample output:

2
1

I wasn't able to come up with a proper solution so I decided to bruteforce instead. To make my code a little bit faster I converted all the strings to a long long vector, so I could handle query 1 with bit operations. For query 2, I noticed that if I use the XOR operation on 2 masks, the answer should have the bits on at positions which the 2 masks differ. Therefore I just had to use __builtin_popcount to count the 1 bits in the answer. Since each long long type has 64 bits, that reduced my vector size 64 times the size of the original string, thus the number of operations reduced by 64.

Theoretically my code was still $O(Q*M)$, just 64 times faster. I calculated the worst case, which was $3*10^8$, and I guaranteed that I would have some nasty timeout tests. But magically somehow, my code was accepted. I didn't believe it at first, thinking it was judged on pretests, but then I asked the organizers and they confirmed that the codes were judged on main tests.

How could this possible? Was my code actually accepted or it was just weak tests? And how can I solve this problem, the proper way?

• +25

 » 6 weeks ago, # |   +6 What was the time limit?$O(N^2)$ with bitsets and an otherwise good constant may very well pass. 3e8 is still about a second if you have a good constant.
•  » » 6 weeks ago, # ^ |   +1 The time limit was 2 seconds according to the PDF. Here's the image:Also can I ask for the (possibly) intended solution?
•  » » » 6 weeks ago, # ^ |   0 Honestly, I would not be surprised if this was the intended solution.
•  » » » » 6 weeks ago, # ^ | ← Rev. 3 →   +7 However it is also possible to solve the problem in $O(Q \min(N,M))$. If the number of strings is small, we need to only update $O(N)$ answers after every query of type 1. If the lengths are small, we can solve the problem completely naively. $\min(N,M) \le \sqrt{10^5}$, so it leads to a kind of sqrt-like complexity.
•  » » » » » 6 weeks ago, # ^ |   0 Thanks for sharing your solution! That's a nice approach I didn't realize.
 » 6 weeks ago, # |   +56 Modern processors can do about $3*10^9$ operations per second, but then you might ask why $O(n)$ with $n=10^9$ won't always get accepted? There is several reasons for that, most notably it is bottleneck caused with memory, branch misprediction and not all operations are equal (division will take longer than XOR). Let's look why your solution worked (I will ignore query of type 1 as its complexity is $O(1)$):In query of type 2 you only used XOR and __builtin_popcount (modern processors have built-in instructions for popcnt), both of these operations are extremely fast. Another factor is memory access, I guess you just used for loop to go over the vector: sequential memory access is the fastest and basically has no overhead as CPU successfully "predicts" future reads, thus it can cache them before they are used. And there is no branches to misspredict. So basically your code worked because it is actually good and optimized code.Optimizations are fun :P
 » 6 weeks ago, # |   +5 You can read the section 10.4 of CP Handbook to see why this algorithm got accepted. In the contest, i got accepted with same strategy but used bitset instead.If you want the solution without bit optimization then you can going to a solution with SQRT decomposition.If $n < m$, you can use a 2D array $ans$ that contain the solution of queries 2 for any $(i,j)$. Then you can update (query 1) in $O(n)$ and get the answer (query 2) in $O(1)$.If $n > m$, you can easily get the answer with brute force in $O(m)$ and update in $O(1)$.Note that $min(n,m) \le \sqrt{nm}$, so the complexity of this algorithm will be $O(nm\sqrt{nm})$. As i remember, in the contest, there is a constestant got accepted with this solution
•  » » 6 weeks ago, # ^ |   -11 orz
•  » » 6 weeks ago, # ^ |   -11 OTZ