
hellman_'s blog

By hellman_, 7 years ago, In English

Hello everyone!

Here's an interesting problem. Sounds easy at first but I could not find a solution that matches experimental results. Maybe I am missing some simple observation.


You are given two integers 1 ≤ l1, l2 ≤ 2^n. Consider two random subsets S1, S2 of the set of n-bit vectors, of sizes l1 and l2 respectively. Let

R = {x & y : x ∈ S1, y ∈ S2}, where & is a bitwise AND.

Compute the expected size of R (in time polynomial in n). Computing the maximum possible size of R is interesting too.

An easier version, which is also interesting, is the one where the bitwise AND is replaced by bitwise XOR.

NOTE: I am not sure that this problem has a solution.
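
A minimal brute-force / Monte Carlo sketch of such an experiment (only illustrative, any similar setup would do): it computes E[|R|] exactly by enumerating all subset pairs for tiny n, and estimates it by sampling for slightly larger parameters.

    import itertools
    import random

    def expected_size_exact(n, l1, l2, op):
        """Exact E[|R|] by enumerating all pairs (S1, S2); feasible only for tiny n."""
        universe = range(2 ** n)
        total = count = 0
        for s1 in itertools.combinations(universe, l1):
            for s2 in itertools.combinations(universe, l2):
                total += len({op(x, y) for x in s1 for y in s2})
                count += 1
        return total / count

    def expected_size_sampled(n, l1, l2, op, trials=10000):
        """Monte Carlo estimate of E[|R|] for somewhat larger parameters."""
        universe = list(range(2 ** n))
        acc = 0
        for _ in range(trials):
            s1 = random.sample(universe, l1)
            s2 = random.sample(universe, l2)
            acc += len({op(x, y) for x in s1 for y in s2})
        return acc / trials

    # Example: AND case with n = 3, l1 = l2 = 4.
    # expected_size_exact(3, 4, 4, lambda x, y: x & y)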


»
7 years ago, # |

It really sounds very interesting, so I'm going to think about it for a while. Just for the record, are you sure it is solvable? (Like, have you got it from somewhere, or have you just asked yourself whether you can compute it or not?)

  • »
    »
    7 years ago, # ^ |

    No, I am not sure if it's solvable. This problem came up during some research.

»
7 years ago, # |
Rev. 4

Best I can do with the "xor" case is O(2^n).

First, let's select a fixed n-bit vector r. There are exactly 2^n pairs (x_i, y_i) such that x_i ⊕ y_i = r. Note that all the x_i (resp. all the y_i) are distinct. Define the following function: f(S) = {x ⊕ r : x ∈ S}, and observe that r ∈ R if and only if f(S1) ∩ S2 ≠ ∅. In other words, to prevent r from being in R, we have to select the whole S2 outside of f(S1).

There are l1 elements in S1 and l2 elements in S2, and |f(S1)| = l1 since x → x ⊕ r is a bijection. For any set S1, there are exactly C(2^n - l1, l2) ways of choosing the set S2 such that the above intersection is empty, and thus r ∉ R (here C(a, b) denotes a binomial coefficient). There are exactly C(2^n, l2) ways of selecting the set S2 in total.

Combined together, we have P[r ∈ R] = 1 - C(2^n - l1, l2) / C(2^n, l2).

This yields an intermediate result that (perhaps unsurprisingly) the expected size of R is 2^n whenever l1 + l2 > 2^n.

Otherwise, by linearity of expectation, E[|R|] = 2^n · (1 - C(2^n - l1, l2) / C(2^n, l2)). Now I would speculate that you can only calculate that exactly in O(2^n) time (you can of course use Stirling's formula to get some upper and lower bounds faster).
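
A small sketch of evaluating this closed form exactly with rationals (only illustrative; the cost is dominated by the huge integers inside the binomials, consistent with the caveat above):

    from fractions import Fraction
    from math import comb

    def expected_xor_size(n, l1, l2):
        """E[|R|] for the XOR case: 2^n * (1 - C(2^n - l1, l2) / C(2^n, l2))."""
        N = 2 ** n
        # comb(a, b) is 0 when b > a, which covers the l1 + l2 > 2^n case.
        return N * (1 - Fraction(comb(N - l1, l2), comb(N, l2)))

    # expected_xor_size(3, 4, 4) == Fraction(276, 35)   # = 8 * 69/70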

  • »
    »
    7 years ago, # ^ |

    The "and" case is more complicated in the sense that the cardinality of set f(S1) is not constant, and I don't yet see a simple way of expressing it, but maybe someone will.

  • »
    »
    7 years ago, # ^ |
    Rev. 4

    Ignore my post. I was wrong. :)

    • »
      »
      »
      7 years ago, # ^ |

      Linearity of expectation does not care about dependency actually.

      I think majk meant l1 + l2 > 2^n, because in case of equality the numerator is equal to 1, not 0.

      sage: n = 3; l1 = 4; l2 = 4; 1 - binomial(2**n-l1, l2)/binomial(2**n, l2)
      69/70
      sage: n = 3; l1 = 5; l2 = 4; 1 - binomial(2**n-l1, l2)/binomial(2**n, l2)
      1
      
      • »
        »
        »
        »
        7 years ago, # ^ |

        I just checked that on Wikipedia. You are right. Good to know that linearity of expectation applies even when the variables are not independent.

        • »
          »
          »
          »
          »
          7 years ago, # ^ |

          Thanks for the correction hellman_, the inequality should indeed be strict.

          Exactly. You do, however, need independence for linearity of variance.

»
7 years ago, # |
Rev. 3

First of all, the expected size of the set can be stated as E[|R|] = Σ_r P[r ∈ R], summing over all n-bit vectors r (by linearity of expectation [please confirm]).

Let's now fix an element m with k set bits. Then P[R contains some superset of m] = (1 - C(2^n - 2^(n-k), l1) / C(2^n, l1)) · (1 - C(2^n - 2^(n-k), l2) / C(2^n, l2)): indeed, x & y is a superset of m exactly when both x and y are supersets of m, there are 2^(n-k) supersets of m, and S1, S2 are chosen independently.

Now, the real answer can be computed via dynamic programming in O(3^n) or O(2^n · n) time. I now conjecture that you can reduce this to polynomial in n by making the rather intuitive observation that the answer depends solely on the number of set bits of m.
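
A short sketch of that superset probability as a function of the popcount k alone (illustrative code, using exact rationals):

    from fractions import Fraction
    from math import comb

    def prob_superset_in_R(n, l1, l2, k):
        """P[R contains some superset of a fixed mask m with k set bits] (AND case).

        x & y is a superset of m exactly when both x and y are supersets of m,
        and there are 2^(n-k) supersets of m, so S1 and S2 must each hit that
        family, independently.
        """
        N, sup = 2 ** n, 2 ** (n - k)

        def hits(l):
            # P[a uniformly random size-l subset contains at least one superset of m]
            return 1 - Fraction(comb(N - sup, l), comb(N, l))

        return hits(l1) * hits(l2)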

  • »
    »
    7 years ago, # ^ |

    Could you elaborate on the DP part? Let n = 3. Then first we compute P[111 ∈ R].

    How now to compute, let's say, P[011 ∈ R]?

    By your formula we can compute P[011 ∈ R or 111 ∈ R].

    To get P[011 ∈ R] we also need to know P[011 ∈ R and 111 ∈ R].

    (I am using the equation P[A or B] = P[A] + P[B] - P[A and B].)
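
    A tiny brute force (n = 2 with masks 01 and 11, just to keep the enumeration small) illustrates the same bookkeeping: the formula gives the "or" probability, and recovering P[01 ∈ R] from it needs the joint term. The helper below is only for illustration:

        import itertools
        from fractions import Fraction

        def probs(n, l1, l2, events):
            """Exact probabilities of events (predicates on R) over all (S1, S2)."""
            universe = range(2 ** n)
            counts, total = [0] * len(events), 0
            for s1 in itertools.combinations(universe, l1):
                for s2 in itertools.combinations(universe, l2):
                    R = {x & y for x in s1 for y in s2}
                    total += 1
                    for i, ev in enumerate(events):
                        counts[i] += ev(R)
            return [Fraction(c, total) for c in counts]

        p01, p11, p_both, p_any = probs(2, 2, 2, [
            lambda R: 0b01 in R,
            lambda R: 0b11 in R,
            lambda R: 0b01 in R and 0b11 in R,
            lambda R: 0b01 in R or 0b11 in R,
        ])
        assert p_any == p01 + p11 - p_both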

    • »
      »
      »
      7 years ago, # ^ |
      Rev. 2

      Going from the computed values to the required ones might be harder to express in terms of probabilities. The good thing is that you can always transform it into a counting problem (and vice versa), and then normalize accordingly.

      Seeing it as a counting problem, you can just use the inclusion-exclusion principle and, by induction in the DP, all the values for the greater masks will already be correct.

      • »
        »
        »
        »
        7 years ago, # ^ |
        Rev. 4

        To reduce it to a polynomial problem you only need one easy observation: the probability for a number depends only on its number of 1s, which lets you compute the dynamic programming in O(N). You would also need binomial coefficients to find the answer, which can be computed in O(N) as well, so I guess that's all, isn't it?

        PS: I consider the variant in which you have to compute the answer modulo a big prime (i.e. the answer is of the form P / Q and you print P * Q^(-1)), because otherwise the approach still works, but asking for precision with N a little bigger than 20 wouldn't work, and for small N you can just backtrack over half of the values; it's much more interesting if you set N as 10^6.

        LE: The observation is right indeed, but the recurrence is not that simple actually, so ignore most of what I said (except for the observation, which should be relevant in any solution that uses those probabilities to compute the answer).

        • »
          »
          »
          »
          »
          7 years ago, # ^ |

          Does the original problem ask to compute the answer modulo some prime? Because, if not, computing the combinatorial probabilities might be pretty painful without some Gaussian approximations. If so, then a rather small modulus would solve the problem of computing large binomial coefficients (remember that the binomials in this problem contain a (2^n)! term).

          I suppose some maths theorem would be helpful for a bigger prime...
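
          For the small-modulus route, one standard tool is Lucas' theorem, which evaluates C(a, b) modulo a small prime p digit by digit in base p. A minimal sketch (illustrative only):

              def comb_mod_p(a, b, p):
                  """C(a, b) modulo a (small) prime p via Lucas' theorem."""
                  if b < 0 or b > a:
                      return 0
                  result = 1
                  while a or b:
                      ai, bi = a % p, b % p
                      if bi > ai:
                          return 0
                      # C(ai, bi) mod p for base-p digits, computed directly.
                      num = den = 1
                      for i in range(bi):
                          num = num * (ai - i) % p
                          den = den * (i + 1) % p
                      result = result * num * pow(den, p - 2, p) % p
                      a //= p
                      b //= p
                  return result

              # e.g. comb_mod_p(2**20, 12345, 7)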