I was solving a problem that required guessing an expected-value identity. I tried to prove it but couldn't.

There are $$$n$$$ pairs of numbers $$$(a,b)$$$. In each operation, you choose one pair uniformly at random. Let $$$(X_i,Y_i)$$$ denote the pair chosen in the $$$i$$$-th operation.

Prove that $$$\large{E\left(\frac{\sum_{i=1}^{k}X_i}{\sum_{i=1}^{k}Y_i}\right) = \frac{E(\sum_{i=1}^{k}X_i)}{E(\sum_{i=1}^{k}Y_i)} = \frac{\sum_{j=1}^{n}a_j}{\sum_{j=1}^{n}b_j}}$$$ as $$$k$$$ tends to infinity.

I'm not sure about the former identity, but the latter is easily proven by linearity of expectation.
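Not a proof, but a quick Monte Carlo sanity check (my own sketch; the pair values below are made up) agrees with the claimed limit:

```python
import random

def simulate_ratio(pairs, k, trials, seed=0):
    """Monte Carlo estimate of E[(sum X_i) / (sum Y_i)] after k uniform draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sx = sy = 0
        for _ in range(k):
            a, b = rng.choice(pairs)  # pick one pair uniformly at random
            sx += a
            sy += b
        total += sx / sy
    return total / trials

pairs = [(3, 1), (1, 4), (5, 2)]                               # made-up (a, b) pairs
claimed = sum(a for a, _ in pairs) / sum(b for _, b in pairs)  # 9/7
estimate = simulate_ratio(pairs, k=10_000, trials=50)
print(claimed, estimate)  # the two numbers should be close for large k
```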

The problem seems to expect you to be able to figure out that

$$$\lim_{k \to \infty} \frac{\sum_{i=1}^{k}X_i}{\sum_{i=1}^{k}Y_i} = \lim_{k \to \infty} \frac{\frac{1}{k}\sum_{i=1}^{k}X_i}{\frac{1}{k}\sum_{i=1}^{k}Y_i} = \frac{\lim_{k \to \infty}\frac{1}{k}\sum_{i=1}^{k}X_i}{\lim_{k \to \infty}\frac{1}{k}\sum_{i=1}^{k}Y_i} = \frac{E[X]}{E[Y]}$$$

almost surely. The first equality is obvious, the second follows from continuity of division at $$$(E[X], E[Y])$$$, and the third is the strong law of large numbers.

This is not the same as the equality you are asking for help with proving. That one is also true, but I don't see a way to prove it that's easier than directly estimating the error in the denominator using one of several concentration inequalities on $$$\sum_{i=1}^k Y_i$$$.

EDIT2: There is an easier way, namely to use the continuity of division on the compact region $$$[1, 10^9] \times [1, 10^9]$$$ in which the sample averages can possibly vary and apply the weak law of large numbers to the vector $$$(X, Y)$$$. Of course, the appropriate modification of the spoilered argument below can be used to make this effective.

Proof sketch:

Let $$$\mu = E[Y]$$$. Find some finite constant $$$C$$$ such that for all $$$1 \leq y \leq 10^9$$$,

$$$\left| \frac{1}{y} - \frac{1}{\mu} \right| \leq C \left| y - \mu \right|.$$$

Plug in $$$y = \frac{1}{k} \sum_{i=1}^k Y_i$$$ and take expected values to get

$$$E\left[ \left| \frac{1}{\frac{1}{k} \sum_{i=1}^k Y_i} - \frac{1}{\mu} \right| \right] \leq C \, E\left[ \left| \frac{1}{k} \sum_{i=1}^k Y_i - \mu \right| \right].$$$

It's then easy to take the desired limit.
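The shrinking of both deviations can be seen numerically (my own illustration with made-up $$$b$$$ values; here $$$C = 1$$$ works, since $$$y \geq 1$$$ and $$$\mu \geq 1$$$ make $$$y\mu \geq 1$$$):

```python
import random

def deviations(b_values, k, trials, seed=1):
    """Estimate E[|avg(Y) - mu|] and E[|1/avg(Y) - 1/mu|] over `trials` runs of k draws."""
    rng = random.Random(seed)
    mu = sum(b_values) / len(b_values)
    d_avg = d_inv = 0.0
    for _ in range(trials):
        avg = sum(rng.choice(b_values) for _ in range(k)) / k
        d_avg += abs(avg - mu)
        d_inv += abs(1 / avg - 1 / mu)
    return d_avg / trials, d_inv / trials

b_values = [1, 4, 2]  # made-up b_j values, all >= 1
d100 = deviations(b_values, k=100, trials=200)
d400 = deviations(b_values, k=400, trials=200)
# both deviations shrink as k grows, and the inverse-side error
# is bounded by the plain one (i.e. C = 1 suffices on this range)
```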

EDIT: Whoops! We do not have that $$$X, Y$$$ are independent, so this conclusion isn't enough to make the desired limit easy to verify. However, the same technique can be applied to both $$$X$$$ and $$$Y$$$ simultaneously. I leave the details as an exercise for the bored.

Thanks for your reply. However, I cannot really understand it.

I don't get how they are different. It appears you are assuming in your first argument that $$$X$$$ and $$$Y$$$ are independent, right? But in the problem, they are not.

I am not assuming that $$$X$$$ and $$$Y$$$ are independent in my first argument. For any particular $$$\omega$$$ in our sample space, $$$\sum_{i=1}^k X_i(\omega)$$$ and $$$\sum_{i=1}^k Y_i(\omega)$$$ are just real numbers, and sequences of those can be manipulated using the rules of ordinary calculus. The only step of the argument that uses probability at all is the last step, where I apply the strong law of large numbers separately to $$$X$$$ and $$$Y$$$ in the numerator and denominator respectively. But inter-dependence of $$$X$$$ and $$$Y$$$ has no effect on the applicability of the strong law of large numbers to one of them at a time!

Unfortunately, for some sequence of random variables $$$Z_k$$$, it is possible for $$$\lim_{k \to \infty} Z_k$$$ to almost surely converge to some number while $$$ E[Z_k] $$$ diverges, and it is also possible for $$$ E[Z_k] $$$ to converge to some number while $$$ \lim_{k \to \infty} Z_k$$$ almost surely diverges. Neither condition is stronger than the other, and the problem is asking about "$$$\lim \limits_{t \to \infty} \frac{g\left(t\right)}{t}$$$," not about expected values.
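Concrete examples of both failure modes (my own, not from the thread; $$$U$$$ is uniform on $$$(0,1)$$$ and the $$$B_k$$$ are independent):

```latex
% Almost-sure convergence without convergence of expectations:
% for U = u > 0 we have Z_k = 0 once k > 1/u, so Z_k -> 0 a.s.,
% yet E[Z_k] = k^2 * (1/k) = k -> infinity.
Z_k = k^2 \cdot \mathbf{1}\!\left[U \le \tfrac{1}{k}\right]

% Convergence of expectations without almost-sure convergence:
% E[Z_k] = 1/k -> 0, but since the B_k are independent and
% \sum_k 1/k = \infty, Borel--Cantelli gives Z_k = 1 infinitely
% often a.s., so the sequence Z_k does not converge a.s.
Z_k = B_k, \qquad P(B_k = 1) = \tfrac{1}{k}, \quad P(B_k = 0) = 1 - \tfrac{1}{k}
```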

(EDIT: It happens that for uniformly $$$L^{\infty}$$$-bounded sequences of random variables, as is the case with $$$Z_k = \frac{\sum_{i=1}^k X_i}{\sum_{i=1}^k Y_i}$$$, the almost-sure convergence of $$$\lim_{k \to \infty} Z_k$$$ to some real number $$$v$$$ *does* imply the convergence in expectation $$$\lim_{k \to \infty} E[Z_k] = v$$$. But the reverse implication is still not generally true!)

Thanks for your replies.

Unfortunately, I cannot understand, I think there's a lot of heavy math that I'm missing. Maybe I should study the basics of real analysis first? Are there any specific resources you recommend so that I can understand this stuff easily? :D

The only text I've personally read that covers these aspects of probability theory satisfactorily was a set of lecture notes I don't think I can legally share, so I can't make any recommendations there. For basic real analysis, measure theory, and functional analysis (all of which help with understanding and proving results in probability theory), I got my initial grounding from the texts

*Real and Complex Analysis* by W. Rudin and *Measure Theory* by P. Halmos. But they are not light reading, and they expect mathematical maturity. Expect to struggle with digesting them for months if you are serious about spelunking on this route into the probability rabbit hole.

Thank you! :)

It occurs to me also that Terry Tao's blog had some nice posts dealing with related topics, for example this one. Also on his blog are some lecture notes for an intermediate course on probability theory, which I have not yet read all of, but expect to be of high quality.

Just to complete the discussion: why not concentration inequalities? Hoeffding is the right tool for the job when facing sums of bounded random variables. It gives any version of the problem in question for free.
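A sketch of what Hoeffding's inequality buys here, assuming each draw lies in a known bounded range (the values below are made up):

```python
import math
import random

def hoeffding_bound(k, t, lo, hi):
    """Hoeffding's inequality for the mean of k i.i.d. draws in [lo, hi]:
    P(|avg - E[Y]| >= t) <= 2 * exp(-2 * k * t^2 / (hi - lo)^2)."""
    return 2.0 * math.exp(-2.0 * k * t * t / (hi - lo) ** 2)

def empirical_tail(values, k, t, trials, seed=2):
    """Empirical frequency of |avg - mean| >= t over `trials` runs of k draws."""
    rng = random.Random(seed)
    mu = sum(values) / len(values)
    hits = sum(
        abs(sum(rng.choice(values) for _ in range(k)) / k - mu) >= t
        for _ in range(trials)
    )
    return hits / trials

values = [1, 4, 2]  # made-up b_j values, all in [1, 4]
bound = hoeffding_bound(k=100, t=0.4, lo=1, hi=4)
freq = empirical_tail(values, k=100, t=0.4, trials=2000)
# the observed tail frequency sits (comfortably) below the Hoeffding bound
```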

I have not thought about it really carefully, but could there be some relation to L'Hôpital's rule for evaluating limits in calculus? I wonder.

Edit: Sorry, I guess I had a brain-fart. I am not sure what I was thinking.

Since it's a programming problem, you can proceed intuitively.

Let the number of times pair $$$i$$$ is picked be $$$c_i$$$; then the probability of that outcome is $$$\frac{k!}{\prod_i c_i!} \cdot \frac{1}{n^k}$$$. The multinomial coefficient is large if the $$$c_i$$$ are all similar and small if they aren't, since factorials grow very quickly. That means states with very different $$$c_i$$$ contribute little to the expected value. Done.
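This intuition can be checked exactly for small $$$n$$$ and $$$k$$$ (my own sketch): enumerate the count vectors, weight each by $$$\frac{k!}{\prod_i c_i!} \cdot n^{-k}$$$, and observe that nearly all the probability mass sits near balanced counts.

```python
from math import factorial

def state_prob(counts, n):
    """Probability that pair i is chosen exactly counts[i] times in k = sum(counts) draws."""
    k = sum(counts)
    ways = factorial(k)
    for c in counts:
        ways //= factorial(c)  # multinomial coefficient k! / prod(c_i!)
    return ways / n ** k

n, k = 2, 20
total = sum(state_prob((c, k - c), n) for c in range(k + 1))
near = sum(state_prob((c, k - c), n) for c in range(k + 1) if abs(c - k // n) <= 4)
# total is 1.0, and counts within 4 of k/n already carry ~96% of the mass
```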