One day I thought of an interesting problem :

“ Given $$$n$$$ elements that are initially $$$0$$$, operate $$$k$$$ times, each time will randomly select an element, increase it by $$$1$$$, and find the expected value of the maximum value among the $$$n$$$ elements”

I used to think it was a very simple problem (at the beginning), but actually I couldn't find any good solution (at the beginning, I only knew an $$$O(k^2\ln k)$$$ solution)

After asking some powerful people, I learned one way to solve this problem in $$$O(\textrm{poly}~k^{\frac{5}{3}})/O(n\cdot k^{1.5})$$$. (thanks them.)

It is unimaginable to me that the solution of a seemingly simple problem is so difficult, Could any one find a better solution? Thank you !

Can you brief the solutions you have got till now?

Sorry, my English is really poor, it's hard for me to explain it.

let $$$F_z(x)=\sum_{i=0}^{z}\frac{x^i}{i!}$$$, we need to count $$$[x^k]F_z(x)^{n}$$$ for $$$z=0,1,2...k$$$

first, we can count $$$[x^k]F_z(x)^n$$$ in $$$O(k\log k)$$$

the main idea is if we know $$$F_z(x)$$$, we can get $$$[x^k]F_{z+j}(x)$$$ in $$$\mathcal O(j\frac{k^2}{z^2})$$$

to get $$$\mathcal O(\textrm{poly}~k^{\frac{5}{3}})$$$ solution, we need to count $$$F_t(x)^n$$$ for $$$t = 1,2,3...cnt_1$$$, and then count $$$F_{i\cdot cnt_2+cnt1}(x)^n$$$ for each $$$i=0,1,2...\frac{k}{cnt_2}$$$, then use the $$$\mathcal O(j\frac{k^2}{z^2})$$$ algorithm to count this answer for each $$$[x^k]F_{z+j}(x)$$$

we can get an $$$\mathcal O(\textrm{poly}~k^{\frac{5}{3}})$$$ solution by choose right $$$cnt_1,cnt_2$$$

This is the expected worst-case lookup time in a naively implemented hash table with uniformly random hash function.

The distribution of the variable under the expectation has been well-researched in the randomized algorithms literature. For $$$k = n$$$, most courses on the topic prove this value is $$$\frac{\log n}{\log \log n}(1 + o(1))$$$ with high probability.

For an exact expected value in the general $$$k \neq n$$$ case, see the following papers:

http://library.cust.edu.pk/ACM/Disk_1/text/2-6/ICDE88/P362.pdf

For the first document

Digression: if the elements in your problem were independent, controlling the maximum value would be easy. One way of doing this: introduce the variable $$$Y(t) = \text{number of elements greater or equal than } t$$$, and then $$$Y(t)$$$ is a sum of independent Bernoulli variables, and you can calculate $$$\mathbb P[Y(t) > 0]$$$ for every $$$t$$$.

Now, the elements obviously aren't independent. But it is less known that you can still use more or less the same bounds, because the summed variables exhibit

negative corellation.I can't really find a good online introduction to the topic, though. Maybe this is useful for advanced readers

Thank you very much!