### Maksim1744's blog

By Maksim1744, 11 days ago,

Some time ago I read this post about calculating the prime-counting function in $O(n^{3/4})$ (you have to solve problem 10 to access the post). At the end, the author mentioned that there is an $O(n^{2/3})$ solution. To my surprise, I was not able to find any even remotely relevant information about it on the internet. Maybe there is a beautiful blog describing all of this on the 10th page of Google, but instead I decided to collect the information I found in this post.

### $\tilde O(n^{3/4})$

Let's start with $O(n^{3/4})$ approach from Project Euler. We will need some definitions:

• $\pi(n)$ — prime counting function, i.e. number of primes which are not greater than $n$.
• $S(n, a)$ — suppose we take all numbers from $1$ to $n$ and sieve them with the first $a$ primes. Then $S(n, a)$ is the count of numbers which survive this sieving. In particular, $S(n, 0) = n$ and $S(n, \infty) = \pi(n) + 1$ (we always count $1$).
• $p_a$ — $a$-th prime.

One can see that there is a recurrence relation for $S$: $S(n, a) = S(n, a - 1) - \left[S\left(\left\lfloor \frac{n}{p_a}\right\rfloor, a - 1\right) - a \right]$. Indeed, while sieving with $p_a$ we mark exactly the numbers whose smallest prime divisor is $p_a$, so we can divide them by $p_a$ and count them via $S\left(\left\lfloor \frac{n}{p_a}\right\rfloor, a - 1\right)$. But this overcounts: that $S$ also includes the primes before $p_a$ (as well as the number $1$), which we must not count, so we subtract $a$.
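As a quick sanity check of the recurrence (example mine), take $n = 30$:

```latex
% sieving 1..30 with p_1 = 2 keeps 1, 2 and the 14 odd numbers: 16 survivors
S(30, 1) = S(30, 0) - \bigl[ S(15, 0) - 1 \bigr] = 30 - (15 - 1) = 16
% sieving with p_2 = 3 additionally removes 9, 15, 21, 27: 12 survivors
S(30, 2) = S(30, 1) - \bigl[ S(10, 1) - 2 \bigr] = 16 - (6 - 2) = 12
```

Here $S(10, 1) = 6$ counts $1, 2, 3, 5, 7, 9$, and the final $12$ survivors are $1$, the primes up to $30$ except those sieved away, and $25$: $1, 2, 3, 5, 7, 11, 13, 17, 19, 23, 25, 29$.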

Another fact is that $S(n, \pi(\left\lfloor \sqrt n\right\rfloor)) = S(n, \infty) = \pi(n) + 1$, since we don't need to sieve with primes greater than $\sqrt n$.

You can see that if we just run this recursion, every call to $S$ will be of the form $S\left(\left\lfloor \frac {n}{k} \right\rfloor , a\right)$ where $k$ is an integer. There are at most $2\sqrt n$ possible values of $\left\lfloor \frac {n}{k} \right\rfloor$, because either $k \leqslant \sqrt n$ or $\frac nk \leqslant \sqrt n$. Let $V$ be the set of all these possible values.

All this leads to a dynamic programming solution: keep an array $dp[m]$ for all $m \in V$. Initially we store $dp[m] = S(m, 0) = m$, and after the $i$-th iteration of the algorithm we want $dp$ to hold $S(m, i)$. To recalculate $dp$ in place, go from the largest $m$ to the smallest and apply the recurrence above. But now we need around $\sqrt n$ iterations, each updating an array $dp$ of size $O(\sqrt n)$, which looks like $\Theta(n)$. However, we don't need to update $dp[m]$ when $m < p_i^2$ (the recurrence changes nothing there, since no number below $p_i^2$ has $p_i$ as its smallest prime divisor except $p_i$ itself), so we simply stop the current iteration as soon as we meet that condition.
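As an illustration, here is a minimal C++ sketch of this dp (my own code, not the author's linked implementation). It stores $S(m, \cdot) - 1$ instead of $S$, so the final answer is $\pi(n)$ directly: `small[v]` holds the value for $v \leqslant \sqrt n$ and `large[k]` holds the value for $m = \lfloor n/k \rfloor$.

```cpp
#include <bits/stdc++.h>
using namespace std;

// pi(n) in ~O(n^{3/4}) via the dp described above.
// dp value at m = (count of numbers in [2, m] surviving the sieve so far).
long long prime_pi(long long n) {
    if (n < 2) return 0;
    long long r = sqrtl(n);
    while (r * r > n) r--;
    while ((r + 1) * (r + 1) <= n) r++;
    vector<long long> small(r + 1), large(r + 1);
    for (long long v = 1; v <= r; v++) small[v] = v - 1;        // S(v, 0) - 1
    for (long long k = 1; k <= r; k++) large[k] = n / k - 1;    // S(n/k, 0) - 1
    for (long long p = 2; p <= r; p++) {
        if (small[p] == small[p - 1]) continue;  // p is not prime
        long long sp = small[p - 1];             // number of primes < p
        long long p2 = p * p;
        // apply the recurrence from largest m down, stopping at m < p^2
        for (long long k = 1; k <= r && n / k >= p2; k++) {
            long long d = k * p;  // n / d = (n / k) / p
            large[k] -= (d <= r ? large[d] : small[n / d]) - sp;
        }
        for (long long v = r; v >= p2; v--)
            small[v] -= small[v / p] - sp;
    }
    return large[1];
}
```

The check `small[p] == small[p - 1]` detects primality of $p$ for free: by the time $p$ is processed, all smaller primes have already sieved the prefix of `small`.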

With this we can finally estimate the complexity. Fix some integer $y$ around $n^{1/4}$ (we will choose it precisely later). For $p_i < y$ there are $\pi(y)$ iterations, which take $O(n^{1/2}\pi(y))$ time in total, which is the same as $O\left(n^{1/2}\frac{y}{\log y}\right)$. For the other part, let's first rewrite the stopping condition a bit: \begin{gather*} m \geqslant p_i^2, \,\, m = \left\lfloor\frac{n}{k}\right\rfloor \Rightarrow k \leqslant \frac{n}{p_i^2} \end{gather*}

So the $i$-th iteration completes in $O\left(\frac{n}{p_i^2}\right)$ time, and summing over all of them: \begin{gather*} \sum_{\substack{y < p_i \leqslant n \\ \text{$p_i$ is prime}}} \frac{n}{p_i^2} \leqslant \sum_{y < p_i \leqslant n} \frac{n}{p_i^2} = \sum_{p = y}^{n} \frac{n}{p^2} \approx \int_{x=y}^n \frac{n}{x^2} dx = -\frac{n}{x}\Big|_{y}^n = O\left(\frac{n}{y}\right) \end{gather*}

Resulting complexity will be $O\left(n^{1/2}\frac{y}{\log y} + \frac{n}{y}\right)$, which is $O\left(\frac{n^{3/4}}{\log^{1/2} n}\right)$ with $y = n^{1/4}\log^{1/2}n$.

Implementation

As a bonus, you get array $dp$ which gives you $\pi\left(\left\lfloor\frac nk \right\rfloor\right)$ for any $k \geqslant 1$, which you can use later in some other dynamic programming on top of it.

### $\tilde O(n^{2/3})$

I will use the same definitions, except that I will modify $S$ a little bit:

• $\varphi(n, a)$ — the number of integers in $[1;\,n]$ which are not divisible by any of the first $a$ primes. The difference with $S$ is that $\varphi$ doesn't count the first $a$ primes themselves (but it still counts $1$, and it does count primes greater than $p_a$).

For this function there is a similar recurrence relation: $\varphi(n, a) = \varphi(n, a - 1) - \varphi\left(\left\lfloor \frac{n}{p_a} \right\rfloor, a - 1\right)$.
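A concrete check of this recurrence (example mine), again with $n = 30$:

```latex
% numbers in [1, 30] not divisible by 2: the 15 odd ones
\varphi(30, 1) = 15, \qquad \varphi(10, 1) = 5
% subtracting the odd multiples of 3 (counted via floor(30 / 3) = 10):
\varphi(30, 2) = \varphi(30, 1) - \varphi(10, 1) = 15 - 5 = 10
```

Indeed, the $10$ survivors are $1, 5, 7, 11, 13, 17, 19, 23, 25, 29$.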

Let's take some integer $\sqrt[3]{n} \leqslant y \leqslant \sqrt{n}$ (we will set it to $\sqrt[3]{n}\log^{?}n$ in the end). Let's write $\pi(n)$ as \begin{gather*} \pi(n) = \varphi(n, \pi(y)) + \pi(y) - F - 1 \end{gather*}

Here $F$ is the number of integers in $[1; n]$ which can be expressed as $pq$ for primes $p$ and $q$ with $p \geqslant q > y$.

To calculate $F$ we can use a linear sieve up to $\frac{n}{y}$ to find all primes and then compute $F$ with two pointers (as $q$ grows, the largest valid $p \leqslant \frac{n}{q}$ only decreases). Using the same sieve we can also calculate $\pi(y)$.
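A possible sketch of this step (function name and interface are mine; a plain Eratosthenes sieve stands in for the linear sieve, which doesn't change the counting):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Count integers in [1, n] of the form p*q with primes p >= q > y.
long long count_F(long long n, long long y) {
    long long lim = n / y;  // every valid p satisfies p <= n / q < n / y
    vector<bool> comp(lim + 1, false);
    vector<long long> primes;
    for (long long i = 2; i <= lim; i++) {
        if (!comp[i]) primes.push_back(i);
        for (long long j = i * i; j <= lim; j += i) comp[j] = true;
    }
    long long F = 0;
    int j = (int)primes.size() - 1;
    for (int i = 0; i < (int)primes.size(); i++) {
        if (primes[i] <= y) continue;           // need q > y
        while (j >= i && primes[i] * primes[j] > n) j--;
        if (j < i) break;                       // no p >= q fits anymore
        F += j - i + 1;  // q = primes[i], p ranges over primes[i..j]
    }
    return F;
}
```

For example, with $n = 100$ and $y = 5$ this counts exactly $49, 77, 91$, i.e. $F = 3$, matching $\pi(100) = \varphi(100, 3) + 3 - 3 - 1 = 25$.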

We will calculate $\varphi(n, \pi(y))$ using the recursion, but let's stop the recursion whenever we reach a call $\varphi(m, a)$ with $m \leqslant \frac{n}{y}$, and deal with those calls later. Every other recursion call has the form $\varphi\left(\left\lfloor \frac{n}{k}\right \rfloor , a\right)$ for $a \leqslant \pi(y)$ and $k < y$. So there will be no more than $O(y \pi(y))$ calls of this type, which by the distribution of primes is the same as $O\left(\frac{y^2}{\log y}\right) = O\left(\frac{y^2}{\log n}\right)$.

For the other part, first notice that we don't actually have to pass results back up through the recursion: every time we go deeper, it's enough to maintain the current sign. In other words, for any call $\varphi(m, a)$ with $m \leqslant \frac ny$ we can just save it as a triple $(m, a, sign)$ and then add $\varphi(m, a) \times sign$ to the answer. To calculate these values, notice that a linear sieve gives us the smallest prime divisor of each number up to $\frac{n}{y}$. So if we store these smallest prime divisors in an array $mn[\,]$, then $\varphi(m, a)$ equals the number of indices $i \in [1;\,m]$ such that $mn[i] > p_a$ (here we assume $mn[1] = +\infty$). This can easily be done offline in $O\left(\frac{n}{y} \log\frac{n}{y}\right) = O\left(\frac{n}{y} \log n\right)$ with a Fenwick tree, by sorting the triples by $m$ and doing standard sum-in-a-rectangle queries.
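A sketch of this offline step (all names are mine): build $mn$ with a linear sieve, sort the queries by $m$, sweep $i$ upward inserting the rank of $mn[i]$ into a Fenwick tree, and answer each query as a suffix count of ranks.

```cpp
#include <bits/stdc++.h>
using namespace std;

// For each query (m, a), count i in [1, m] with mn[i] > p_a, where mn[i]
// is the smallest prime factor of i and mn[1] is treated as +infinity.
vector<long long> phi_queries(int N, vector<pair<int,int>> qs) {
    // linear sieve for smallest prime factors up to N
    vector<int> mn(N + 1, 0), primes;
    for (int i = 2; i <= N; i++) {
        if (mn[i] == 0) { mn[i] = i; primes.push_back(i); }
        for (int p : primes) {
            if (p > mn[i] || (long long)p * i > N) break;
            mn[p * i] = p;
        }
    }
    int P = primes.size();
    // rank of mn[i]: its 0-based index among primes, or P for i = 1 ("infinity")
    auto rnk = [&](int i) {
        if (i == 1) return P;
        return (int)(lower_bound(primes.begin(), primes.end(), mn[i]) - primes.begin());
    };
    // Fenwick tree over ranks 0..P
    vector<long long> fen(P + 2, 0);
    auto add = [&](int pos) { for (pos++; pos <= P + 1; pos += pos & -pos) fen[pos]++; };
    auto pref = [&](int pos) { long long s = 0; for (pos++; pos > 0; pos -= pos & -pos) s += fen[pos]; return s; };

    vector<int> order(qs.size());
    iota(order.begin(), order.end(), 0);
    sort(order.begin(), order.end(), [&](int x, int y) { return qs[x].first < qs[y].first; });

    vector<long long> ans(qs.size());
    int cur = 0;  // indices 1..cur are already inserted into the tree
    for (int id : order) {
        int m = qs[id].first, a = qs[id].second;
        while (cur < m) { cur++; add(rnk(cur)); }
        // mn[i] > p_a  <=>  rank >= a  (primes 0-indexed, so p_a = primes[a-1])
        ans[id] = pref(P) - (a > 0 ? pref(a - 1) : 0);
    }
    return ans;
}
```

For instance, with $N = 30$ the query $(30, 3)$ returns $8$, which is exactly $\varphi(30, 3)$: the numbers $1, 7, 11, 13, 17, 19, 23, 29$.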

Total complexity is $O\left(\frac{y^2}{\log n} + \frac{n}{y} \log n\right)$, which is optimal for $y = n^{1/3}\log^{2/3} n$, resulting in a total complexity of $O(n^{2/3}\log^{1/3}n)$.

Implementation

### Benchmarks

Both implementations are verified on Library-Checker.

| $n$ | $10^{10}$ | $10^{11}$ | $10^{12}$ | $10^{13}$ |
| --- | --- | --- | --- | --- |
| $\tilde O(n^{3/4})$ (link) | 0.147s | 0.763s | 3.890s | 20.553s* |
| $\tilde O(n^{2/3})$ (link) | 0.045s | 0.200s | 0.914s | 4.372s |

* — this didn't fit in ideone limit, so I ran it locally and estimated time on ideone based on ratio on other tests.

### Bonus — sum of primes

It's easy to see that you can find the sum of primes up to $n$ using the same ideas, or even the sum of squares of primes, the sum of cubes, etc. For $\tilde O(n^{3/4})$ I have already implemented it here; for $\tilde O(n^{2/3})$ this is left as an exercise for the reader :)
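For the $\tilde O(n^{3/4})$ version the change is tiny: the same dp with counts replaced by sums, and the removed block multiplied by $p$. A sketch (my own code, same `small`/`large` layout as for counting; note that the sums overflow 64 bits well before $n = 10^{13}$, so this assumes moderate $n$):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Sum of primes up to n. dp value at m = (sum of numbers in [2, m]
// surviving the sieve so far); the update multiplies the removed block by p.
long long prime_sum(long long n) {
    if (n < 2) return 0;
    long long r = sqrtl(n);
    while (r * r > n) r--;
    while ((r + 1) * (r + 1) <= n) r++;
    auto tri = [](long long v) { return v * (v + 1) / 2 - 1; };  // 2 + 3 + ... + v
    vector<long long> small(r + 1), large(r + 1);
    for (long long v = 1; v <= r; v++) small[v] = tri(v);
    for (long long k = 1; k <= r; k++) large[k] = tri(n / k);
    for (long long p = 2; p <= r; p++) {
        if (small[p] == small[p - 1]) continue;  // p is not prime
        long long sp = small[p - 1];             // sum of primes < p
        long long p2 = p * p;
        for (long long k = 1; k <= r && n / k >= p2; k++) {
            long long d = k * p;
            large[k] -= p * ((d <= r ? large[d] : small[n / d]) - sp);
        }
        for (long long v = r; v >= p2; v--)
            small[v] -= p * (small[v / p] - sp);
    }
    return large[1];
}
```

Sums of squares or cubes work the same way: initialize with the corresponding power-sum formula and multiply the removed block by $p^2$ or $p^3$.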

### Practice problems

Thanks to sgtlaugh for the comment with practice problems!

P.S. If you notice any mistakes or typos, let me know


By Maksim1744, 8 months ago,

Yesterday I saw this blog and thought that one bad editorial is better than zero good ones, right? So, enjoy it!

Here are links to contests: Div. 1, Div. 2

Div2A

Editorial
Code

Div2B

Editorial
Code

Div2C/Div1A

Editorial
Code

Div2D/Div1B

Editorial
Code

Div2E/Div1C

Editorial
Code

Div1D

Editorial
Code

Div1E

Editorial
Code

This is my first editorial (and my first blog too), so any suggestions, improvements, etc. are welcome.