Maksim1744's blog

By Maksim1744, 9 days ago, In English

Some time ago I read this post about calculating prime-counting function in $$$O(n^{3/4})$$$ (you have to solve problem 10 to access the post). And in the end author mentioned that there is $$$O(n^{2/3})$$$ solution. To my surprise I was not able to find any more or less relevant information on the internet about that. Maybe there is a beautiful blog describing all of this on 10-th page of google, but I decided to collect information I found in this post.

Prerequisites

$$$\tilde O(n^{3/4})$$$

Let's start with $$$O(n^{3/4})$$$ approach from Project Euler. We will need some definitions:

  • $$$\pi(n)$$$ — prime counting function, i.e. number of primes which are not greater than $$$n$$$.
  • $$$S(n, a)$$$ — suppose we take all numbers from $$$1$$$ to $$$n$$$ and sieve them with first $$$a$$$ primes. Then $$$S(n, a)$$$ is the number of numbers which are preserved after this. In particular, $$$S(n, 0) = n$$$ and $$$S(n, \infty) = \pi(n) + 1$$$ (we always count $$$1$$$).
  • $$$p_a$$$ — $$$a$$$-th prime.

One can see that there is a recurrence relation for $$$S$$$: $$$S(n, a) = S(n, a - 1) - \left[S\left(\left\lfloor \frac{n}{p_a}\right\rfloor, a - 1\right) - a \right]$$$, since while sieving with $$$p_a$$$ we will mark all numbers which have $$$p_a$$$ as their smallest prime divisor, so we can divide them by $$$p_a$$$ and take value $$$S\left(\left\lfloor \frac{n}{p_a}\right\rfloor, a - 1\right)$$$. But now we are overcounting, because in this $$$S$$$ there are primes before $$$p_a$$$ (as well as number $$$1$$$), and we don't need to count them, so we are subtracting $$$a$$$.

Another fact is that $$$S(n, \pi(\left\lfloor \sqrt n\right\rfloor)) = S(n, \infty) = \pi(n) + 1$$$, since we don't need to sieve with primes greater than $$$\sqrt n$$$.

You can see that if we just do this recursion, every call to $$$S$$$ will be of the form $$$S\left(\left\lfloor \frac {n}{k} \right\rfloor , a\right)$$$ where $$$k$$$ is an integer. And there are at most $$$2\sqrt n$$$ possible values of $$$\left\lfloor \frac {n}{k} \right\rfloor$$$, because either $$$k \leqslant \sqrt n$$$, or $$$\frac nk \leqslant \sqrt n$$$, let $$$V$$$ be the set of all possible values.

All this leads to dynamic programming solution: let's keep array $$$dp[m]$$$ for all $$$m \in V$$$. Initially we will store $$$dp[m] = S(m, 0) = m$$$, and after $$$i$$$-th iteration of our algorithm we will want to store $$$S(m, i)$$$ in $$$dp$$$. To recalculate $$$dp$$$ in-place, go from largest $$$m$$$ to smallest and apply recursive formula from above. But now we need to do around $$$\sqrt n$$$ iterations, each time update array $$$dp$$$ of size $$$\sqrt n$$$, seems like $$$\Theta(n)$$$. However, we don't need to update $$$dp[m]$$$ if $$$m < p_i^2$$$, so we just stop current iteration as soon as we meet that condition.

With this we can finally calculate the complexity. Let's fix some integer $$$y$$$ around $$$n^{1/4}$$$ (we will calculate it later). For $$$p_i < y$$$ there are $$$\pi(y)$$$ iterations which take $$$O(n^{1/2}\pi(y))$$$ time, which is the same as $$$O\left(n^{1/2}\frac{y}{\log y}\right)$$$. For other part, let's first rewrite stopping condition a bit: \begin{gather*} m > p_i^2, \,\, m = \left\lfloor\frac{n}{k}\right\rfloor \Rightarrow k < \frac{n}{p_i^2} \end{gather*}

So we will complete $$$i$$$-th iteration in $$$O\left(\frac{n}{p_i^2}\right)$$$ time, and when we add all of this: \begin{gather*} \sum_{\substack{y < p_i \leqslant n \\ \text{$$$p_i$$$ is prime}}} \frac{n}{p_i^2} \leqslant \sum_{y < p_i \leqslant n} \frac{n}{p_i^2} = \sum_{p = y}^{n} \frac{n}{p^2} \approx \int_{x=y}^n \frac{n}{x^2} dx = -\frac{n}{x}\Big|_{y}^n = O\left(\frac{n}{y}\right) \end{gather*}

Resulting complexity will be $$$O\left(n^{1/2}\frac{y}{\log y} + \frac{n}{y}\right)$$$, which is $$$O\left(\frac{n^{3/4}}{\log^{1/2} n}\right)$$$ with $$$y = n^{1/4}\log^{1/2}n$$$.

Implementation

As a bonus, you get array $$$dp$$$ which gives you $$$\pi\left(\left\lfloor\frac nk \right\rfloor\right)$$$ for any $$$ k \geqslant 1$$$, which you can use later in some other dynamic programming on top of it.

$$$\tilde O(n^{2/3})$$$

I will use the same definitions, except that I will modify $$$S$$$ a little bit:

  • $$$\varphi(n, a)$$$ — number of integers in $$$[1;\,n]$$$ such that they are not divisible by any prime among first $$$a$$$ primes. The difference with $$$S$$$ is that $$$\varphi$$$ doesn't count primes (but still counts $$$1$$$).

For this function there is a similar recurrence relation: $$$\varphi(n, a) = \varphi(n, a - 1) - \varphi\left(\left\lfloor \frac{n}{p_a} \right\rfloor, a - 1\right)$$$.

Let's take some integer $$$\sqrt[3]{n} \leqslant y \leqslant \sqrt{n}$$$ (we will set it to $$$\sqrt[3]{n}\log^{?}n$$$ in the end). Let's write $$$\pi(n)$$$ as \begin{gather*} \pi(n) = \varphi(n, \pi(y)) + \pi(y) — F — 1 \end{gather*}

Here $$$F$$$ is the number of integers in $$$[1; n]$$$ which can be expressed as $$$pq$$$ for primes $$$p$$$ and $$$q$$$ with $$$p \geqslant q > y$$$.

To calculate $$$F$$$ we can use linear sieve up to $$$\frac{n}{y}$$$ to find all primes and then calculate $$$F$$$ using two pointers. Using sieve we can also calculate $$$\pi(y)$$$.

We will calculate $$$\varphi(n, \pi(y))$$$ using recursion. But let's stop recursion when we call $$$\varphi(m, a)$$$ with $$$m \leqslant \frac{n}{y}$$$ and deal with this later. Now every other recursion call has a form of $$$\varphi\left(\left\lfloor \frac{n}{k}\right \rfloor , a\right)$$$ for $$$a \leqslant \pi(y)$$$ and $$$k < y$$$. So there will be no more than $$$O(y \pi(y))$$$ calls of this type, which according to the distribution of primes is the same as $$$O\left(\frac{y^2}{\log y}\right) = O\left(\frac{y^2}{\log n}\right)$$$.

For the other part first notice, that we don't actually have to pass result back through the recursion. Every time we go deeper into the recursion, it's enough to maintain current sign. In other words, this means that for any call $$$\varphi(m, a)$$$ with $$$m \leqslant \frac ny$$$ we can just save it as triple $$$(m, a, sign)$$$ and then add $$$\varphi(m, a) \times sign$$$ to the answer. To calculate it, notice that with linear sieve we can get smallest prime divisor for each number up to $$$\frac{n}{y}$$$. So if we store these prime divisors in array $$$mn[n]$$$, $$$\varphi(m, a)$$$ will be equal to the number of indices $$$i \in [1;\,m]$$$ such that $$$mn[i] > p_a$$$ (here we assume that $$$mn[1] = +\infty$$$). This can be easily done offline in $$$O\left(\frac{n}{y} \log\frac{n}{y}\right) = O\left(\frac{n}{y} \log n\right)$$$ with fenwick tree by sorting triples by $$$m$$$ and doing standard sum-in-a-rectangle queries.

Total complexity is $$$O\left(\frac{y^2}{\log n} + \frac{n}{y} \log n\right)$$$, which is optimal for $$$y = n^{1/3}\log^{2/3} n$$$, resulting in a total complexity of $$$O(n^{2/3}\log^{1/3}n)$$$.

Implementation

Benchmarks

Both implementations are verified on Library-Checker

$$$n$$$ $$$10^{10}$$$ $$$10^{11}$$$ $$$10^{12}$$$ $$$10^{13}$$$
$$$\tilde O(n^{3/4})$$$ link 0.147s 0.763s 3.890s 20.553s*
$$$\tilde O(n^{2/3})$$$ link 0.045s 0.200s 0.914s 4.372s

* — this didn't fit in ideone limit, so I ran it locally and estimated time on ideone based on ratio on other tests.

Bonus — sum of primes

It's easy to see that you can find sum of primes up to $$$n$$$ using same ideas, or even sum of squares of primes, or sum of cubes, etc. For $$$\tilde O(n^{3/4})$$$ I already implemented it here, for $$$\tilde O(n^{2/3})$$$ this is left as an exercise for a reader :)

Practice problems

Thanks to sgtlaugh for the comment with practice problems!

P.S. If you notice any mistakes or typos, let me know

 
 
 
 
  • Vote: I like it
  • +346
  • Vote: I do not like it

»
8 days ago, # |
Rev. 2   Vote: I like it +23 Vote: I do not like it

Do you have any ideas for removing the log factor to get $$$O(n^{2/3})$$$ instead of $$$\tilde{O}(n^{2/3})$$$?

  • »
    »
    8 days ago, # ^ |
      Vote: I like it +23 Vote: I do not like it

    There is this paper which describes $$$O\left(\frac{n^{2/3}}{\log^2 x}\right)$$$ algorithm, so it's definitely possible, but I'm not sure if there is an easier solution

»
8 days ago, # |
  Vote: I like it +31 Vote: I do not like it

This was discussed 5 years ago on Codeforces here (see the analysis of problem F and the comments) and on various related blogs (like here). You can find more by googling "site:codeforces.com counting primes".

  • »
    »
    8 days ago, # ^ |
      Vote: I like it +8 Vote: I do not like it

    Wow, I had read a lot of blogs by searching on google as you described, but I never found this comment. Thanks for sharing!

»
8 days ago, # |
Rev. 2   Vote: I like it +37 Vote: I do not like it

Nicely written. Thanks for this article. There aren't many resources available on prime counting. Here is another paper that explains the approaches very well.

However, your implementation seems overly complicated. Here is a simple and fairly unoptimized implementation of Lehmer's prime counting. This runs twice as faster as your O(n2/3) implementation on CF's server, and around 1.5x faster on ideone, when N=1012. The idea can be improved and optimized and we can solve for N=1013 in under a second. There are faster algorithms, but this usually suffices for CP.

The trick is to partially memoize the phi function, which runs very fast in practice with appropriate constants.

»
8 days ago, # |
  Vote: I like it +6 Vote: I do not like it

One more thing, you should probably change your implementation while calculating prime sums to use __int128 to store the prime sums.

As we can see here, the sum of all primes will exceed the maximum value of a 64-bit integer when N becomes N=1011 or higher. The actual value of N for which it will fit in a 64-bit integer is somewhere between N=1010 and N=1011. You can binary search for it if you want :)

  • »
    »
    8 days ago, # ^ |
      Vote: I like it +3 Vote: I do not like it

    I though about that, but just changing it to __int128 will slow down the program for the cases where it is not needed. But I probably should make it a template parameter, so you can use some Mint as well

»
8 days ago, # |
Rev. 2   Vote: I like it +64 Vote: I do not like it

I recently discovered a different, more general, but slightly slower way to do prime counting. It relies on an analog of the Euler transform applied to Dirichlet series. The code runs in $$$O(N^{2/3})$$$ time without any log factors, which is fast but not optimal for prime counting (prime counting can be made even faster because primes only have density $$$1/(\ln N)$$$). I haven't seen it described anywhere, so here it goes.

Basics

First, let's go over some preliminaries. A Dirichlet series is the analog of a generating function for arithmetic functions. For an arithmetic function $$$a$$$, we define the Dirichlet series as the formal series

$$$ A(s) = \displaystyle\sum_{i=1}^{\infty} \frac{a(i)}{i^s} $$$

This definition is quite nice: if $$$h = f * g$$$ is the Dirichlet convolution of $$$f$$$ and $$$g$$$, then $$$H(s) = F(s) G(s)$$$. From now on, we will use the Dirichlet series and the function interchangeably.

Now, one common thing we will often want to do is evaluate the prefix sum of some arithmetic function, i.e. $$$\sum_{i=1}^{N} a(i)$$$ for some fixed $$$N$$$. Consider the set of values $$$\lfloor N / i \rfloor$$$ for each $$$i$$$. There are only $$$\approx 2 \sqrt{N}$$$ distinct values of this form, i.e. $$$i$$$ for each $$$1 \le i \le \sqrt{N}$$$, and $$$\lfloor N/i \rfloor$$$ for each $$$1 \le i \le \sqrt{N}$$$.

It turns out that, for many operations on Dirichlet series including addition, multiplication (Dirichlet convolution), reciprocal (inverse relative to Dirichlet convolution), etc., computing the prefix sums of the output up to each of $$$\lfloor N/i \rfloor$$$ only requires knowing the same prefix sums of the inputs. Thus, we will typically store exactly these $$$2 \sqrt{N}$$$ prefix sums for each intermediate Dirichlet series. For the rest of this post, "computing" a Dirichlet series will refer to computing these $$$2 \sqrt{N}$$$ prefix sums.

Using this representation, there are algorithms to multiply and divide Dirichlet series in $$$O(n^{2/3})$$$ time. I won't go into them here, but they just rely on the fact that there are not many distinct values of $$$\lfloor a / i \rfloor$$$ for any $$$a$$$. This will be an important primitive.

A Pseudo-Euler Transform

Now, we define a modified version of the Euler transform over arithmetic functions: given an arithmetic function $$$a$$$, the pseudo-Euler transform of $$$a$$$ is the arithmetic function with Dirichlet series given by

$$$ B(s) = \displaystyle \prod_{i=2}^\infty \frac{1}{1-a(i) i^{-s}} = \prod_{i=1}^\infty (1 + a(i)/i^s + a(i)^2/(i^2)^s + a(i)^3/(i^3)^s \cdots ) $$$

(Note that the traditional Euler transform would use $$$(1-i^{-s})^{a(i)}$$$ instead of $$$1-a(i)i^{-s}$$$; these are equivalent if $$$a(i) \in {0,1}$$$ and the latter is better defined when $$$a(i)$$$ is an element of an arbitrary ring, not necessarily integer.) We'll require that $$$a(1) = 0$$$ and $$$b(1) = 1$$$ to make everything well-defined when evaluating.

It's not too hard to verify that the pseudo-Euler transform of the prime indicator function $$$I_{\text{prime}}(s) = \sum_{p \text{ prime}} 1/p^s$$$ is exactly the one-function $$$\mathbf{1}(s) = \sum_{i=1}^\infty 1/i^s$$$ (this is also the Riemann zeta function). (There are many other nice relations of this form as well.) Thus, to evaluate prime counts, we just need to be able to evaluate the inverse pseudo-Euler transform quickly. Fortunately generating functions tricks can come to save the day.

Interlude: $$$\exp$$$ and $$$\log$$$

One crazy generating function trick is that $$$\exp$$$ and $$$\log$$$ are well defined for formal power series, and are still inverses. (For computing them over normal generating functions, see this.) Let's consider the formal-power-series definitions of $$$\log$$$ and $$$\exp$$$:

$$$ \log(1/(1-x)) = \sum_{i=1}^\infty x^i / i \iff \log(1+x) = \sum_{i=1}^\infty (-1)^i x^i/i $$$
$$$ \exp(x) = \sum_{i=0}^\infty x^i / i! $$$

We can actually verify that, under these definitions, if $$$x$$$ is a Dirichlet series with no constant term ($$$x(1) = 0$$$), $$$\log$$$ and $$$\exp$$$ are well defined and are still inverses. Furthermore, we can pretty much just plug a Dirichlet series into $$$x^0 / 0! + x^1 / 1! + \ldots$$$ to evaluate it! This also means that to compute the prefix sums of $$$\log$$$ or $$$\exp$$$ of a Dirichlet series, we only need the prefix sums of the inputs, just like for multiplication. If we care about prefix sums up to $$$N$$$, we only need the first $$$\log_2(N)$$$ terms, so we can do this in general in $$$O(N^{2/3} \log(N))$$$ time. We'll actually do better than this for our pseudo-Euler transform with some clever handling of small terms: if $$$a(i) = 0$$$ for $$$i < l$$$, we only need $$$\log(N)/\log(l)$$$ terms to compute $$$\exp(A)$$$ or $$$\log(A)$$$ up to $$$N$$$, so the runtime is $$$O(N^{2/3}\log(N)/\log(l))$$$.

Evaluating the pseudo-Euler transform

Then, if $$$B$$$ is the pseudo-Euler transform of $$$A$$$,

$$$ \log(B(s)) = \displaystyle\sum_{i=1}^\infty \log(1/(1-a(i)i^{-s})) = \sum_{k=1}^\infty \sum_{i=1}^\infty (a(i)i^{-s})^k/k = A(s) + \sum_{k=2}^\infty \sum_{i=1}^\infty (a(i)^k/k) (i^k)^{-s} $$$

Note that, if we only care about prefix sums up to $$$N$$$, the $$$k \ge 2$$$ terms contribute only $$$O(\sqrt{N})$$$ terms, each of which are $$$a(i)^k$$$ for some $$$i \le \sqrt{N}$$$. Thus, to compute the pseudo-Euler transform of $$$A$$$, we can first construct the prefix sums of the right hand side in $$$O(\sqrt{N})$$$, and then take $$$\exp$$$; we can take the inverse pseudo-Euler transform by taking $$$\log$$$ and then subtracting off the $$$a(i)^k / k$$$ terms for $$$k \ge 2$$$.

However, as we saw above, this takes $$$O(N^{2/3} \log_2(N))$$$ time because $$$a(2)$$$ could be nonzero. We can speed it up by using a trick specific to the pseudo-Euler transform. Let $$$\tilde{A}(s) = \displaystyle\sum_{i=N^{1/6}+1}^\infty a(i) / i^s$$$. Then, taking the pseudo-Euler transform of $$$\tilde{A}$$$ takes only $$$O(N^{2/3} \log(N) / \log(N^{1/6}+1)) = O(N^{2/3})$$$ time. Furthermore, if the result is $$$\tilde{B}(s)$$$, then

$$$ B(s) = \tilde{B}(s) \displaystyle\prod_{i=1}^{N^{1/6}} \frac{1}{1-a(i)i^{-s}} $$$

Then, dividing a Dirichlet series by $$$1-a(i)i^{-s}$$$ takes only $$$O(N^{1/2})$$$ time for each $$$i$$$ (you can check this easily), so we can compute $$$B(s)$$$ in $$$O(N^{2/3} + N^{1/2} \cdot N^{1/6}) = O(N^{2/3})$$$ time. The inverse process is symmetric; we first compute $$$\tilde{B}$$$ (go in increasing order of $$$i$$$ to get the correct values), and then take $$$\log$$$. Thus, we can compute the pseudo-Euler transform and the inverse pseudo-Euler transform in $$$O(N^{2/3})$$$ time each!

Final remarks

This pseudo-Euler transform seems quite powerful. For any totally multiplicative function $$$f$$$ such that $$$\sum_{i=1}^N f(i)$$$ is easy to compute, we can compute $$$\sum_{p \le N, p \text { prime}} f(p)$$$; this includes $$$f(i) = i^k$$$ for any $$$k$$$, or even things like

$$$ f(i) = \begin{cases} +1 & i \equiv 1 \pmod{4} \\ -1 & i \equiv 3 \pmod{4} \\ 0 & \text{else}\end{cases} $$$

This also gives a somewhat generic mechanism to evaluate $$$\sum_{i=1}^N f(i)$$$ for any multiplicative function knowing only the closed form at prime powers ($$$f(p^k)$$$): the closed form can easily be transformed into a closed form of the inverse pseudo-Euler transform of the target, we can evaluate that closed form, and then use the pseudo-Euler transform code to find the actual prefix sums.

One interesting property of using $$$\exp$$$ and $$$\log$$$ is that you need to be able to use fractions in intermediates for the $$$1/k!$$$. Because of our $$$N^{1/6}$$$ trick, we only need $$$5$$$ terms of $$$\exp$$$ and $$$\log$$$, so all denominators are factors of $$$120$$$, so they're at most a constant size. Then, if you want to compute prefix sums mod $$$m$$$, you may need to actually store intermediates mod $$$120m$$$. I don't know of a way to always do this kind of transform for arbitrary rings; I asked around and no one had a general answer.

I've implemented all of this here. I haven't done much benchmarking for actual speed; I think it's a little slow. There are algorithms for prime counting in $$$O(n^{2/3}/\log^2(x))$$$, but they tend to rely on structure specific to prime-counting: primes are sparse, so many intermediate values are $$$0$$$ and can be skipped.

  • »
    »
    8 days ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Thank you for the comment! Probably it's even worth creating a separate post, but up to you.

    I've seen prime counting algorithms with $$$N^{1/6}$$$ involved, but they either didn't have a description at all, or had a japanese one, so I was too lazy to dig into them. Now mystery solved, I guess

  • »
    »
    8 days ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Cool!

    I tried testing it, unfortunately it's slow, as you said. Just with rationals it works in around $$$20$$$ seconds for $$$n=10^{10}$$$ and with some $$$120$$$-related optimizations it takes $$$3.6$$$ seconds (ideone doesn't compile it, so here is a codeforces submission)

    • »
      »
      »
      8 days ago, # ^ |
        Vote: I like it -28 Vote: I do not like it

      There is something beautiful in your code, template !!, I have seen many top coders from my university doing this, personally I never found this in any editorial code but somehow I find it sexy to have in code. Can you tell me what are its uses and how can I learn it.

»
7 days ago, # |
  Vote: I like it +12 Vote: I do not like it

Great blog! I have been long looking for someone detailing this more rigorously. Seems this thread provided everything that I was looking for.

To add more practice problem, you may try to solve Project Euler 245 if you want it to run under 1 second.

And as another bonus (I think it is worth mentioning), the totient summatory function can also be computed using the same ideas.

  • »
    »
    7 days ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    For totient summatory function there is a bit easier solution, which is described in Project Euler 351 and relies on the fact that $$$\sum_{m=1}^N \varphi(m) = \sum_{k=1}^N M\left(\left\lfloor \frac{N}{k}\right\rfloor\right) k$$$, where $$$M(n)$$$ is Mertens function, which can be computed using another fact which allows us to calculate $$$M(n)$$$ in $$$O(\sqrt n)$$$ using values of $$$M\left(\left\lfloor \frac{n}{k}\right\rfloor\right)$$$. And if you calculate $$$M(k)$$$ for $$$k \leqslant N^{2/3}$$$ with linear sieve, just applying this recursive formula for other $$$M(n)$$$ will result in a complexity $$$O\left(n^{2/3}\right)$$$, because \begin{gather*} \sum_{k=1}^{n^{1/3}} \sqrt{\frac{n}{k}} \approx \int_{x=1}^{n^{1/3}} \sqrt{\frac{n}{x}} dx = 2\sqrt{nx} \Big|_{1}^{n^{1/3}} = O\left(n^{2/3}\right) \end{gather*}

»
7 days ago, # |
  Vote: I like it +13 Vote: I do not like it

Thanks for making this blog, especially since now I don't need to make another blog for this question — on Yosupo Library Checker, there is this submission that runs in a blazing fast 23ms. It does not seem to be the same Meissel-Lehmer approach described in this blog or elsewhere. Can anyone explain it?