O(1) runtime prime checking

Hello, fellow people of Codeforces!

I hope everyone is doing well in these uncertain and unprecedented times. I wanted to share a cool C++20 trick that I learned recently.

Check out this implementation of is_prime(n), which tests whether an integer n is prime in O(1) at run time (without any run-time precomputation!)

bool is_prime(int n);

#include <iostream>
#include <cstdlib>
#include <ctime>

const int MAXN = 100000;

template<int N>
struct Sieve {
    bool is_prime[N];
    constexpr Sieve() : is_prime() {
        for (int i = 2; i < N; i++) {
            is_prime[i] = true;
        }
        for (int i = 2; i < N; i++) if (is_prime[i]) {
            for (int j = 2 * i; j < N; j += i) {
                is_prime[j] = false;
            }
        }
    }
};

template<Sieve<MAXN> s>
struct SieveWrapper {
    static bool get(int n) {
        return s.is_prime[n];
    }
};

bool is_prime(int n) {
    return SieveWrapper<Sieve<MAXN>{}>::get(n);
}

/*
bool slow_is_prime(int n) {
    return (Sieve<MAXN>{}).is_prime[n];
}
*/
    
int main() {
    auto t1 = time(NULL);
    
    // compute answer for 1000 random integers
    int ans = 0;
    for (int i = 0; i < 1000; i++) {
        ans += is_prime(rand() % MAXN);
    }

    auto t = time(NULL) - t1;

    std::cout << "Answer: " << ans << ". Elapsed time: " << t << "s" << std::endl;
}

This code computes the primes below MAXN during compilation and stores them in a table inside the compiled binary. At run time, a call to is_prime(n) is just a lookup in that table. Check the compiled assembly to see the sieve table baked into the binary: https://godbolt.org/z/G6o84x.

The trick works by passing the constructed Sieve as a template argument, which is always evaluated at compile time. If you compile this code with an older C++ standard, you will get the error:

error: a non-type template parameter cannot have type 'Sieve<MAXN>'

However, since C++20, a non-type template parameter can be of a literal class type that meets certain requirements (a structural type; see the note). In short, an instance of a class with a constexpr constructor can be computed at compile time and passed as a template argument.

Here is a time test of compile-time vs run-time sieve computation:

main.cpp

#include <iostream>
#include <vector>
#include <chrono>
#include <cstdlib>

using namespace std;

const int MAXN = 100000;

template<int N>
struct Sieve {
    bool is_prime[N];
    constexpr Sieve() : is_prime() {
        for (int i = 2; i < N; i++) {
            is_prime[i] = true;
        }
        for (int i = 2; i < N; i++) if (is_prime[i]) {
            for (int j = 2 * i; j < N; j += i) {
                is_prime[j] = false;
            }
        }
    }
};

template<Sieve<MAXN> s>
struct SieveWrapper {
    static bool get(int n) {
        return s.is_prime[n];
    }
};

bool fast_is_prime(int n) {
    return SieveWrapper<Sieve<MAXN>{}>::get(n);
}


bool slow_is_prime(int n) {
    return (Sieve<MAXN>{}).is_prime[n];
}


const int reps = 1000;
int main() {
    vector<int> numbers(reps);
    for (int i = 0; i < reps; i++) {
        numbers[i] = rand() % MAXN;
    }

    int ans = 0;
    auto t1 = chrono::high_resolution_clock::now();
    for (auto &n : numbers) {
        ans += slow_is_prime(n);
    }
    auto t2 = chrono::high_resolution_clock::now();
    cout << "Runtime ans: " << ans << ". Elapsed (ms): " << chrono::duration_cast<chrono::milliseconds>(t2 - t1).count() << endl;

    ans = 0;
    t1 = chrono::high_resolution_clock::now();
    for (auto &n : numbers) {
        ans += fast_is_prime(n);
    }
    t2 = chrono::high_resolution_clock::now();
    cout << "Compiletime ans: " << ans << ". Elapsed (ms): " << chrono::duration_cast<chrono::milliseconds>(t2 - t1).count() << endl;
}

Compiled with GCC 9.3.0 using the command g++ -std=c++2a main.cpp. This is the execution output on my laptop:

Runtime ans: 90. Elapsed (ms): 1008
Compiletime ans: 90. Elapsed (ms): 0

Of course, no one recomputes a sieve 1000 times. It is done here solely to show that the compile-time version does not compute the sieve at run time on each call.

While such tricks won't help you cheat past Codeforces problems, they might help your submission stand out with the fastest runtime. Hope you found this interesting!

a@a:~/Desktop/Cp$ ./a.out <- sample count 1e6
Runtime ans: 86783. Elapsed (ms): 29
Compiletime ans: 86783. Elapsed (ms): 13
a@a:~/Desktop/Cp$ g++ test.cpp <- sample count 1e7
a@a:~/Desktop/Cp$ ./a.out
Runtime ans: 877166. Elapsed (ms): 124
Compiletime ans: 877166. Elapsed (ms): 134

Comments (22)

xuanji

4 years ago, # |

Nice :) I wonder if you can do this in earlier versions of C++ by having just the is_prime array be compile-time computed (instead of the Sieve instance)

It might be worth trying space-saving tricks such as bit-packing and a wheel sieve to increase MAXN; the runtime penalty should be small. Bit-packing should increase MAXN by 8x, and the simplest wheel sieve (omitting all even numbers) should increase it by a further 2x.
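The two space-saving ideas above could be combined roughly as follows. This is only a sketch under my own assumptions: the names PackedSieve and kPacked are made up for illustration, the limit of 10000 is deliberately small so that default constexpr evaluation limits are not an issue, and the code sticks to C++14 constexpr rules.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical odd-only, bit-packed sieve for the numbers in [0, N).
// Bit k of the table stands for the odd number 2*k + 1; a set bit means "composite".
template <std::size_t N>
struct PackedSieve {
    std::uint64_t composite[N / 128 + 1];  // one bit per odd number below N

    constexpr PackedSieve() : composite() {
        composite[0] |= 1;  // bit 0 stands for 1, which is not prime
        for (std::size_t i = 3; i * i < N; i += 2) {
            // Skip i if it has already been marked composite.
            if ((composite[(i / 2) / 64] >> ((i / 2) % 64)) & 1) continue;
            // Mark the odd multiples of i, starting from i*i.
            for (std::size_t j = i * i; j < N; j += 2 * i) {
                composite[(j / 2) / 64] |= std::uint64_t{1} << ((j / 2) % 64);
            }
        }
    }

    // Precondition (assumed, not checked): n < N.
    constexpr bool is_prime(std::size_t n) const {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        return ((composite[(n / 2) / 64] >> ((n / 2) % 64)) & 1) == 0;
    }
};

// The table is built by the compiler because the variable is constexpr.
constexpr PackedSieve<10000> kPacked{};
static_assert(kPacked.is_prime(9973), "9973 is prime");
static_assert(!kPacked.is_prime(9999), "9999 is divisible by 3");
```

With these choices the table for 10000 numbers takes 79 64-bit words (632 bytes) instead of 10000 bytes, at the cost of a few shifts and masks per query.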

...._.._

+27

I think this can be done with C++14. Also you don't need a Wrapper class:

bool fast_is_prime(int n) {
    static constexpr Sieve<MAXN> s;
    return s.is_prime[n];
}

Runtime ans: 100. Elapsed (ms): 583
Compiletime ans: 100. Elapsed (ms): 0

See C++14 on ideone

tweety

4 years ago, # ^ |

+10

That's cool! I thought declaring something constexpr only hinted to the compiler that it could be computed during compilation, and that I had to pass a constexpr-constructed object as a template argument to force compile-time evaluation. But it seems that declaring a static constexpr variable works too.

albjeno

constexpr on a function declaration is a hint that (under certain conditions) the function may be evaluated at compile time and its result stored in the read-only part of the binary. However, if you declare a "variable" constexpr, the compiler is forced to compute the right-hand side of the initialization at compile time. You cannot change the value afterwards (constexpr implies const). If you want an actual variable, you may use constinit from C++20. If you always want compile-time evaluation of a function, you can declare it consteval with the latest standard. I'd say these new keywords are rarely of any use in competitive programming.
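A small C++14 sketch of that distinction, with made-up names (count_primes_below, kPrimesBelow100, count_at_runtime) that do not come from the post: the same constexpr function is forced through the compiler when it initializes a constexpr variable, and behaves like an ordinary function when called with a run-time argument.

```cpp
#include <cassert>

// A constexpr function: it may be evaluated at compile time or at run time.
constexpr int count_primes_below(int n) {
    int count = 0;
    for (int i = 2; i < n; ++i) {
        bool prime = true;
        for (int d = 2; d * d <= i; ++d) {
            if (i % d == 0) { prime = false; break; }
        }
        if (prime) ++count;
    }
    return count;
}

// A constexpr variable: its initializer must be a constant expression,
// so the compiler has to evaluate the call itself.
constexpr int kPrimesBelow100 = count_primes_below(100);
static_assert(kPrimesBelow100 == 25, "checked during compilation");

// An ordinary call with a run-time argument: no compile-time guarantee.
int count_at_runtime(int n) { return count_primes_below(n); }
```

The static_assert is a handy check: it only compiles if the value is available to the compiler.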

Qualified

Do you have more tricks up your sleeve? If so please make a series or something like that. I found this really helpful.

qmk

Why is it helpful?

Because it's interesting.

SecondThread

+63

I think he has a few more tricks up his sieve...

-6

Nice!

jianglyFans

Amazing, this gives a new way to spend compile time instead of execution time! It may be useful for particular problems, since CP only cares about execution time.

But: the ‘constexpr’ loop iteration count has a limit of 262144 (use ‘-fconstexpr-loop-limit=’ to increase the limit), and we can't pass the -fconstexpr-loop-limit= parameter in CP. Sadly, this may keep the idea from being practical for a prime sieve, even though ...._.._ provided a C++14 version of your idea (so C++17 works too).

abj

-27

https://www.cs.utexas.edu/users/misra/scannedPdf.dir/linearSieve.pdf the original paper

+14

Unless I’m misreading something, this post is about moving the computation into the compilation, not about how to actually write a prime sieve (which is what the paper is about).

sirearsh

code

#include <chrono>
#include <cstdlib>
#include <iostream>
#include <vector>
 
using namespace std;

const int MAXN = 262144;
 
template <int N>
struct Sieve {
    bool is_prime[N];
    constexpr Sieve() : is_prime() {
        for (int i = 2; i < N; i++) {
            is_prime[i] = true;
        }
        for (int i = 2; i < N; i++)
            if (is_prime[i]) {
                for (int j = 2 * i; j < N; j += i) {
                    is_prime[j] = false;
                }
            }
    }
};

struct sieve {
    bool ip[1000000];
    bool done[1000000] = {0};
    sieve () {
        ip[0] = ip[1] = false;
        ip[2] = true;
        done[0] = done[1] = true;
        for (int i = 2; i <= 1e6; i++) {
            if (done[i]) continue;
            ip[i] = true;
            done[i] = true;
            for (int j = i * 2; j <= 1e6; j += i) {
                ip[j] = false;
                done[j] = true;
            }
        }
    }
};
 
bool fast_is_prime(int n) {
    static constexpr Sieve<MAXN> s;
    return s.is_prime[n];
}
 
bool slow_is_prime(int n) {
    return (Sieve<MAXN>{}).is_prime[n];
}
 
const int reps = 10000000;
int main() {
    vector<int> numbers(reps);
    for (int i = 0; i < reps; i++) {
        numbers[i] = rand() % MAXN;
    }
 
    int ans = 0;
    auto t1 = chrono::high_resolution_clock::now();
    sieve st;
    for (auto &n : numbers) {
        ans += st.ip[n];
    }
    auto t2 = chrono::high_resolution_clock::now();
    cout << "Runtime ans: " << ans << ". Elapsed (ms): " << chrono::duration_cast<chrono::milliseconds>(t2 - t1).count() << endl;
 
    ans = 0;
    t1 = chrono::high_resolution_clock::now();
    for (auto &n : numbers) {
        ans += fast_is_prime(n);
    }
    t2 = chrono::high_resolution_clock::now();
    cout << "Compiletime ans: " << ans << ". Elapsed (ms): " << chrono::duration_cast<chrono::milliseconds>(t2 - t1).count() << endl;
}

In this code I added another simple sieve struct and compared the two.

I don't think this is a satisfactory change?

ffao

+26

Don't benchmark without optimizations turned on.

As you can see, I am new to all this and don't have much background in it, but I am trying to learn. Please explain briefly what you meant.

Jakube

Compilers have different levels of optimizing code. If you add some flag like g++ -O2 or g++ -O3, then the compiler will try a lot harder to optimize the resulting machine code. Without flags the compiler will only do a very rough translation. That's easier to understand, if you want to debug the program, but the resulting program is not (very) fast.

Compiler people have spent (and still spend) a lot of time thinking about how the compiler can optimize your code while still giving you the correct result. Notice that this will not turn an $$$O(n^2)$$$ algorithm into a completely different $$$O(n \log n)$$$ algorithm. It's more like a lot of micro-optimizations in the machine code. The rough translation has a lot of overhead. The compiler will try to remove that overhead, for instance by carefully arranging the registers to avoid unnecessary copies. It will also try to evaluate more at compile time, rearrange the code to make it more efficient to run, eliminate dead code, unroll loops, enable SIMD, ... This can result in speedups from 1.1x up to 10x. Notice that some programs can be optimized more, and some less.

Usually, the final program we run is the optimized version. Codeforces and all other OJs do the same. It doesn't make sense to benchmark the unoptimized version of a program when the version that matters is the optimized one.

Thank you. I did get the basic idea what is going on here.

neckborov

Try to swap sieve st and auto t1 = chrono::high_resolution_clock::now(). Make sure you know what you're really measuring.

Also you have UB in your sieve struct.
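To make the point about measurement concrete, here is one way to time construction and queries separately. It is only a sketch: build_sieve, elapsed_us, Timing and measure are names I made up for this example, not anything from the code above.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical helper: plain run-time sieve for the numbers in [0, limit).
std::vector<bool> build_sieve(int limit) {
    std::vector<bool> prime(limit, true);
    if (limit > 0) prime[0] = false;
    if (limit > 1) prime[1] = false;
    for (int i = 2; 1LL * i * i < limit; ++i)
        if (prime[i])
            for (int j = i * i; j < limit; j += i) prime[j] = false;
    return prime;
}

// Hypothetical helper: runs f once and returns the elapsed microseconds.
template <class F>
long long elapsed_us(F&& f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

struct Timing {
    long long build_us;  // time spent constructing the sieve
    long long query_us;  // time spent answering the queries
    int hits;            // how many queries were prime
};

// Times the two phases separately so neither hides inside the other.
// Precondition (assumed): every query q satisfies 0 <= q < limit.
Timing measure(int limit, const std::vector<int>& queries) {
    std::vector<bool> sieve;
    Timing t{0, 0, 0};
    t.build_us = elapsed_us([&] { sieve = build_sieve(limit); });
    t.query_us = elapsed_us([&] {
        for (int q : queries) t.hits += sieve[q];
    });
    return t;
}
```

Reporting build_us and query_us side by side shows whether the run-time cost comes from building the table or from the lookups, which is the comparison that matters against the compile-time version.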

NotGil

Hi! Can someone explain what is meant by "While such tricks won't help you cheat past Codeforces problems"? How come this trick wouldn't help with CF submissions? Will the time spent compiling be added to the runtime or something?

joaom

I think it's because MAXN can't be that big. If it's at most 300000, it usually doesn't make much difference to the execution time whether the sieve is computed during compilation or at run time.

TrungNT

It can help you pass some problems, provided that your compiler is good and your judge does not have a compile-time limit. For the sieve problem, however, the difference is only significant when MaxN >= 1e8, which will often crash the compiler or, if it doesn't, result in an executable so large that linking fails.

For some other problems, where MaxN = 1e4 and you have an O(n^2) solution with O(n) space complexity, this trick may help you a lot.

1. This trick works in C++14, and needs neither a wrapper class nor a wrapper function:

Code

#include <cassert>
#include <cstddef>

template<std::size_t N>
struct Sieve 
{
	private:
		bool is_prime[N];
	public:
		constexpr Sieve() : is_prime{0}
		{
			for (int i = 2; i < N; i++) 
			{
				is_prime[i] = true;
			}
			for (int i = 2; i < N; i++)
			{
				if (is_prime[i])
				{
					for (int j = 2 * i; j < N; j += i)
					{
						is_prime[j] = false;
					}
				}
			}
		}
		inline constexpr bool operator[](std::size_t index) const
		{
			return is_prime[index];
		}
};


int main()
{
	//Sieve
	constexpr int MaxN = 1e5;
	Sieve<MaxN> isPrime;

	assert(isPrime[5]);
	//More code here...
}

You can check the assembly here. Starting from line 106, many mov statements copy bytes to the addresses that should hold prime numbers. Apparently, the constexpr constructor has caused all instances of Sieve to be built at compile time, and the overload of operator[] allows simple array syntax on the Sieve.

2. This trick may be useful for MaxN <= 1e5, but not higher.

Although compile-time evaluation may theoretically make it possible to compute the Sieve for up to MaxN = 1e9, constexpr evaluation does have a large cost during compile time and also on the binary size. Therefore, setting MaxN to 1e7 or greater may cause the compiler to either crash or report an error, even if -fconstexpr-loop-limit= is set to a large number, while runtime building of a sieve of 1e7 elements is always possible.
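For point 1, one way to confirm that the table really is built by the compiler, rather than relying on the optimizer, is to declare the object constexpr and test it with static_assert. The sketch below uses made-up names (CheckedSieve, kSmallSieve) and a small limit of 1000, which stays far below the default constexpr limits discussed in point 2.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical variant of the Sieve above, restricted to C++14 constexpr.
template <std::size_t N>
struct CheckedSieve {
    bool prime[N];

    constexpr CheckedSieve() : prime() {
        for (std::size_t i = 2; i < N; ++i) prime[i] = true;
        for (std::size_t i = 2; i * i < N; ++i)
            if (prime[i])
                for (std::size_t j = i * i; j < N; j += i) prime[j] = false;
    }

    // Allows array syntax on the sieve, including inside constant expressions.
    constexpr bool operator[](std::size_t n) const { return prime[n]; }
};

// Because the object is constexpr, its table must be computed during compilation.
constexpr CheckedSieve<1000> kSmallSieve{};
static_assert(kSmallSieve[997], "997 is prime");
static_assert(!kSmallSieve[999], "999 = 27 * 37");
```

If the limit is raised until the compiler's constexpr limits are hit, the static_assert lines turn that into a compile error instead of a silent fallback to run-time construction.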
