This blog is not about the binary GCD algorithm, but rather about exploring theoretical guarantees for an optimization on top of the standard Euclidean algorithm. The STL version is faster than both these algorithms, because it uses the binary GCD algorithm, but these algorithms are of interest from theoretical considerations.

In particular, my conjecture is that the second algorithm takes at most as many iterations as the first one, and if true, it'd be a pretty surprising claim, given how hard it is to bound Euclidean algorithm runtimes. So, it would be really cool if someone could prove this property.

UPD: thanks VArtem for affirming that this is the case and pointing to some relevant references.

The usual gcd algorithm looks something like this:

```
auto gcd = []<std::integral T>(T a, T b) -> T {
while (b) {
a = std::exchange(b, a % b);
}
return std::abs(a);
};
```

Consider the following implementation, that makes the second argument greedily smaller.

```
auto gcd2 = []<std::integral T>(T a, T b) -> T {
a = std::abs(a);
b = std::abs(b);
while (b) {
T r = a % b;
if (r > (b >> 1)) r = b - r;
a = std::exchange(b, r); // a = std::exchange(b, std::min(a % b, b - a % b); also works fine
}
return a;
};
```

On random data, `gcd2`

seems to be around 30% faster than `gcd`

on my machine, for `int32_t`

and `int64_t`

arguments.

And it doesn't seem immediately obvious, but on all inputs I brute-forced, it seems `gcd2`

always takes an equal or smaller number of iterations compared to `gcd`

. A proof is outlined in the comments.

Here's the benchmarking code I used:

**Benchmarking code**

And the timing results I get are:

```
190237720
gcd_32: 14054 ms
190237720
gcd2_32: 11064 ms
13318666
gcd_64: 47618 ms
13318666
gcd2_64: 36981 ms
```

When I compare this new algorithm with `std::gcd`

on similar data, there seems to be barely any slowdown with 32 bit ints, but the % operator makes it much slower on 64 bit ints.

Do let me know what you think about it in the comments, and a proof/counterexample would be appreciated!

I've come across this a few years ago (and proposed this idea to a local team competition). The keywords here are "least absolute remainder Euclidean algorithm" and this strategy is proven to be optimal among Euclidean-type algorithms ([1] mentions it, but I couldn't find Kronecker's original proof).

The worst-case inputs for this algorithm are Pell numbers, given by a recurrence $$$A_n = 2A_{n-1} + A_{n-2}$$$. They are asymptotically equal to $$$(1+\sqrt{2})^n \approxeq 2.4142^n$$$, which means that the number of iterations is about $$$\log_{2.4142} C$$$ (also briefly mentioned in [2]). The proof should be similar to the normal Fibonacci bound; I found one on ProofWiki, which looks okay to me at first glance.

[1]: A. W. Goodman & W. M. Zaring (1952) Euclid's Algorithm and the Least-Remainder Algorithm

[2]: Jeffrey Shallit (1990), On the worst case of three algorithms for computing the Jacobi symbol

Wow, thanks for the references! Didn't expect it to be optimal, or even just better than the standard algorithm in all cases. I'll try to find Kronecker's proof on my own as well, and will update here if I succeed.

So I dug up a couple of references — and Kronecker's original proof is presented in [3].

The summary of the proof is as follows:

[3]: Uspensky & Heaslet, Elementary Number Theory (chapter 3, section 2-4)