Tighter time complexity for GCD

(prologue) Some time ago this crossed my mind, but I only recalled it now, and I think it could be worth a small blog post. This is nothing big and rarely useful, but I found it interesting, so hopefully you will too (don't expect to find this enriching).

It is widely known that the time complexity of computing the GCD (greatest common divisor) of two integers a, b with the Euclidean algorithm is $O(\log \max(a, b))$.
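Here is a minimal C++ sketch of the Euclidean algorithm with a step counter. The name `euclid_steps` is my own choice for illustration and does not come from the post. Consecutive Fibonacci numbers are the classical worst case (Lamé's theorem). The pair (F_{k+2}, F_{k+1}) takes exactly k steps, while the values grow exponentially in k. That is the logarithmic behaviour the bound describes.

```cpp
#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;

// Euclidean algorithm: replace (a, b) by (b, a % b) until b == 0; a is then gcd(a, b).
// Returns how many modulo steps were performed. Assumes a, b >= 1.
int euclid_steps(u64 a, u64 b) {
    assert(a >= 1 && b >= 1);
    int steps = 0;
    while (b != 0) {
        u64 r = a % b;
        a = b;
        b = r;
        ++steps;
    }
    return steps;
}
```

If a < b, the first step only swaps the two values. This is why `euclid_steps(13, 21)` takes one step more than `euclid_steps(21, 13)`.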

Short proof
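The proof itself is collapsed in this copy of the post. For reference, here is one standard version of the argument, which may not match the author's exact wording. For $a \ge b \ge 1$, one step maps $(a, b) \mapsto (b, a \bmod b)$, and

$$
a \bmod b < \frac{a}{2}, \quad\text{since}\quad
\begin{cases}
b \le \frac{a}{2} \implies a \bmod b < b \le \frac{a}{2},\\[2pt]
b > \frac{a}{2} \implies a \bmod b = a - b < \frac{a}{2}.
\end{cases}
$$

Two steps take $(a, b)$ to $(a \bmod b,\ b \bmod (a \bmod b))$. So unless the algorithm has already stopped, the larger element of the pair falls below half its value every two steps. This can happen only $O(\log \max(a, b))$ times.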

This bound is nice and all, but we can give a slightly tighter bound for the algorithm:

$O\left(\log \frac{\min(a, b)}{\gcd(a, b)}\right)$

We show this bound by adding a few sentences to the above proof. Once the smaller element becomes 0, the larger element is the resulting gcd. So the larger element only has to shrink from max(a, b) down to gcd(a, b), halving every two steps. Therefore we can first bound the number of steps by $O\left(\log \frac{\max(a, b)}{\gcd(a, b)}\right)$. Lastly, notice that we can change the maximum to the minimum, since after one step of the algorithm the current maximum is the previous minimum: min(a, b) = max(b, a % b) when a ≥ b.
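As a sanity check, here is a small C++ brute force of the tighter bound with explicit constants. The constants are my own bookkeeping from the halving argument, not something stated in the post. With $a \ge b$, the count is at most $2\lfloor \log_2(\min/g) \rfloor + 2$. One extra swap step is needed when $a < b$, hence the $+3$ below. All function names are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;

// Number of modulo steps the Euclidean algorithm performs on (a, b).
int euclid_steps(u64 a, u64 b) {
    int steps = 0;
    while (b != 0) {
        u64 r = a % b;
        a = b;
        b = r;
        ++steps;
    }
    return steps;
}

// Plain Euclidean gcd.
u64 euclid_gcd(u64 a, u64 b) {
    while (b != 0) {
        u64 r = a % b;
        a = b;
        b = r;
    }
    return a;
}

// floor(log2(x)) for x >= 1.
int floor_log2(u64 x) {
    assert(x >= 1);
    int r = -1;
    while (x != 0) {
        x >>= 1;
        ++r;
    }
    return r;
}

// Explicit-constant form of O(log(min(a, b) / gcd(a, b))) for a, b >= 1.
bool tighter_bound_holds(u64 a, u64 b) {
    assert(a >= 1 && b >= 1);
    u64 g = euclid_gcd(a, b);
    u64 m = a < b ? a : b;
    return euclid_steps(a, b) <= 2 * floor_log2(m / g) + 3;
}
```

For example, (2^60, 3·2^59) needs only 3 steps even though both values are around 2^60, because their gcd is 2^59.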

The improvement is of course negligible... for a single gcd computation. It turns out to be somewhat useful when gcd is computed many times in a row. I'll explain with an example "problem":

(1) Given an array A of n integers in range [1, M], compute their greatest common divisor.

The solution is, of course, to start with the initial answer G = A₀ and iterate over all the remaining elements, assigning $G \leftarrow \gcd(G, A_i)$. The known time complexity analysis gives us a bound of $O(n \log M)$, for computing gcd n times over values of order M. The tighter analysis actually gives an asymptotically better bound: $O(n + \log M)$ (for practical values of M, you can think of this as $O(n)$). Why is that so? Again, we can determine the time complexity more carefully:

The loop over the array, together with the constant minimum cost of each gcd call, contributes the n term. The rest comes from the gcd computations, which we analyze now. Let G_i be the value of G after the i-th iteration, that is:

$G_0 = A_0$
$G_i = \gcd(G_{i-1}, A_i)$ for $1 \le i \le n - 1$

On the i-th iteration, the gcd computation starts with the two values $G_{i-1}, A_i$ and results in $G_i$. Its time complexity is therefore $O\left(\log \frac{\min(G_{i-1}, A_i)}{G_i}\right)$, which is at worst $O\left(\log \frac{G_{i-1}}{G_i}\right)$, so we will assume the latter.

The total number of gcd steps (up to a constant factor) is:

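$\sum_{i=1}^{n-1} \log \frac{G_{i-1}}{G_i} = \log \frac{G_0}{G_{n-1}} \le \log M$

The sum telescopes, and $G_0 \le M$, $G_{n-1} \ge 1$. Together with the n term from the loop, the total is $O(n + \log M)$.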
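Here is a minimal C++ sketch of problem (1), instrumented to count Euclid steps. The names `gcd_counting` and `array_gcd` are hypothetical. The explicit bound $3(n-1) + 2\log_2 M$ is my own bookkeeping and not from the post. It assumes each call costs at most $2\lfloor \log_2(\min(G_{i-1}, A_i)/G_i) \rfloor + 3$ steps, and those log terms telescope exactly as in the sum.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using u64 = std::uint64_t;

// Euclidean gcd that adds the number of modulo steps it performs to `steps`.
u64 gcd_counting(u64 a, u64 b, long long& steps) {
    while (b != 0) {
        u64 r = a % b;
        a = b;
        b = r;
        ++steps;
    }
    return a;
}

// Problem (1): G = A[0], then G = gcd(G, A[i]) for the remaining elements.
// `total_steps` receives the modulo-step count summed over all n - 1 gcd calls.
u64 array_gcd(const std::vector<u64>& A, long long& total_steps) {
    assert(!A.empty());
    total_steps = 0;
    u64 G = A[0];
    for (std::size_t i = 1; i < A.size(); ++i) {
        G = gcd_counting(G, A[i], total_steps);
    }
    return G;
}
```

Once G has dropped to 1, each further call costs at most two steps. This is why long arrays stay near the linear term rather than n·log M.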

And generally speaking, this analysis sometimes allows us to show that the solution is faster by a factor of $\log M$.

As a last note, we can use (1) to show more such improvements. For example, (2): consider gcd queries on ranges, with point updates. Of course, we solve this with a segment tree. The known analysis gives us a bound of $O(\log n \log M)$ per query or update. We can use (1) to get $O(\log n + \log M)$. An update consists of a starting value, to which, repeatedly for $O(\log n)$ steps, we assign its gcd with some other value. Following (1), this takes the desired complexity. The same analysis applies to queries.
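Here is a minimal bottom-up segment tree sketch for (2). The layout and names are my own choice, not the post's. The gcd is written out as plain Euclid so that the analysis of (1) applies, since the standard does not specify which algorithm `std::gcd` uses internally. The bottom-up layout works for any n because gcd is commutative and associative, and 0 serves as the identity since gcd(0, x) = x.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using u64 = std::uint64_t;

// Plain Euclidean gcd, so the step-count analysis of (1) applies.
u64 euclid_gcd(u64 a, u64 b) {
    while (b != 0) {
        u64 r = a % b;
        a = b;
        b = r;
    }
    return a;
}

struct GcdSegTree {
    int n;
    std::vector<u64> t;  // leaves at t[n .. 2n-1], internal nodes at t[1 .. n-1]

    explicit GcdSegTree(const std::vector<u64>& a)
        : n(static_cast<int>(a.size())), t(2 * a.size(), u64{0}) {
        for (int i = 0; i < n; ++i) t[n + i] = a[i];
        for (int i = n - 1; i >= 1; --i) t[i] = euclid_gcd(t[2 * i], t[2 * i + 1]);
    }

    // Point update: the value climbing toward the root is repeatedly replaced
    // by its gcd with a sibling, i.e. a running gcd as in (1).
    void update(int pos, u64 value) {
        assert(0 <= pos && pos < n);
        int p = pos + n;
        t[p] = value;
        for (p >>= 1; p >= 1; p >>= 1) t[p] = euclid_gcd(t[2 * p], t[2 * p + 1]);
    }

    // gcd of a[l..r) (half-open). Each accumulator is a running gcd as in (1).
    u64 query(int l, int r) const {
        assert(0 <= l && l <= r && r <= n);
        u64 left = 0, right = 0;
        for (l += n, r += n; l < r; l >>= 1, r >>= 1) {
            if (l & 1) left = euclid_gcd(left, t[l++]);
            if (r & 1) right = euclid_gcd(t[--r], right);
        }
        return euclid_gcd(left, right);
    }
};
```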

If the constraints were n, q ≤ 5·10⁵ and A_i ≤ 10¹⁸, this shows that instead of doing around 6·10⁸ operations, we do around 4·10⁷ operations (if we follow the big O notation and ignore the constant). That is close to the well-known bitset optimization (factor of 1/32).
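Here is my own worked arithmetic for those two estimates. It uses base-2 logarithms and ignores constants, as the post does, with $\log_2(5\cdot 10^5) \approx 19$ and $\log_2(10^{18}) \approx 60$:

$$
\begin{aligned}
q \cdot \log_2 n \cdot \log_2 M &\approx 5\cdot 10^5 \cdot 19 \cdot 60 \approx 5.7\cdot 10^8,\\
q \cdot (\log_2 n + \log_2 M) &\approx 5\cdot 10^5 \cdot (19 + 60) \approx 4.0\cdot 10^7,
\end{aligned}
$$

The ratio is roughly 14.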

Thanks for reading :). A question to the reader: what other tasks can make use of this, as (2) does?

Comments (3)

filippos

That's very cool Noam, thanks for sharing :)

misof

Nicely written!

Note that the bound you gave also follows (in a much easier way, in my opinion) from the observation that the computation of gcd(a*d,b*d) has the same number of steps as the computation of gcd(a,b).

I_love_Tanya_Romanova

In general what you wrote sounds nice and interesting, but I got a feeling that you are making things harder than they should be :)

Like misof already said, there is an easier way to see why the bound is true. I also don't think that the problem that you picked is the best way to make use of it (if we can make any use of it at all?..) In order to prove complexity there, to me it would be much more natural to observe that:

Our value of GCD can only decrease as we perform operations, and it can't increase.
For every element that you take, you are going to have at most one operation not affecting your GCD: when, at the last step of the algorithm, you have a remainder equal to 0. You may also have one operation at the very beginning, when the remainder is "relatively close" to the current value of GCD. Any other operation with a non-zero remainder will decrease GCD at least by half, because GCD will be the current x from your proof. (Upd. OK, my bad, I'm not careful enough here with what I'm taking as an x for which step, but at least you can easily show a bound like "O(1) bad operations, at most 2 good operations to decrease GCD by half").

If we call operations which don't change GCD at all or don't change it much "bad", and operations which decrease GCD by half or more "good", you can say that there will be O(N) "bad" operations (O(1) for every element) and there can't be more than O(log(M)) "good" operations (because that's how many times you can divide something by 2, and you need O(1) operations for one division).

I would also use same approach for your other examples. I agree that it is quite similar to what you wrote, only using words instead of playing around with numbers to get amortized analysis (for me numbers and formulas are scary :) ), but I don't see how either of these two connects to x * gcd(A, B) = gcd(A * x, B * x) that you wrote about at the beginning.

Noam527's blog