TL;DR
Use the following template (C++20) for efficient and near-optimal binary search (in terms of number of queries) on floating point numbers.
It is also possible to extend this to breaking early when a custom closeness predicate is true (for example, min(absolute error, relative error) < 1e-9 and so on), but for the sake of simplicity, this template does not do so.
Introduction
It is well known that writing the condition for continuation of the binary search as anything like while (r - l > 1)
or while (l < r)
is bug-prone on floating point integers, so people tend to write a binary search by fixing the number of iterations. However, this is usually not the best method — you need $$$O(\log L)$$$ iterations where $$$L$$$ is the length of the range, and the range of floating point numbers is very large.
So, I came up with a way to binary search on (IEEE754, i.e., practically all implementations of) floating point numbers that takes bit_width(floating_type) + O(1)
calls to the predicate for binary search. I am pretty sure that this method has been explored before (feel free to drop references in the comments if you find them). Regardless of whether this is a novel algorithm or not, I wanted to share this since it teaches you a lot about floating points, and it also has the following advantages:
- No need to hard-code the number of iterations.
- It avoids issues that you get in other implementations (for example, using sqrt, you end up with a ton of cases with 0, negatives and so on).
- It is efficient (sqrt is expensive, and so is dealing with a lot of cases).
Arriving at the algorithm
I started out by thinking about this: floating point numbers have a very large range. Can we do better than $$$O(\log L)$$$ where $$$L$$$ is the length of the range? Recall that floating point numbers also have a fixed width that is much smaller than the log of the range they represent, and between two consecutive representable floating point numbers, it is their ratio that is nicely bounded (ignoring the boundary of 0 and inf and nans) instead of their difference. Now since it is well-known that for getting to a certain relative-error, it is optimal to use $$$\sqrt{lr}$$$ instead of $$$\frac{l + r}{2}$$$ in binary search.
So the first idea that comes to mind is to use sqrt instead of midpoint in the usual binary search. However, it has a lot of issues — for example, if $$$l = 0$$$, then you never progress in the binary search (this is fixable by doing a midpoint search on 0 to 1 and sqrt search on 1 to $$$r$$$). One more issue is how to deal with negatives (this is again doable by splitting the input range into multiple ranges where sqrt works) and with overflows/underflows. And the main issue with this approach is that sqrt is expensive — if you are doing a problem where the predicate is pretty fast and you need to do a lot of binary searches, most of your computation time would be due to sqrt.
Inspired by these, we decide to try to approximate sqrt in a way that preserves monotonicity. Here comes the main point — note that the IEEE754 implementation of floating point numbers separates the mantissa from the exponent, and the exponent part of $$$\sqrt{lr}$$$ is roughly the mean of that of $$$l$$$ and $$$r$$$. Since these are the top few bits (excluding the sign bit), we can just do the following (for positive floating point numbers at least): read the floating point representation as an integer (this can be done in a type-safe manner by using std::bit_cast
), take the midpoint of these integers, and convert it back to a floating point number. This clearly preserves monotonicity — this can be verified easily by hand. Note that this same thing works for when both range endpoints are negative numbers too — for this we simply invert the sign bit. The case where both are of opposite signs is also simple (if you disregard the fact that +0 and -0 are equal but have distinct representations and reciprocals) — check the predicate on $$$0$$$ (both the zeroes if it matters for your predicate).
Now if we want to do some more handling (infinities, NaNs, denormals), we can do it as we wish. In the implementation in the TL;DR above, we decide to replace NaNs with the infinity in the correct direction, and since there can be many infinities, we try to bring down the range endpoint to the largest representable floating point instead.
Analysis
In all, there are at most $$$w + O(1)$$$ calls to the predicate, where $$$w$$$ is the length of the complement of the longest common prefix of the floating point representations of endpoints.
This implementation is also robust to predicates which are noisy near the boundary (i.e., near the boundary, there is a small range where the true values of the predicate can come after the false values) — in this case the algorithm returns something in this range near the boundary.
Note that we did not need to hardcode the number of iterations, nor did we require some carefully-written predicate for loop termination.
Also, since std::bit_cast
is practically free compared to std::sqrt
, it is much faster in cases where you need to do a lot of binary searches.
As a usage example, this submission uses the usual binary search with fixed number of iterations, and this submission uses the template above.
Some problems
The following are some problems that use binary search on floating points, and should be solvable using this template. If you encounter any bugs in the template while solving these problems, do let me know in the comments below. Thanks to jeroenodb and PurpleCrayon for problem suggestions.