Algorithms: Binary Search

Binary search is typically one of the first algorithms taught to CS students. It is very powerful, prevalent, and yet one of the easier-to-understand algorithms. With binary search, even if you are searching through 1 million items, as long as they are sorted in some way—eg. numerically, alphabetically—it will take at most 20 steps to find an any item, regardless of its location!

The key insight is that since our list of items is sorted we can eliminate half of the items each time we make a guess. For example, imagine I’m thinking of a number between 1 and 100. You have to guess the number in as few tries as possible. For each guess, I’ll tell you if your guess is correct, too high, or too low.

The smart move is to guess somewhere in the middle on the first guess, say 50. If I say “higher” then you can immediately eliminate all numbers from 1-50. On the next guess if you say 75 and I say “lower”, then again you now know the correct number is between 51 and 72 and so you can again eliminate half of the potential numbers, 75-100. And so on. While you might get lucky and guess the actual number along the way, the maximum number of guesses this approach will take is 7. How do I know that?

We’re eliminating half of the potential numbers each time, which is \(log_2 n\). What’s a logarithm? It’s the inverse of an exponent. So if \(2^4\) = 16, then the \(log_2 16\) is 4.

What’s the \(log_2\) of 1 million? \(2^{19}\) is 524,288 and \(2^{20}\) is 1,048,576. So in a sorted list with 1,048,576 items, it will take at most 20 guesses with binary search to find the correct item.

Binary Search is an example of the divide and conquer strategy where given a large problem we don’t attack it directly. Instead we divide it into smaller problems, solve those, and then reassemble the whole problem to get the result. Often this is a much more efficient approach!

Implementation in Python

Let’s implement binary search in Python where data represents our sorted list of items and item is the specific item we’re searching for.

def binary_search(data, item):
    low = 0
    high = len(data) - 1

    while low <= high:
        middle = (low + high) // 2

        if item < data[middle]:
            high = middle - 1
        elif item > data[middle]:
            low = middle + 1
        else:
            return middle

    return -1

Test

>>> print(binary_search([1, 3, 5, 7, 9], 0))
-1
>>> print(binary_search([1, 3, 5, 7, 9], 7))
3

Time

To prove how fast this algorithm is, we can use Python’s time module to test how long it takes on our computer. This is what I get:

import time


def binary_search(data, item):
    start = time.time()

    low = 0
    high = len(data) - 1

    while low <= high:
        middle = (low + high) // 2

        if item < data[middle]:
            high = middle - 1
        elif item > data[middle]:
            low = middle + 1
        else:
            end = time.time()
            return middle, end-start

    return -1, end-start


>>> million_items = list(range(1000000))    

>>> print("Location %d required %10.7f seconds"%binary_search(million_items,999999))
Location 999999 required  0.0000379 seconds

Bug

Although this algorithm was invented in the mid 1950s, it wasn’t until 2006 that a potential bug was discovered in this classic implementation. Can you spot it?

Next Steps

If binary search is so powerful, why don’t we use it for everything? It turns out that in the real world we rarely encounter a perfectly sorted list of items. The cost of sorting a large number of items–which is required before we can run a binary search–takes a lot of extra work, so much so that it can be faster to simply do a sequential search, checking items one-by-one.

Along with search algorithms like Binary Search, there are many sorting algorithms that help us sort our data. Next up, learn about several sorting algorithms including Selection Sort, Insertion Sort, Merge Sort, and Quick Sort.

While you’re waiting, President Barack Obama visited Google in 2008 and was asked by then-CEO Eric Schmidt, “What is the most efficient way to sort a million 32-bit integers?”

Want to improve your Python? I have a list of recommended Python books.