Memoization: an introduction using Python

October 31, 2014

Memoization is a classic optimization technique that can speed up a function by having it “remember” inputs it previously computed. If a memoized function is called with inputs it remembers, it returns the previously computed value. If the function is computationally expensive, and the function may be called multiple times with the same input, this simple technique can be an easy way to speed up a function without adjusting the underlying algorithm.

I consider this type of optimization “classic” because it involves two time-honored computer science topics: caching and time-space tradeoffs.

Caching is the process of saving copies of data “closer” to the mechanism that’s doing a computation or making a request for data. Caches are absolutely everywhere in technology:

  1. There’s a cache on your CPU right now (probably more than one) that is storing data from memory closer to the CPU, such that if the CPU requests the same memory locations repeatedly, the CPU can read from the cache instead of going to memory. Since the CPU’s cache can be tens or hundreds of times faster than memory, this allows your CPU to spend less time waiting for memory and more time doing basic operations like arithmetic on data, which are blazing fast.
  2. There’s probably at least one web cache somewhere in your network, such that your web browser’s request for a popular website is directed through that cache before going to the actual destination. That way, web content can be delivered to your browser much faster, since the cache is probably closer in terms of distance than the actual destination. This web cache also reduces the load on the Internet, making it faster for everyone.
  3. In fact, your web browser keeps a local cache for web pages, images, and scripts, so that it can load previously downloaded pages instantly. For example, if you click or tap your browser’s back button now, it’s probably going to read the web page from its local cache.

The concept of caching is simple: if you have reason to suspect that some pieces of data will be required more than once, you can store copies “closer” to the entity that needs it.
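In Python, the simplest cache is just a dictionary consulted before the slow path. Here's a minimal sketch of the idea, where slow_fetch() is a hypothetical stand-in for any expensive data source:

```python
def slow_fetch(key):
    # Hypothetical stand-in for an expensive lookup
    # (a network request, a database query, a large file read).
    return key.upper()

cache = {}

def cached_fetch(key):
    # Serve from the cache when we can; otherwise take the slow
    # path and remember the result for next time.
    if key not in cache:
        cache[key] = slow_fetch(key)
    return cache[key]

print(cached_fetch("hello"))  # slow path, calls slow_fetch()
print(cached_fetch("hello"))  # fast path, served from the cache
```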

Time-space tradeoffs involve the general observation that algorithms can be optimized for maximum time efficiency (fastest speed) or maximum space efficiency (lowest memory usage), but usually not both. If you are working in an environment with particular constraints, it’s often possible to adjust an algorithm to save time by using more space, or save space by spending more time.
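As a toy illustration (my example, not a claim about any particular system), suppose a program repeatedly needs the number of set bits in a byte. Computing the count on demand uses no extra memory; precomputing a 256-entry table spends a little memory to make every later query a single list lookup:

```python
# Space-for-time: a 256-entry table answers every query with one lookup.
POPCOUNT_TABLE = [bin(i).count("1") for i in range(256)]

def popcount_fast(byte):
    return POPCOUNT_TABLE[byte]

# Time-for-space: no table, but each query loops over the bits.
def popcount_slow(byte):
    count = 0
    while byte:
        count += byte & 1
        byte >>= 1
    return count

print(popcount_fast(255), popcount_slow(255))  # 8 8
```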

Memoization, however, is an approach to optimizing functions that doesn’t involve changing the function at all. It’s a higher-level technique akin to caching, though its origins can be traced back to time-space tradeoffs by way of computational complexity theory.

“Enough talk — show me the code!”

You asked for it:

def memoized(func):
    memo = {}  # maps previously seen arguments to their computed results

    def f(arg):
        if arg in memo:
            return memo[arg]
        rv = func(arg)
        memo[arg] = rv
        return rv

    return f

If this function looks simple, it’s because it is. We’ll see, however, that it has limitations. It only accepts functions that take exactly one argument, and that argument must be hashable. A recursive function can be wrapped this way, but the recursive calls inside it will bypass the memo entirely, so the recursion itself won’t get any faster.

However, for simple functions, this works, and it demonstrates the basic approach to memoization well. The memoized() function takes the function you want to memoize and returns a wrapped version of it:

myfunc_faster = memoized(myfunc)

You can then call myfunc_faster() as you would normally. However, you’ll notice that, after you’ve called the memoized function once with a particular argument, subsequent calls with that argument return faster.
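Here’s a small self-contained demonstration of that effect. slow_square() is a made-up stand-in for an expensive function, using time.sleep() to simulate the cost:

```python
import time

def memoized(func):
    memo = {}
    def f(arg):
        if arg in memo:
            return memo[arg]
        rv = func(arg)
        memo[arg] = rv
        return rv
    return f

def slow_square(n):
    time.sleep(0.1)  # simulate an expensive computation
    return n * n

fast_square = memoized(slow_square)

start = time.perf_counter()
fast_square(12)                  # first call: pays the full cost
first = time.perf_counter() - start

start = time.perf_counter()
fast_square(12)                  # second call: just a dictionary lookup
second = time.perf_counter() - start

print(second < first)  # True
```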

How it works

The memoized() function creates a dictionary memo that maps argument values of the function func to the return values that func produces for those arguments. (This is why arguments have to be hashable: they are used as keys in this dictionary.)

Then the function defines another function, f(). If you’ve never seen a nested function definition before, it looks exactly like a normal function definition. This f() is special, however, because it’s an example of a closure: a function that holds references to data that was in scope at the time of its definition. When the closure is called later, it “remembers” that data; in this case, it remembers the reference to the dictionary memo. You might expect that when memoized() returns, the local variable memo would go away, but since the closure still holds a reference to the dictionary, the dictionary survives and can be used whenever the closure is called. Similarly, the closure retains a reference to the function being memoized, which allows us to “intercept” calls to func().
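If closures are new to you, here is a minimal example separate from the memoization code: make_counter() returns a nested function that keeps using a variable from its enclosing scope long after make_counter() has returned:

```python
def make_counter():
    count = {"value": 0}  # data the closure will "remember"

    def increment():
        # count is still visible here, even after make_counter() returns.
        count["value"] += 1
        return count["value"]

    return increment

counter = make_counter()
print(counter())  # 1
print(counter())  # 2
```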

In the closure, we first check whether the given argument arg is in the dictionary. If it is, the closure doesn’t have to call func() at all; it just returns the previously computed value.

If the argument is not in the dictionary, func() is actually called, and the closure stores its result in the dictionary. Then the closure returns the value returned by func(). To the caller of the memoized function, it seems indistinguishable from calling the original function (except, perhaps, that the memoized version may be faster).
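An aside not covered above: because memoized() takes a function and returns a function, it also works with Python’s decorator syntax, which applies the wrapper at definition time. The double() function here is just a placeholder:

```python
def memoized(func):
    memo = {}
    def f(arg):
        if arg in memo:
            return memo[arg]
        rv = func(arg)
        memo[arg] = rv
        return rv
    return f

@memoized  # equivalent to writing: double = memoized(double)
def double(x):
    return x * 2

print(double(21))  # 42, computed and stored
print(double(21))  # 42, returned from the memo
```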

Memoization in action

Here’s an example of a function I wrote specifically to be as inefficient as possible. Despite its name, insertion_sort() below is really a selection sort that operates on a string: on each pass it selects the smallest remaining character and swaps it into place. (If you’ve never heard of selection sort, that’s okay; it’s just one algorithm that takes unsorted data and returns the same data in sorted order.)

def swapped(str_, a, b):
    # Return a copy of str_ with the characters at positions a and b swapped.
    if a == b:
        return str_
    first, last = min(a, b), max(a, b)
    return str_[:first] + str_[last] + str_[first + 1:last] + \
           str_[first] + str_[last + 1:]

def insertion_sort(str_):
    # Despite the name, this is really a selection sort: each pass selects
    # the smallest remaining character and swaps it into position i - 1.
    if len(str_) <= 1:
        return str_

    for i in range(1, len(str_)):
        _, m = min((str_[j], j) for j in range(i - 1, len(str_)))
        str_ = swapped(str_, i - 1, m)

    return str_

There are a couple of reasons why this insertion_sort() function is bad; I won’t go into detail here, but its helper function swapped() is very memory-inefficient (every swap builds an entirely new string), which makes insertion_sort() slow. For the purposes of learning about memoization, imagine you did not or could not change the definition of this function to optimize it directly; instead, you’ll want to use memoization to speed up duplicate calls to insertion_sort():

sort_faster = memoized(insertion_sort)

Then I’ll come up with a very large string that chokes up insertion_sort():

letter_list = [chr(ord('a') + (i % 26)) for i in range(2**12)]
huge_str = "".join(letter_list)

If you copy this code into a Python file or drop it into a Python shell, you can execute the following call:

insertion_sort(huge_str)

On my MacBook, this call takes about three or four seconds to finish. The result is a large string of characters, all in order. When we run the memoized version for the first time, we’ll get the same performance:

sort_faster(huge_str)

However, if you run the sort_faster(huge_str) call again, you’ll find that it returns immediately with the result! This is because the closure checked the dictionary, found that the result for huge_str had already been computed, and returned the stored value instead of calling insertion_sort().

Conclusions and extensions

With memoization, we saw that we could optimize a function without changing the underlying algorithm by using a caching technique. In particular, we employed a closure (our function f() from above) that “intercepted” calls to an underlying function and checked whether the function’s return value was already known.

If the return value was already computed, we reduced the time taken to produce that value to the time it takes to access the data from the dictionary. Since Python dictionaries are very efficient, this resulted in a huge speedup, at least with my bad implementation of insertion sort.

If you’re looking for more complicated extensions, you might try getting this approach to work with functions that take more than one argument. (You could collect the arguments into a tuple and use that tuple as the dictionary key.) You could also make this work with recursion and potentially speed up a naive recursive implementation of fibonacci(), the function that computes the nth Fibonacci number. (This is a classic application of memoization.)
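As one possible sketch of both extensions (one way to do it, not the only way): collect the positional arguments into a tuple and use that tuple as the dictionary key, and apply the wrapper with decorator syntax so that the recursive calls inside fibonacci() go through the memo too:

```python
def memoized(func):
    memo = {}

    def f(*args):
        # The tuple of arguments is the key, so any number of
        # (hashable) arguments is supported.
        if args not in memo:
            memo[args] = func(*args)
        return memo[args]

    return f

# Decorator syntax rebinds the name fibonacci to the wrapper, so the
# recursive calls in the body hit the memo instead of recomputing.
@memoized
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(100))  # returns instantly; the naive version would not
```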

I hope you enjoyed this short introduction to memoization, and don’t hesitate to ask me questions or send suggestions. I can be reached at abreen@bu.edu.