Asymptotic analysis
Updated July 20, 2015
In introductory computer science courses, the focus is usually (in whole or in part) on teaching you the basics of programming. After you become proficient in a given programming language, your priorities may shift from learning how to write a program to how well your written program works. There are many measures of a program’s effectiveness (e.g., how quickly it runs or how much memory space it uses), and asymptotic analysis is a method to work with these measures at varying degrees of specificity.
Rather than reason about the efficiency of an entire program, which may contain various parts that run at different speeds or operate on different data items, we’ll often willfully forget the specifics of a complicated program and focus on one part of it. We will also ignore the details of a specific programming language.
Instead, we will reason about algorithms. An algorithm is a sequence of instructions that may be performed to solve a problem: given some input, an algorithm does some calculation with the input or transforms the input into useful output. Algorithms are therefore reusable procedures that can be applied to a potentially large number of inputs.
Take, for example, the method you learned in elementary school to add two multiple-digit numbers. This process can be considered an algorithm, and can be defined as follows:
First, align the two numbers so that their least-significant digits are on the right. Then, starting from the rightmost column, repeatedly add the pair of digits in each column, writing the sum underneath both numbers, in the appropriate column. If the sum of the digits is ten or more, write only the ones digit of the sum underneath the numbers, and remember to carry a one into the next column’s sum.
Notice how this algorithm can be specified in English. Note also that the procedure is general enough for two numbers of any length (however, it is not written generally enough for numbers in any base).
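The same procedure can be sketched in code. Here is a minimal Python version, assuming the two numbers arrive as strings of decimal digits (the function name and the string representation are my own choices, not part of the algorithm as stated):

```python
def grade_school_add(a, b):
    """Add two non-negative integers, given as decimal digit strings,
    using the elementary school column-by-column method."""
    # Align the numbers on the right by padding the shorter with zeros.
    n = max(len(a), len(b))
    a = a.rjust(n, "0")
    b = b.rjust(n, "0")

    digits = []
    carry = 0
    # Work from the rightmost (least-significant) column leftward.
    for i in range(n - 1, -1, -1):
        column_sum = int(a[i]) + int(b[i]) + carry
        digits.append(str(column_sum % 10))  # write only the ones digit
        carry = column_sum // 10             # carry a one into the next column
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))

print(grade_school_add("457", "68"))  # → 525
```

Note how closely the loop mirrors the English description: one column per iteration, ones digit written down, carry remembered for the next column.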
Let the input size of the algorithm be the variable n. We’ll state that the larger of the two numbers being added has no more than n digits. If we can reason quantitatively and generally enough, we can characterize the efficiency of this algorithm using only n and the structure of the algorithm. Then we can be guaranteed to have an idea of how well this algorithm works for any possible input, and we can compare it to other algorithms that may solve the same problem. As we will see, the trick will be to count the number of operations performed by this algorithm and represent that count in terms of n.
We can put our algebra and calculus skills to work in analyzing an algorithm like the addition algorithm above. To do this, we will represent the running time of the algorithm as a function. Given the input size n, this function t(n) will give us the number of operations that the algorithm will execute in order to solve the problem.
For our addition example, we will say that a single addition of two digits of the addends is one operation. Due to the structure of the algorithm, we discover that the algorithm will perform n single additions, given two numbers with at most n digits, since it performs one addition per column of digits.
Therefore we can say that our function t is precisely n, or t(n) = n. We now have a function that represents the time efficiency of our addition algorithm. Of course, this function t does not produce the time in seconds (also known as wall clock time), but instead counts operations. This allows us to use t to describe the algorithm’s time efficiency without taking into account the specific computer or machine that would perform the algorithm. This is just fine, since most new computers are only a constant factor faster (in terms of wall clock time) than old computers, and taking those constants into account would prevent us from reasoning about just the algorithm, rather than the algorithm running on a specific computer.
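We can confirm the count t(n) = n by instrumenting the column method with an operation counter. A small Python sketch, under the text’s counting convention (one single-digit addition per column, with carries folded into that addition):

```python
def additions_for(a, b):
    """Count the single-digit additions the grade-school method performs
    when adding the decimal digit strings a and b.  Follows the text's
    convention: one operation per column, carries folded in."""
    n = max(len(a), len(b))
    a, b = a.rjust(n, "0"), b.rjust(n, "0")
    ops = 0
    carry = 0
    for i in range(n - 1, -1, -1):
        column_sum = int(a[i]) + int(b[i]) + carry
        carry = column_sum // 10
        ops += 1  # one single-digit addition for this column
    return ops

# Two 3-digit numbers: t(3) = 3 operations.
print(additions_for("457", "982"))  # → 3
```

The count depends only on the number of digit columns, never on which digits appear in them, which is exactly why we can write t as a function of n alone.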
Now that we have a function that describes the time efficiency of our algorithm, we can come up with other functions for other algorithms and begin to classify them into groups. We would do this if we wanted to compare two different algorithms that were designed to solve the same problem. We may also want to classify and compare algorithms that are useful or notable but do not solve the same problems. The latter case amounts to creating a taxonomy of algorithms. The former case is useful when you know that speed of computation matters in your problem-solving setting and you are uncertain about which algorithm to pick.
We’ll partition all our running-time functions into classes. Each class has a different rate of growth. A function that grows quickly will eventually surpass any fundamentally slow-growing function. For example, the function g(x) = x² exceeds the function f(x) = x for all values of x larger than 1.
Value of x | Value of f(x) | Value of g(x) |
---|---|---|
0 | 0 | 0 |
1 | 1 | 1 |
2 | 2 | 4 |
3 | 3 | 9 |
4 | 4 | 16 |
… | … | … |
Notice that f will never catch up to g asymptotically (i.e., as x increases toward infinity).
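You can reproduce the table (and check the claim for many more values of x) with a few lines of Python; the names f and g match the functions above:

```python
f = lambda x: x
g = lambda x: x ** 2

# Reproduce the table: g matches f at x = 0 and x = 1, then pulls ahead.
rows = [(x, f(x), g(x)) for x in range(5)]
for row in rows:
    print(*row)

# g stays strictly ahead of f for every x > 1 we care to try.
print(all(g(x) > f(x) for x in range(2, 1000)))  # → True
```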
So far, f and g have been very simple functions. I’ll propose that we define a new function, h(x) = x + 100. Our table of values gets a bit more interesting now:
Value of x | Value of f(x) | Value of g(x) | Value of h(x) |
---|---|---|---|
0 | 0 | 0 | 100 |
1 | 1 | 1 | 101 |
2 | 2 | 4 | 102 |
3 | 3 | 9 | 103 |
4 | 4 | 16 | 104 |
… | … | … | … |
It seems that h beats out g for the values of x that I wrote in the table. However, it doesn’t take long to realize that, for some value of x, g will surpass h. Here’s where they cross:
Value of x | Value of f(x) | Value of g(x) | Value of h(x) |
---|---|---|---|
… | … | … | … |
9 | 9 | 81 | 109 |
10 | 10 | 100 | 110 |
11 | 11 | 121 | 111 |
12 | 12 | 144 | 112 |
… | … | … | … |
Thus, the first integer value of x for which g surpasses h is 11. In fact, anything else we try to do to f to make it grow faster than g will fail, unless we multiply f by some factor containing x or use x as an exponent. (Try drawing up a table for k(x) = 2x if you don’t believe me.)
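Rather than eyeballing tables, we can search for the crossover point directly. A quick Python sketch, using the g and h defined above:

```python
g = lambda x: x ** 2
h = lambda x: x + 100

# Walk upward until g overtakes h for the first time.
x = 0
while g(x) <= h(x):
    x += 1
print(x)  # → 11, agreeing with the table
```

The same loop works for any pair of functions where one eventually overtakes the other, which makes it a handy sanity check when you suspect two functions belong to different growth classes.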
Let’s define a new function j(x) = x³. Since there exists an x such that j surpasses f, g, and h, we’ll say that j belongs to a different growth class.
Here’s a table of some common classes of functions, from slowest-growing to fastest-growing:
Name | Simple function |
---|---|
logarithmic | f(x) = log x |
linear | f(x) = x |
linearithmic | f(x) = x log x |
quadratic | f(x) = x² |
cubic | f(x) = x³ |
polynomial | f(x) = x^{n} (where n does not depend on x) |
exponential | f(x) = c^{x} (where c > 1 is constant) |
We can see that our function j belongs to the cubic growth class, and grows faster than any function from the classes above it in the table.
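To see the ordering of the table numerically, we can evaluate one representative function per class at a single, modest input. A Python sketch (I use base-2 logarithms, but the choice of base does not change a function’s class):

```python
import math

# One representative function per class, listed slowest- to fastest-growing.
growth_classes = [
    ("logarithmic",  lambda x: math.log2(x)),
    ("linear",       lambda x: x),
    ("linearithmic", lambda x: x * math.log2(x)),
    ("quadratic",    lambda x: x ** 2),
    ("cubic",        lambda x: x ** 3),
    ("exponential",  lambda x: 2 ** x),
]

# Even at x = 64, the values already appear in strictly increasing order,
# each class dominating the one before it in the table.
values = [fn(64) for _, fn in growth_classes]
for (name, _), v in zip(growth_classes, values):
    print(name, v)
```

At x = 64 the spread runs from 6 (logarithmic) all the way to 2⁶⁴ (exponential), and the gaps only widen as x grows.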
I’ll also note that it’s common to refer to an algorithm as having “linear” or “quadratic” performance, when what is really meant by the phrase is that the function describing the algorithm’s running time is linear or quadratic.
Now that we have classes of growth into which we can place our functions, we can come up with methods of comparing two functions. In particular, this method of comparison will formalize what we did earlier when we tried to design a function that grew faster than our function g and our efforts to draw up classes of growth.
We can use big O notation to express that some function f grows slower than or just as quickly as some other function g.
If f(x) = O(g(x)), then there exist constants c > 0 and x_{0} such that f(x) ≤ c·g(x) for all x ≥ x_{0}.
In the case of f(x) = x and g(x) = x², our constants could be c = 1 and x_{0} = 1. Note that when we let x = 1, the functions are equal (this is allowed by the definition above).
In the case of h(x) = x + 100, we can say that h(x) = O(f(x)), even though h starts “ahead” of f. For example, let c = 101, and choose any x_{0} ≥ 1. We may also let c = 2 and choose any x_{0} ≥ 100.
Informally, if f(x) = O(g(x)), f is “less than or equal to” g, for carefully chosen values of c and x_{0} and sufficiently large values of x.
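We can test candidate witnesses c and x_{0} numerically. Checking finitely many values of x is evidence rather than proof, but it is a good way to catch a bad choice of constants. A Python sketch (the helper name is my own):

```python
def witnesses_big_o(f, g, c, x0, upto=10_000):
    """Numerically check that f(x) <= c * g(x) for every integer
    x0 <= x <= upto.  Passing is evidence for (not a proof of)
    f = O(g) with the witnesses c and x0."""
    return all(f(x) <= c * g(x) for x in range(x0, upto + 1))

f = lambda x: x
g = lambda x: x ** 2
h = lambda x: x + 100

print(witnesses_big_o(f, g, c=1, x0=1))    # → True:  x = O(x**2)
print(witnesses_big_o(h, f, c=2, x0=100))  # → True:  x + 100 = O(x)
print(witnesses_big_o(g, f, c=5, x0=1))    # → False: x**2 is not O(x)
```

The last call fails because no constant c can keep x² below c·x forever, which is precisely why g sits in a faster growth class than f.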
There are other symbols we use to compare functions; some of them are used to describe “tighter” bounds than big O. Note that, due to the way big O is defined, it can only apply an “upper” bound to a function (in other words, it can only specify that some function is slower than or just as fast in growth as another function — it does not state that some function is at least as fast as another).
Thus, other bounds are possible (and useful). To state that a function f is at least as fast as g, we say that f = Ω(g).
The formal definitions of these other structures are included below, along with some informal descriptions.
Notation | Definition | Informal description |
---|---|---|
f = Θ(g) | g = O(f) and f = O(g) | f is bounded above and below by g |
f = Ω(g) | g = O(f) | g bounds f from below; f ≥ g |
f = o(g) | for all a > 0, f(x) ≤ a · g(x) for all sufficiently large x | g grows much faster than f |
f = ω(g) | g = o(f) | f grows much faster than g |
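One way to get a feel for little o: if f = o(g), the ratio f(x)/g(x) shrinks toward 0 as x grows. We can watch this happen for our familiar f and g:

```python
f = lambda x: x
g = lambda x: x ** 2

# f = o(g): the ratio f(x) / g(x) heads to 0 as x increases.
ratios = [f(x) / g(x) for x in (10, 100, 1000, 10000)]
print(ratios)  # → [0.1, 0.01, 0.001, 0.0001]
```

Contrast this with h(x) = x + 100, where h(x)/f(x) settles toward the constant 1 rather than 0, so h = O(f) but h is not o(f).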
We all learn best the things that we have discovered for ourselves.
— Donald Knuth, The Art of Computer Programming
When using the elementary school addition algorithm to add two n-digit numbers, how many individual one-digit additions will be made?
Suppose we add one n-digit number to another m-digit number, and suppose that n ≠ m. Come up with a new function representing the running time of the elementary school addition algorithm, using n and m.
Describe wall clock time and explain why it is not useful for the analysis of algorithms.
Identify the fastest-growing term in the function a(x) = 100x² + 2x³ + 998x, then classify the term using the table of growth classes above.
You overhear a colleague describe an implementation of an algorithm as being “cubic”. Explain what your colleague is trying to express, being sure to take into account the input size of the algorithm.
Suppose b(x) = x² + 100 and c(x) = 100x. Is the statement b(x) = O(c(x)) true? If it is true, choose particular values of c and x_{0} that allowed you to arrive at that conclusion.
Suppose d(x) = x³ + 3x² and e(x) = 2x³. Is the statement d(x) = O(e(x)) true? If it is true, choose particular values of c and x_{0} that allowed you to arrive at that conclusion.
You overhear another colleague claim that an implementation of an algorithm runs “in O(n) time”. If this colleague is to be believed, is it possible that your own implementation of the same algorithm can run twice as fast (in wall clock time) as your colleague’s and still be classified as O(n)?
You review the code and discover that your colleague’s algorithm seems to solve the problem in O(log(n)) time. Was your colleague’s initial claim wrong?