When you’re designing or writing software, one issue that can often be glossed over is the matter of efficiency. It’s so easy at the beginning of a project to just concentrate on getting something working, so you can demonstrate progress, and then worry about making it fast later on. The unfortunate fact is though optimisation can only take you so far, the true efficiency issues are going to lie in your algorithm design. Most IT professionals have learned the basics at some point in their career, but in case you’re a little rusty read on and we’ll refresh your memory.

The first thing to consider is what kind of complexity you’re looking to reduce. The two major complexity areas are time — that is, how long an operation will take to complete — and space, or how much memory is needed. When talking complexity, we tend to rate speed in terms of how many steps (or blocks of memory for space complexity) are taken per input variable, rather than in absolutes, since they are so dependent on the specifics of the hardware. Likewise, the length of time an individual step will take is largely disregarded since for large inputs this time will be dominated by the complexity class.

To make comparing two algorithms easier we group them into classes by using a special kind of notation. There are a number of different ways to do this, based upon the best case, average case and worst case input scenario. I like to use the worst case most of the time, since that’s the time it’s going to make the most difference to how you perceive performance. To express this we use what’s called big O notation, which expresses the number of steps an algorithm will take for an input of size “n” in the worst case. So, take the following example, which simply sums the numbers in a list.

1 2 3 4 5 6 7 8 |
sum(a) { final_sum = 0 n = length(a) for (i = 0; i < n; i++) { sum += a[i] } return final_sum } |

Treating each line as a single step, we can see that calling sum on a list of size n will take n+4 steps to complete, two for the initialisation of final_sum and n, one to set up the for loop, one for the return statement and then n times one for the loop body.

The problem has changed, and now you need to multiply each number by how many times it occurs in the list before adding it to the running total. Take the following implementation:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
sum_multiple(a) { final_sum = 0 n = length(a) for (i = 0; i < n; i++) { num = 0 for (j = 0; j < n; j++) { if (a[j] == a[i]) { num++ } } final_sum += a[i] * num } return final_sum } |

This does similarly to the last function, with the exception that before adding the current value to the running total, it goes through the list and counts the number of occurrences of each value. Calling this function of a list of size n means that 4 + n * (1 + n * 2) steps are carried out since the outer loop now contains 2n + 1 steps. In total this means that calling this function “costs” 2n2 + n + 4 steps. For a list of 10 numbers it takes 214 steps, but for a list of 100 numbers it will need more than 20,000 steps to complete. That’s quite an increase. When we rewrite it in another way, however, this changes:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
sum_multiple2(a) { final_sum = 0 n = length(a) numbers = dict() for (i = 0; i < n; i++) { if (numbers.has(a[i])) { numbers[a[i]]++ } else { numbers[a[i]] = 1 } } for (j = 0; j < n; j++) { final_sum += a[j] * numbers[a[j]] } return final_sum } |

In this example we precompute the number of times each value occurs in the list. To do this we use a new data type which can store these values. It’s not particularly important how this is implemented so long as we can be sure that we can insert and retrieve values in constant time. In languages that support them as standard this could be a hash or a dictionary, or if you’re not that lucky (say you’re using C) then you can think of it as an integer array of size max(a). The method simply returns true if this type contains a the given value.

Anyhow, you can see how rather than work out how many times each number occurs as we reach it we can do it all at the beginning and store it. Let’s look at how this helps — sum_multiple2 takes 3n + 6 steps: the usual initialisation steps, plus two for each input to build the dictionary of number occurrences, and then one for each input to sum them. For 10 inputs this will take 36 steps, for one hundred: 306. That’s more than 65 times faster for the second version when dealing with 100 inputs. If say, we had one million inputs it becomes two trillion vs three million and the second version is more than 650,000 times faster.

Now we’ve been taking a fairly casual view of the number of steps in each algorithm, treating each line as one step, when a statement like “sum += a[j] * numbers[a[j]]” contains multiple lookups and could be compiled into as many as 10 individual instructions on a hardware level. This is not really that important though, when you think about it, even if we assume that every step we’ve counted in the second example really takes 10, and the first program is unchanged then it still represents more than a 60,000 times improvement.

Really what we’re interested in is the order of the algorithm, for convenience, we reduce it to the size of the largest part. For example, sum_multiples we say is O(n2) whereas sum_multiples2 is O(n). This is often all you really need to know, for large enough values of n, O(n) algorithms will always beat O(n2) algorithms, regardless of the details.