One of my Year II-level Computer Science courses is called Data Structures and Algorithms. In a somewhat rigorous and primarily mathematical way, we determine the optimality of different ways of solving the most common problems in computer science.
Not every method of accomplishing a task is the same, so it goes without saying that some will be better than others, for a given application. Because we spend a lot of time solving a few common but complex problems, general solutions are indispensable. But algorithms are like apples and oranges and it is difficult to directly compare them. Every day, we make simple comparisons between disparate objects by finding a common measure – perhaps the weight, or the size/volume.
In computer science, one such metric is Big Oh time complexity, which tells us the so-called “order” of a function or algorithm. In simple terms, Big Oh gives an upper bound on how the number of basic operations (additions, subtractions, “print” statements) required to accomplish a task grows with the size of the input. Our class is concerned with the asymptotic properties of Order, that is, what happens when the input size grows very large (toward “infinity”), as with large structures and large sets of data. There is also Big Oh space complexity, which I won’t discuss here.
Mathematicians, software engineers and computer scientists have put years of rigorous work into these problems, so you might ask yourself: why is it important to understand algorithm analysis? After all, conceivably, there already exists an “optimal” solution to every problem, right?
But that’s only half of the picture. The definition of optimality depends largely on the domain you’re looking at, and this makes a lot of sense: your ability to use these general solutions depends on external constraints. For example, embedded systems pose problems for developers because they have slower processors and more constrained memory.
However, the scientific method we use most often to study anything tells us that we must first hold all variables constant except the one under inspection. That means that to compare two algorithms properly, we must inspect them on the same dimension (time or space) and ignore the other factors.
The measure we focus on in Data Structures and Algorithms is the “worst-case time complexity” of algorithms, which is to say, the number of primitive operations that must be conducted to accomplish a given task in the worst case. The definition of what “worst case” is varies from application to application and requires a greater understanding of the algorithm in question.
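To make “worst case” concrete, here is a minimal sketch using linear search, an example I picked for illustration (the function name linear_search_count is my own). It counts comparisons as the primitive operation: the best case finds the target immediately, while the worst case, a target that is missing, has to look at every element.

```python
def linear_search_count(items, target):
    """Return (index, comparisons); index is -1 if target is absent."""
    comparisons = 0
    for i, value in enumerate(items):
        comparisons += 1  # one primitive operation per element inspected
        if value == target:
            return i, comparisons
    return -1, comparisons


data = list(range(10))

# Best case: the target is the first element, so one comparison suffices.
best = linear_search_count(data, 0)

# Worst case: the target is absent, so all 10 elements are compared.
worst = linear_search_count(data, 99)

print("best:", best, "worst:", worst)
```

For a list of n items the worst case costs n comparisons, which is why linear search is usually described as O(n).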
If there is an algorithm which always requires T(n) = n + 5 primitive operations to complete, for different values of the input n, then the Order of T(n) is O(n); in other words, T(n) is O(n). Why? The strictly mathematical definition is a bit convoluted (and can be found elsewhere), but the essential idea is that Big Oh notation provides an upper bound on the amount of time an algorithm will take to complete in the worst conceivable case, as data sets approach infinite size. It’s not difficult to see why the “5” constant term is dropped, then: for large data sets, the value of n becomes much larger than the constant shift factor “5”, so the constant becomes less and less significant.
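One common way of stating the definition is that T(n) is O(n) if there are constants c > 0 and n0 such that T(n) <= c * n for every n >= n0. The sketch below checks this numerically for T(n) = n + 5; the constants c = 2 and n0 = 5 are my own choice (many others would work), and a finite check like this is an illustration rather than a proof.

```python
def T(n):
    """Primitive operations used by the hypothetical algorithm."""
    return n + 5


# Assumed witness constants for the Big Oh definition: T(n) <= c * n for n >= n0.
c, n0 = 2, 5

# Check the inequality over a finite range (illustrative, not a proof).
holds = all(T(n) <= c * n for n in range(n0, 100_000))

# The ratio T(n) / n shrinks toward 1 as n grows, so the "+ 5" matters less and less.
ratios = [T(n) / n for n in (10, 1_000, 1_000_000)]

print(holds, ratios)
```

Algebraically, n + 5 <= 2n exactly when n >= 5, which is why that pair of constants works.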
Now we have a way of comparing algorithms. Unfortunately, while this measure of algorithms is simple to learn and good in a basic mathematical sense, it neglects the impact of different technologies on algorithmic performance. Some algorithms are specially designed by Computer Scientists to be optimized for some of these more complicated mechanisms, which is why benchmarking on your target systems is always important.
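As a small illustration of why benchmarking matters, the sketch below times two ways of summing a list using Python's standard timeit module. Both are O(n), so Big Oh alone cannot tell them apart, yet they may run at noticeably different speeds on a given machine. The function names and the input size are my own choices, and the timings will differ from system to system.

```python
import timeit


def sum_loop(values):
    """Sum with an explicit Python loop: O(n)."""
    total = 0
    for v in values:
        total += v
    return total


def sum_builtin(values):
    """Sum with the built-in sum(): also O(n)."""
    return sum(values)


values = list(range(100_000))

# Time 20 runs of each function on the same input.
loop_time = timeit.timeit(lambda: sum_loop(values), number=20)
builtin_time = timeit.timeit(lambda: sum_builtin(values), number=20)

print(f"loop: {loop_time:.4f}s  builtin: {builtin_time:.4f}s")
```

Whatever the numbers turn out to be, the point is that two algorithms of the same Order can still differ in practice, so measuring on the target system is worthwhile.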
Modern processors come with all sorts of nifty additions, including technologies like hyper-threading, multiple cores, extra L1/L2/L3 cache and different instruction set designs like RISC or CISC. Understanding these requires an understanding of the physical properties of the hardware, and I hope to spend some time explaining them in greater depth in later articles.