An Introduction to Big O Notation
12 Oct 2012
Lately I have been hearing the expression "big O notation" being thrown around a lot. I heard someone explain it roughly once by saying that it was how long your algorithm would take depending on how big your dataset was and how your algorithm is designed. I think that is a good jumping off point. Here are some other cool things to know about it. All of these concepts are backed up by hardcore mathematical expressions but unless you are a mathematician, these expressions can be difficult to understand. Luckily, there is a way to refer to the concepts of these expressions without having to know all of the details. Let's explore some of these ideas.
The first thing I think is important to know when talking about big O, you are really talking about the worst case scenario. We want to know in the worst case how long something would take. Let me give you an example. Let's say that you are trying to match a specific string in a list of strings, you do have a chance that the string that you need to find would be the first one in the list. We will assume the worst case scenario and pretend that the string will be the last in the list so that we can measure the length of time taken in the worst case scenario. Thinking about your data in a worst case scenario, and in terms of big O can help you think about the architecture of your solution to a problem. Let me give you an example from a problem I was working on.
I was tasked to write a method that would take in an array and return true or false based on it the array was unique or not. (No, I was not permitted to use the .uniq method) Here is how I implemented it....
1 def unique?(array) 2 unique = true 3 array.each_with_index do |compare_value, compare_index| 4 array.each_with_index do |current_value, current_index| 5 if current_value == compare_value && current_index != compare_index 6 unique = false 7 end 8 end 9 end 10 unique 11 end 12
Great. Well, turns out it is not so great. I found out that am using since overtime I add an element to the list I would have to go through the whole list again. Why is this bad? Well, You always want to try to to use the most efficient algorithm. Sometimes refactoring can be about this and less about code duplication. This is really interesting to me to be challenged on because I never thought about my programs like this before!!! Read on to find out about O(n2) and other types of big O notation.
O(1) describes an algorithm that will always execute in the same time (or space) regardless of the size of the input data set. There is no growth curve on this algorithm. It will always be a straight line.
O(n) describes an algorithm in where the performance will grow linearly and in direct proportion to the input data.
O(n2) this algorithm will grow exponentially for every piece of data added. it is proportional to the square of the data set size. This happens when you are processing each element of a set and that processing requires another pass through the set. You can think of a nested each loop like the example code above.
O(log n) is an algorithm that starts out at the highest point and cuts the the problem in half each time. If you double the amount of data that you are passing in, the time taken to process it will not double. It will increase at a constant rate but that rate will not be exponential. This is really cool because it basically works by taking target value and a list and picking a midway point. It will then check if that midway point is the value that you are looking for. If the target value is higher than the midway point value, it will choose the upper half of the list and continue to so the same process until it finds the value or can not break it down anymore. You can think of it as divide and conquer.
Pretty sweet, huh?