« back

The Proof is in the Profiling

26 Oct 2012

Forget about the pudding, let's look at the numbers.

As I was working with different sorting algorithms, Doug told me that he would like to see how they stacked up against each other and pointed me to the ruby-prof gem.

After installing the gem and going through the readme, I was starting to cobble something together. I set up a ProfilingHelper class where I initialized it with an instance of my Sorts class which contained all of my different sorting methods. Then I also initialized several arrays. Ten, twenty,fifty, one hundred, and one thousand each in ascending, random, and descending order. (I tried for ten thousand, but when I would run it, my machine would choke.) Still in initialize, I added an array of all of my sort methods as symbols, and an array of all of all of the arrays that I would use to run through my methods.

Next, I created a method called create_profiles so that I could iterate over all of my arrays and methods. It looks like this...

 1  
 2         def create_profiles
 3           @arrays.each do |array|
 4             result = RubyProf.profile do
 5               @methods.each do |name|
 6                 @sorts.send(name, array)
 7               end
 8             end
 9             File.open "profiling_data.html", 'a' do |file|
10               RubyProf::MyGraphHtmlPrinter.new(result).print(file) #note the MyGraphHtmlPrinter
11             end
12           end
13         end
14     

I then created a profile_generator file that required my profiling_helper. I created a new instance of my helper and then ran the create_profiles method.

Now, origially, I was using the default GraphHtmlPrinter. Once I viewed my report, I realized there was a little bit of a problem. The numerical precision I was looking for was not there. The time stats only showed two places past the decimal and in some cases, that was not enough for me to tell a difference. Doug suggested that I open up the gem and take a look. This was new for me and really fun. Once I was able to locate the area that I needed changed, I made my own class called MyGraphHtmlPrinter that had the same inheritance as his class, made my changes and TA-DA!, I was now looking at 6 places past the decimal. Maybe 6 is a little overkill but hey, I am still pretty pumped that I was able to do that.

So now, the stat info that you have all been so wound up for..... If you want to see the actual methods I used for these, see the posts on them.( I would suggest it for context.) Bubble Sort Quick Sort (random pivot and median pivot explained) In Place Quick Sort Merge Sort(known below as Merge Sort 2) Merge Sort 1 was not covered in a post. It looks like this...( The first part of the methods merge_sort_1 and merge_sort_2 are the same. The merge_1 and merge_2 methods are what's different.) The method below looks pretty terrible. Let's see if the merge sort 2 refactoring also pays off in speed.

 1  
 2     def merge_sort_1(array)
 3       return array if array.length < 2
 4       midpoint = array.length/2
 5       list_1 = array[0..(midpoint - 1)]
 6       list_2 = array[midpoint..(array.length - 1)]
 7       new_list_1 = merge_sort_1(list_1)
 8       new_list_2 = merge_sort_1(list_2)
 9       new_list = merge_1(new_list_1, new_list_2)
10     end
11 
12     def merge_1(array_1, array_2)
13        index_array_1 = 0
14        index_array_2 = 0
15        merged_list = []
16        while index_array_1 < array_1.length && index_array_2 < array_2.length
17          if array_1[index_array_1] <= array_2[index_array_2]
18            merged_list << array_1[index_array_1]
19            index_array_1 += 1
20          else
21            merged_list << array_2[index_array_2]
22            index_array_2 += 1
23          end
24        end
25        if (index_array_1 < array_1.length)
26            (index_array_1..array_1.length - 1).each do |value|
27              merged_list << array_1[value]
28            end
29        else
30            (index_array_2..array_2.length - 1).each do |value|
31              merged_list << array_2[value]
32            end
33        end
34         merged_list
35      end
36     

 1  
 2     Ten element arrays  * = the fastest,  - = the slowest
 3     Method                      ASC         RAND        DESC
 4     Bubble Sort              0.000023*    0.000226     0.000360-
 5     Quick Sort(rand pivot)   0.000249     0.000264     0.000184
 6     Quick Sort(median pivot) 0.000231     0.000257     0.000263
 7     In Place Quick Sort      0.000165     0.000176*    0.000183*
 8     Merge Sort 1             0.000250-    0.000250     0.000258
 9     Merge Sort 2             0.000316     0.000354-    0.000315
10     

 1  
 2     Twenty element arrays  * = the fastest  - = the slowest
 3     Method                      ASC         RAND        DESC
 4     Bubble Sort              0.000031*    0.000881-    0.001388-
 5     Quick Sort(rand pivot)   0.000584     0.000505     0.000536
 6     Quick Sort(median pivot) 0.000521     0.000521     0.000521
 7     In Place Quick Sort      0.000435     0.000433*    0.000435*
 8     Merge Sort 1             0.000597     0.000598     0.000598
 9     Merge Sort 2             0.000795-    0.000795     0.000799
10     

 1  
 2     Fifty element arrays  * = the fastest  - = the slowest
 3     Method                      ASC         RAND        DESC
 4     Bubble Sort              0.000050*    0.004822-    0.007912-
 5     Quick Sort(rand pivot)   0.001660     0.001702     0.001692
 6     Quick Sort(median pivot) 0.001546     0.002023     0.001546*
 7     In Place Quick Sort      0.001669     0.001589*    0.001643
 8     Merge Sort 1             0.001693     0.001658     0.001662
 9     Merge Sort 2             0.002551-    0.002424     0.002433
10     

 1  
 2     One Hundred element arrays  * = the fastest  - = the slowest
 3     Method                      ASC         RAND        DESC
 4     Bubble Sort              0.000085*    0.020044-    0.031424-
 5     Quick Sort(rand pivot)   0.003322     0.003342*    0.003751
 6     Quick Sort(median pivot) 0.003586     0.003532     0.003395*
 7     In Place Quick Sort      0.004661     0.004609     0.004707
 8     Merge Sort 1             0.003508     0.003549     0.003567
 9     Merge Sort 2             0.005496-    0.005446     0.005521
10     

 1  
 2     One Thousand element arrays  * = the fastest  - = the slowest
 3     Method                      ASC         RAND        DESC
 4     Bubble Sort              0.000716*    1.931140-    3.217454-
 5     Quick Sort(rand pivot)   0.050307     0.047650     0.047978
 6     Quick Sort(median pivot) 0.046475     0.040501*    0.040521*
 7     In Place Quick Sort      0.307545-    0.305809     0.322897
 8     Merge Sort 1             0.040940     0.045454     0.044338
 9     Merge Sort 2             0.073066     0.075060     0.072212
10     

Interesting data. So what have we been able to tell? Generally speaking,

 1  
 2     1. If your data is already sorted...
 3         a.use Bubble Sort!  Although, if you know that it is sorted, 
 4             why are you sorting it?!
 5         b.avoid merge sort for smaller sets and quick sort for larger sets.
 6     2.If your data is random...
 7         a.You should certainly go with a version of Quick sort.
 8         b.Stay away from bubble sort!
 9     3.If your data is on reverse order....
10         a.Quick sort is the clear winner.
11         b.Bubble sort is the worst.
12     

Some things that are a little bit less general...

1  
2     1. I can see why the arguments for using a random pivot and for using a median pivot
3         could become heated.  There is not a clear winner and a case for both sides.
4     

The most interesting case to me was the merge_sort_1 vs. merge_sort_2. Merge sort 1 was the clear winner. By a lot. I was NOT expecting this. This begs the question "Is refactoring to the shortest possible code always the best idea?" There are tweaks that could be made on merge_sort_1 to clean it up with keeping the basic idea but I wanted a contrast. Often times it seems that people think that because you can make something shorter or use slick, built in methods, it automatically makes your code better. That is clearly not true in this case. It is important to note that in hiding the details "under the hood", the cute, short, "convenience" methods can also be hiding a less convenient cost.

:D mind blowing

comments powered by Disqus