Monday, March 25, 2013

Searching algorithms in C#

Let's just get this out of the way: we all know that in C# the BinarySearch is already implemented, so this is more of a didactic exercise than a real world application. Also, the C# implementation is most likely optimized to death, so the point of this article isn't to make it better, i'm just going to show you what it really does.
If you're only interested in searching something in a list and be done with it, you can stop after the first line of code.
If you just began your Computer Science course, read everything, because it will be useful to you (searching algorithms are important, just in case you didn't know).
The C# way of searching a sorted list/array is the following:
int index = array.BinarySearch(Element);
Simple, straightforward, fast.

Having that out of the way, what would be your first thought if i asked you to search a specific element in a list? You would probably think of a linear search. A linear search algorithm is pretty much this:
1 - Compare the 1 element to the search term -> it's the same? If yes, stop, if not, go on.
2 - Compare the 2 element to the search term -> it's the same? If yes, stop, if not, go on.
3 - Compare the 3 element to the search term -> it's the same? If yes, stop, if not, go on.
etc etc.
For this algorithm to work you don't need the list to be sorted, you're going to read it all until you find what you want anyway. And this is pretty much the only advantage it has over the binary search algorithm. Here's the C# implementation:
public static int linearSearch(int[] array, int searchTerm)
{
    int index = -1;
    for (int i = 0; i < array.Length - 1; i++)
    {
        if (array[i] == searchTerm)
        {
            index = i;
            break;
        }
    }
    return index;
}
This function returns the index of the found element, it's really easy to write, but it's inefficient. In the worst case scenario you need to cycle through the whole array to reach the end. In a big array with, let's say, 100000 elements you'd be doing 100000 iterations. That means the efficiency in the worst case scenario is n, where n is the number of elements in the array. The best case efficiency is 1, and the average is n/2.

So, can we improve the efficiency? Of course we can, the binary search algorithm is the solution.
I can't stress this enough, but the array/list NEEDS to be sorted for the binary search to work.

The binary search could be described as the following, using a dictionary as an example:
1 - You need to search a word in the dictionary, the word is "viewer".
2 - The first thing you do is opening the dictionary in the middle.
3 - "viewer" starts with the letter "v".
4 -  "v" is on the second half of your dictionary (first half is a - m, second half is n - z)
5 - Then you open the second half of the dictionary in the middle.
6 - Repeat until you find the word you're looking for.
Putting the algorithm in code is simple once you really understand it:
public static int binarySearch(int[] array, int searchTerm)
{
    int firstIndex = 0;
    int lastIndex = array.Length;

    //Return -1 if the searchTerm is bigger or smaller than the first or last array element
    //This check avoids unnecessary computation in some cases
    if (searchTerm > array[array.Length - 1] || searchTerm < array[0])
    {
        return -1;
    }

    while (firstIndex <= lastIndex)
    {
        //Here we split the array in two and we search only the half we need
        int middle = (firstIndex + lastIndex) / 2;
        //Second half
        if (searchTerm > array[middle])
        {
            firstIndex = middle + 1;
        }
        //First half
        else if (searchTerm < array[middle])
        {
            lastIndex = middle - 1;
        }
        //element found, return
        else
        {
            return middle;
        }
    }

    return -1; //We only get here if the element hasn't been found
}
It's worth noting that we are sorting an array of integers, because it's easier to understand, but you can adapt the method to a string easily, using:
string.Compare(a, b);
This is much faster than the linear search, in the worst case scenario (in an array of 100000 elements) it will do just 16 comparisons instead of the whole 100000, so the efficiency would be log2n.
But can we make it better? We could use recursion!
public static int binarySearch(int[] array, int searchTerm, int firstIndex, int lastIndex)
{
    //Element not found, exit failsafe
    if(lastIndex < firstIndex)
    {
        return -1; 
    }

    //Here we split the array in two and we search only the half we need
    int middle = (lastIndex + firstIndex) / 2;
    
    //Second half
    if (searchTerm > array[middle])
    {
        //Recursively call this method with the new indexes
        return binarySearch(array, searchTerm, middle + 1, lastIndex);
    }
    //First half
    else if (searchTerm < array[middle])
    {
        //Recursively call this method with the new indexes
        return binarySearch(array, searchTerm, firstIndex, middle - 1);
    }
    //Element found, return
    return middle; 
}
This isn't really better, since the algorithm is still the same, but it's a more elegant solution and it's probably the one that has most in common with the standard C# implementation.
That's it, hopefully you now know a bit more about searching algorithms.

No comments:

Post a Comment

prettyprint