Searching and Sorting Arrays

Question

Searching and Sorting Arrays

An array is not just used to store elements but also access those stored values later on. We may also need to search for a particular element in the array. Sorting the array may also be required at times. We will now look into how these things are done using different algorithms. And we shall also see what the big O notation is and use it to determine the efficiency of these algorithms.

Contents

Simple Search or Linear Search
Binary Search
Bubble Sort

Simple Search or Linear Search

The simplest way to search for an element is to iterate through the entire array and compare each value of that array with the target element. When a match is found, we may break out of the loop if we are sure that there exist no duplicate values. However, if duplicate values do exist in the array, we might have to continue searching even when a match is found. Since we search for the elements in a linear order from left to right ( by convention, you may also search from the end of the array to the beginning), this algorithm is named as linear search. Following is the code snippet for linear search.

int[] a = { 3, 34, 5,91, 100}; // an array not containing duplicates
int target = 91; // the element to be searched
for( int i=0; i<a.length; i++) {
    if(a[i]==target) {
        System.out.println ( “Element found at index “+i);
        break; // break should be omitted if the array contains duplicates
    }
}

Before we move onto the next algorithm, we will look at what efficiency means in the context of algorithms. There are two things that have to be taken care of when we write algorithms. The first is the execution time and the second is the memory requirement. Execution time is determined by the numbers of statements that are excited by the algorithm while memory requirement is determined by the additional variables that we use. In the above program, we have used only one variable, target, which falls under memory requirement. However, the execution time cannot be specified directly as such even if we consider every statement to take the same time to execute. The reason is that, in this particular algorithm, if the target is in the beginning of the array, then this search would require only a single comparison which is known as the best case. However if the target is located at the end of the array, the number of comparisons required would be equal to the length of the array. This is the worst case. The average execution time would occur when the target is located in the middle of the array. So, one thing that we can conclude is that the efficiency of algorithms depends on the input data too. Here arises the need for a standardised comparison of efficiencies. And one solution is the Big O notation.

The big O notation is gives us a relation between the number of data items contained in the array and the number of comparisons required. It takes the worst case into account. It determines how hard an algorithm has to work to obtain the result. In this particular linear search algorithm, if the number of data items are n, then the number of compressions required are also n. This is the order of the algorithm. Hence, linear search is said to be an algorithm of order n.

There is a better way to arrive on this result by using the formal definition of big O. A function f(n) is said to have an order g(n) written as O(f(n))=g(n) [ O represents order ] if and only if, there exists an N and a c such that for every n>N, the following condition is true: 0 < f(n) < c* g(n). What this definition has basically done is to put an upper bound on the performance of the algorithm and take the worst case scenario.

Let us understand it in the context of linear search. We should first develop the function f(n). Assume that each of the statements takes the same time to execute. So the execution time is proportional to the number of statements executed which will be equal to 2*n or n or 2*n+1 depending on what you wish to regard as a statement. The important thing here is that the power of n is 1 and not 2 or 3. This basically depends on the loop which is executed n times (the size of the array). In the above definition, if the f(n) is substituted, you would get g(n) as n by taking appropriate values of c and N. This might not be much clear for the present moment. But for now, assume that order of an algorithm is the number of times the for loop is executed for the array size, n. In this particular case, when the array contains n elements, the for loop has to execute n times (considering the worst case scenario) and hence the order of linear search is n.

Binary Search

Binary search is an efficient algorithm which can be used to search in a sorted array. Note that the array has to be sorted in either ascending or descending order for the algorithm to worth. Initially, the range of the array to be searched begins at the first element of the array and extends up to the last element. This range reduces by half in every iteration. We locate the middle element of the array and compare it with the target. If the target equals the middle element, our search is completed, otherwise the range has to be adjusted accordingly. For now, assume that the array is sorted in ascending order. If the middle element is smaller than the target, then the target cannot be found in the left half of the array as each of those elements would also be smaller than the target. Hence, we can narrow down our search to the right half of the array. On the other hand, if the middle element is larger than the target, we narrow our search to the left auld of the array. Clearly, the range of elements to be searched has been reduced by half. We now perform the same operation of finding the middle element of the new range and reduce the range accordingly. In the second iteration, the range of elements to be searched reduces to one fourth of the original array length. Similarly, with the third iteration, the range of elements reduce to one eighth. Given below is diagrammatic representation of this algorithm

We now have to represent this algorithm programmatically. For this purpose, we take two variables left and right which represent the bounds of the array to be searched. Left indicates the left bound of the range and right indicates the right bound of the range. Initially, the left bound is set to 0 and the right bound is set to array length -1. We then use a while loop to perform the repetitive task of finding the middle element and comparing it with the target event. We shall look into the termination condition of while shortly. For now, the various things to be included in the while loop are statements to find the middle element, compare it with the target and set the left and right bounds accordingly. Now coming to the condition in the while loop, this process of finding the middle element and comparing it with the target would end when we are left with just a single element. This happens when the value of left and right becomes identical. On moving further, either right would become less than left or left would become more than right. At this point of time, the search needs to be stopped. Hence the loop condition is left<=right. Given below is at the code sniper for the binary search algorithm for an array sorted in ascending order.

int[] a = {3, 7, 10, 15, 91, 110, 150}; // a sorted array not containing duplicates
int target = 91; // the element to be searched
int left = 0;
int middle;
int right = a.length – 1;
while (left <= right) {
    middle = (left + right) / 2;
    if (a[middle] == target) {
        System.out.println(“Element found at index ” + middle);
        break;
    } else if (a[middle] < target) {
        left = middle + 1;
    } else if (a[middle] > target) {
        right = middle – 1;
    }
}

Binary search has an order of log (n). This means that if the length of the array to be searched is n, the numbers of iterations required will never exceed log₂n. You can verify this by taking a few examples. If you would like to know a formal proof, visit this page.

Now we move to sorting of arrays. Sorting refers to arranging the elements of an array in either ascending or descending order. We will in the code snippets that follow arrange the elements in ascending order. The code may be suitably modified to sort in descending order. We will deal with two algorithms: Simple or selection sort and insertion or bubble sort.

The selection sort algorithm is rather quiet simple. We start from the left of the array and search for the smallest element and then place is at index 0 by swapping the two elements. In order to find the smallest element, we first assume that the element at index zero is the smallest and assign the index 0 to a variable, say minPos. We then compare each of the successive elements with the value held at minPos. If the new value in smaller than what minPos corresponds to, the index stored in minPos is updated. In this way, when we reach the end of the array, the variable minPos will be holding the index of the smallest element. Now, we exchange the number stored at index 0 with the number stored at index minPos. To do this, we require an additional variable and a simple pair of statements shown below do not work.

a[0] =a[minPos];
a[minPos]=a[0];

After the above two statements are executed, what we will be left with is just a single value which was earlier the value stored at the index minPos. The value at index 0 is lost. To avoid this, the value at index 0 should first be stored in a temporary variable and swapping be performed using the following set of three statements.

int temp = a[0];
a[0] = a[minPos];
a[minPos]= temp;

After this first iteration, we bought the smallest element to index 0. Now, we need to search for the second smallest element. To do it, we assume that the element at index 1 is the smallest and proceed as before. In this way, we sort the entire array. Given below is the code sniper for selection sort.

int[] a = {4, 85, 7, 1, 0, 36, -5, 48};
int minPos;
for (int i = 0; i < a.length; i++) {
    minPos = i;
    for (int j = i + 1; j < a.length; j++) {
        if (a[j] < a[minPos]) {
            minPos = j;
        }
    }
    int temp = a[0];
    a[0] = a[minPos];
    a[minPos] = temp;
}

Note that the above code uses two for loop, one nested within the other. The outer loop keeps track of the position from which the elements are to be searched for the smallest data item. Initially, this position is 0. After we bring the smallest item to index 0, this position in the second iteration becomes 1. The inner loop is used to find the smallest elements starting from the position defined by the outer loop. After finding the smallest element, the two values are swapped.

This algorithm has an order of n2. The outer loop executes for n times. And the inner loop executes for variable number of times deepening on the value of i. When i is 0, the inner lope executes n-1 times. When i is 1, the inner loops excites n-2 times and so on. Therefore the total number of comparisons made come out to be (n-1)+(n-2)+(n-3)+….(n-(n-1)) which equals (n-1)*n-(1+2+3….n-1). The second set of values that are subtracted wouldn’t make much difference when the value of n is very large. Hence the above expression evaluates to n2 which is the order of this algorithm. There is also another way to understand what the order of an algorithm means. It denotes the effect on the number of steps required when the number of data items change. Since the selection sort is of n2 order, when the data items change from n to 2n, the number of comparison increase to 4n2. This can also be concluded by considering the original expressions (n-1)*n-(1+2+3….n-1). Change n to 2n and find the new number of comparisons required. Find the ratio of these two terms and evaluate the limit as n tends to a very large number. You will get the result 4. Therefore when the data items are increased by twice, the comparisons required increase by four times which is the square of two. This is what the order of an algorithm represents. Try similar calculations for the previous algorithms.

Bubble Sort

The last of the algorithms that we are gaining to deal with is the bubble sort method. This algorithm has its name derived from the water bubbles which surface to the top. In a similar way, in this algorithm, we make larger numbers move to the top of the array when we wish to sort in ascending order. We iterate through the array from the left to the right and compare each pair of successive values in turn. If they are in the right order, we leave them as they are. However, if a larger item is on the left of a smaller item, we swap them. Note that only successive values are compared. In this way, when we move to the rightmost end of the array, the largest value is guaranteed to have been bubbled to the top of the array. For example, if the largest element was at index 0 initially, then in each step of comparison, this element is moved up ultimately bringing it to the top. In this way when we repeat this process for n-1 times, the entire array is guaranteed to be sorted. We can further improve this algorithm by setting proper loop termination conditions. This algorithm requires two nested loops. Within the inner loop, the loop condition can be stated to be a relation with the outer counter variable rather than the end of the array. For example, if it is the fifth iteration, then the four largest elements have already been bubbled to the top of the array. So, there is no need of comparing those values gain. So, the loop condition can be specified as j<x.length-1-i. Given below is the code for bubble sort.

int[] a = {4, 85, 7, 1, 0, 36, -5, 48};
for (int i = 0; i < a.length – 1; i++) {
    for (int j = 0; j < a.length – 1 – I; j++) {
        if (a[j + 1] < a[j]) {
            int temp = a[j];
            a[j] = a[j + 1];
            a[j + 1] = temp;
        }
    }
}

The bubble sort algorithm also has an order of n2. There are many more searching ND sorting algorithms but are complicated. If you wish to learn about them, then refer to these pages on another site:

Next : Array of objects
Prev : Processing arrays using loops

English Tutorial

Author: , 0000-00-00

Source