Archive for ‘Algorithms and Complexity’

November 20, 2011

Divided We Compute, United We Reduce

by Bobby Corpus

Once upon a time, in a far away village lay a dying old man. He called his sons to his deathbed and spoke to them one last time. He said “Sons, see that bundle of sticks? Each one of you try to break it. The one who can break it will inherit all my riches”. Each son, being greedy, wanted all the riches for himself. So each one of them tried to break the bundle of sticks but none of them succeeded. The old man asked his servant to untie the bundle and said to his sons, “Each one of you now get one stick and break it”. Without any effort, each son was able to break the stick. The old man said “You see, when you unite, no task will be difficult. The riches that I told you was a lie. We are broke. When i’m dead, make sure  you unite so that you can survive.”

Fast forward to modern times. You can think of the bundle of sticks to be a complex problem that is itself composed of smaller problems. The sons are the processors of your computer. When each processor was given the task to solve the complex problem, it fails to solve it in a reasonable amount of time. When the complex problem is decomposed into smaller problems and given to each processor, each processor is now able to solve the smaller problems quickly thereby solving the big problem quickly as well.

The process of decomposing a problem into smaller problems and solving them in separate processors is called Parallel Computing. In this article, we will compute how fast a certain algorithm will run when parallelized. The problem we want to investigate is sorting an array of a million (2^{20}) integers.

Efficient Sorting

Suppose you have an array \{ a_1, a_2, a_3, ..., a_n \} that you want to sort based on pairwise comparison.  The sorted array is just one of the many permutations of the array \{a_1, a_2, a_3,\ldots, a_n\}. In fact, if you have n different objects to sort, then there are exactly n! ways to arrange these objects, and one of them is the sorted state. You can imagine the sorting process as a decision tree. Say, for example we have array A={ a,b,c }. To sort this, we first compare a with b and there are 2 ways this can go. Either a \le b or a > b. If a \le b, we then compare b and c. This also give either b \le c or b > c. As you can see from the diagram below, this is nothing but a decision tree.



Since the height of this binary tree is lg(n!), then we have

\lg(n!) = \lg\Big[ n \cdot (n - 1) \cdot (n-2) \cdots 1\Big] \le \lg n + \lg (n-1) \cdots \lg1 \le \underbrace{\lg n \cdots \lg n}_\text{ n times}
\lg(n!) \le n\cdot \lg n

There are efficient algorithms that are able to sort of this complexity. For example, the merge sort has this complexity. Therefore, if you have an array of 2^{20} elements, then the complexity is

2^{20} \cdot \lg(2^{20}) = 2^{20} \cdot (20) = 20971520

that is, it takes about 20 million comparisons to sort an array of 1 million. Could we do any better than this? We can either upgrade the cpu of the machine doing the sorting or use two or more machines to divide the work among those machines. In this article, we are going to investigate the impact of dividing the work into smaller chunks and farming it to other processors.

Divide and Conquer

Assume we have an array n=2^{20} elements that we need to sort and suppose we have two identical processors we can use. Divide the array into 2 equal sized arrays. Give the first array to the first processor and the other half to the second processor. Apply an efficient sorting algorithm to the subarrays to produce a sorted array for each processor. We then combine the result of processor 1 and processor 2 to one big array by merging the two sorted arrays. The diagram below illustrates the process of computation:



This is also known as the MapReduce algorithm. Mapping is the process of assigning subsets of the input data to processors where each processor computes the partial result. Reducing is the process of aggregating the results of each processor to the final solution of the problem.

The process of merging is straightforward. Given two sorted arrays, begin by comparing the first element of each array. The smaller of the two will then occupy the first slot in the big array. The second element of the array from which we took the smallest element will now become the first element of that array. Repeat the process until all elements of both arrays have already occupied slots in the big array. The diagram below illustrates the algorithm of merging.




If you count the total number of comparisons that you need to merge two sorted arrays, you will find that it takes n-1 comparisons. Therefore, the complexity of the merging process is O(n).

Since each processor has n/2 sized subarrays, the sorting complexity is therefore n/p \lg (n/p). Furthermore, since the merging process takes O(n) comparisons, the total complexity of the parallel sorting process is therefore

\displaystyle n/p \lg(n/p) + n

In our example, C=2^{20}/2 \lg(2^{20}/2) + 2^{20}=  11010048 comparisons compared to 2^{20} \lg(2^{20}) = 20971520 when run sequentially. For large values of n, n/p \lg(n/p) dominates n, therefore the complexity of the parallel algorithm is O(n/p \lg(n/p)).

Can we do any better?

For a given value of n, what do you think is the value of p that reduces the running time to O(n)? If we take n=2^{20} and plot complexity against p = \{ 2, 4, 8, 16, 32\} we get the diagram below.



In this diagram, we also plotted the horizontal line y=2^{20}. The intersection of this line with the plot of \displaystyle f(p) = \frac{n}{p} \lg(\frac{n}{p}) gives us the value of p such that the total comparisons is already linear, that is,

\displaystyle f( p ) = n
\displaystyle \frac{n}{p} \lg(\frac{n}{p})  = n

To get the value of p numerically, we have to solve the root of the equation

\displaystyle g( p ) = \frac{n}{p} \lg(\frac{n}{p}) - n = 0

Simplifying,

\displaystyle \frac{1}{p} \lg(\frac{n}{p}) - 1 = 0
\displaystyle \lg(\frac{n}{p}) = p
\displaystyle \frac{n}{p} = 2^p
\displaystyle p2^p - n = 0

Since this is a non-linear equation, we can solve this using the Newton's method. It is a method to compute the roots by approximation given an initial value of the solution. Starting from a guess solution p_1, the root can be approximated using the recursive formula

\displaystyle p_{n+1} = p_n - \frac{g( p_n)}{g\prime ( p_n)}

where g\prime ( p ) is the first derivative of g( p ) . Applying the rules of derivatives, we get

\displaystyle g\prime ( p ) = p\cdot 2^p \ln 2 + 2^p

Substituting this to the formula for Newton's method, we get

\displaystyle p_{n+1} = p_n - \frac{p2^p - n}{p2^p \ln 2 - 2^p}

Below is an R code using newton’s method to compute the root of the equation g(p).

g=function(n,p){
	p* 2^p - n
}

gprime=function(n,p){
	p*2^p *log(2) - 2^p
}

newton=function(p,n,iter){
	tmp = p
	for(i in 1:iter){
		p=p-g(n,p)/gprime(n,p)
		
		if(abs(p-tmp)< 0.0001){
			break		
		}

		print(p)
		tmp=p
	}
	print(p)
}

Running this code, we get the value of p = 16:

> newton(15,n,100)
[1] 16.80905
[1] 16.08829
[1] 15.98603
[1] 16.00286
[1] 15.99944
[1] 16.00011
[1] 15.99998
[1] 16

Ignoring network latency, by distributing the input evenly into 16 processors, we get a running time of O(n) time complexity for n=2^{20} array of items. Therefore, instead of doing 20 million comparisons, you only need 1 million comparisons to sort 1 million objects.

In this age of multicore processors, parallel computing is fast becoming the norm than the exception. Learning to harness the power of multicores is becoming an extremely handy skill to have.

March 7, 2011

Way Up High A Binary Tree

by Bobby Corpus

Way up high in the binary tree
Two binary bits smiled at me
I shook that tree as hard as I could
Down came the binary bits,
Mmmm–were they good! -based on a children’s song

Now it turns out that this binary tree is an AVL tree. I wonder how high this tree is! Why does it concern us ? The reason is that the height will give us the worst case running time of the binary search tree algorithm. An AVL tree maintains its height by making sure the difference in height between the left and right subtree is at most 1. If N_h is the number of nodes contained in the tree of height h, then N_h is equal to the number of nodes in the left subtree plus the number of nodes in the right subtree PLUS 1 (the root node). In the worst case, the two subtrees differ in height by at most 1, therefore

\displaystyle N_h = N_{h-1} + N_{h-2} + 1

with initial conditions N_0 = 0 and N_1 = 1.

By adding 1 to both sides of this recurrence relation we get

\displaystyle N_h + 1 = N_{h-1} + 1 + N_{h-2} + 1

If we let b_h = N_h + 1, we can rewrite the above recurrence relation as

\displaystyle b_h = b_{h-1} + b_{h-2}

The initial conditions are now

b_0 = N_0 + 1 = 0 + 1 = 1 and b_1 = N_1 + 1 = 1 + 1 = 2

Now this is interesting! This recurrence relation is very similar to the Fibonacci recurrence relation except that the initial conditions are different. From the previous post, we found that the general solution of this recurrence relation is

\displaystyle b_n = \alpha_1 \phi^n + \alpha_2 (1-\phi)^n

where \alpha_1 and \alpha_2 are constants yet to be determined. Let us solve this recurrence relation in the general case where b_0 = s and b_1=t. Computing the first 2 terms of the recurrence relation, we get two simultaneous equations in \alpha_1 and \alpha_2.

\displaystyle b_0 = s = \alpha_1 + \alpha_2
\displaystyle b_1 = t = \alpha_1 \phi + \alpha_2 (1-\phi)

Solving for \alpha_1 from the first equation, we get

\displaystyle \alpha_1 = s- \alpha_2

Plugging this into the second equation and solving for \alpha_2

\displaystyle t = (s-\alpha_2) \phi + \alpha_2 - \alpha_2\phi
\displaystyle t = s\phi + \alpha_2(1- 2\phi)
\displaystyle \alpha_2 = \frac{t-s\phi}{1-2\phi}

From which we solve for \alpha_1

\displaystyle \alpha_1 = s - \frac{t-s\phi}{1-2\phi}
\displaystyle = \frac{s(1-\phi) - t }{ 1-2\phi}

Substituting this to the recurrence relation of b_n, we have

\displaystyle b_n = \frac{s(1-\phi) -t }{1-2\phi} \phi^n + \frac{t - s\phi}{1-2\phi} (1-\phi)^n

Expanding this expression you’ll end up with a rather complex expression involving cross terms of \phi^n and (1-\phi)^n

\displaystyle b_n = \frac{s\phi^n ( 1- \phi) - t\phi^n + t(1-\phi)^n - s\phi(1-\phi)^n}{1-2\phi}

But that is where the complexity ends. Observe that

\displaystyle \phi(1-\phi) = \frac{1+\sqrt{5}}{2}\cdot \frac{1-\sqrt{5}}{2}
\displaystyle = \frac{1-5}{4} = -1

Using this result, we can write

\phi^{n}(1-\phi) = \phi^{n-1}\phi (1-\phi) = -\phi^{n-1}
\phi (1-\phi)^n = \phi (1-\phi) (1-\phi)^{n-1} = - (1-\phi)^{n-1}

Furthermore, we also observe that

\displaystyle 1-2\phi = 1 - \frac{1+ \sqrt{5}}{2} = -\sqrt{5}

Using all these results we have

\displaystyle b_n = - s\phi^{n-1} - t\phi^n + t(1-\phi)^n + s(1-\phi)^{n-1}
\displaystyle = s \Big[ \frac{(\phi^{n-1} - (1-\phi)^{n-1}}{\sqrt{5}} +\Big]  t \Big[ \frac{(\phi^n - (1-\phi)^n}{\sqrt{5}}\Big]

Notice that the expressions in square brackets are just fibonacci expressions for n-1 and n, respectively. If we use them in the expression for b_n, we can express it in terms of the fibonacci numbers

\displaystyle  b_n = sf_{n-1} + tf_n

Returning to our computation for the height of the AVL tree, since b_0 = s = 1 and b_1 = t = 2, we finally have

\displaystyle b_n = f_{n-1} + 2f_n

We can further simplify this by

\displaystyle b_n = f_{n-1} + f_n + f_n
\displaystyle = \Big(f_{n-1} + f_n\Big) + f_n
\displaystyle = f_{n+1} + f_n
\displaystyle = f_{n+2}

Since b_n = N_h + 1,

\displaystyle N_h + 1 = f_{n+2}
\displaystyle N_h = f_{n+2} -1

The formula above expresses the number of nodes of the AVL tree in terms of the height h. In terms of the golden ratio, we can write the last expression as

\displaystyle N_h = \frac{\phi^{n+2} - (1-\phi)^{n+2}}{\sqrt{5}} - 1

Approximating the nth Fibonacci Number

We can further simplify the above expression by observing that the nth fibonacci number is approximately equal to \phi^n/\sqrt{5}. To see this, we compute the distance of f_n to \phi^n/\sqrt{5}:

\displaystyle |f_n - \frac{\phi^n}{\sqrt{5}}| = |\frac{\phi^n - (1-\phi)^n}{\sqrt{5}} - \frac{\phi^n}{\sqrt{5}} | = |\frac{(1-\phi)^n}{\sqrt{5}}| < \frac{1}{\sqrt{5}} < \frac{1}{2}

Using this approximation, we can write

\displaystyle N_h = \frac{\phi^{h+2}}{\sqrt{5}} -1

So if n is the number of nodes in an AVL tree, we can estimate the height using the above formula, that is

\displaystyle n \approx \frac{\phi^{h+2}}{\sqrt{5}} -1
\displaystyle n + 1 \approx \frac{\phi^{h+2}}{\sqrt{5}}
\displaystyle \sqrt{5}(n+1) \approx \phi^{h+2}
\displaystyle h \approx \log_{\phi} \Big( \sqrt{5} (n+1)\Big) -2

Using a change of base

\displaystyle n \approx \frac{\log_2 \Big( \sqrt{5}(n+1)\Big)}{\log_2 \phi} -2
\displaystyle \approx \frac{\log_2\Big( \sqrt{5}(n+1)\Big) - 2\log_2\phi}{\log_2 \phi}
\displaystyle \approx \frac{1}{\log_2 \phi} \cdot \log_2\Big(\frac{\sqrt{5}(n+1)}{\phi^2}\Big)

Again using the fibonacci number approximation, we have \sqrt{5}/\phi^2 \approx f_2 = 1. This simplifies our estimate to

\displaystyle h \approx 1.44\log_2(n+1)

where 1/\log_2 \phi = 1.44.

If we have an array of a million names, then the worst case running time of the binary search is

\displaystyle 1.44\log_2(1000000 + 1) = 28.70.

That's incredibly efficient!

March 6, 2011

Eating The Fruit Of The AVL Tree

by Bobby Corpus

We have seen that searching a name in a sorted array of a million names is very efficient when using binary search. In fact, the running time is logarithmic, which means we only need 19 comparisons, in the worst case, to figure out if our search is successful or not. We have also seen that by using a binary search tree, we can save more efficiencies when we are searching on a dynamic array of names, that is, when the list of names grows. However, we also showed that if the BST is fed with a sorted array of names, then the search becomes very inefficient. In fact, in the worst case, we need to compare 1 million times before we realize that we don’t have a name in the array.

In the last post, we built a BST from an array of a sorted list of animals. Below is the result of that construction, repeated here for convenience.

This is a sad state of affairs. We have found a very good data structure for searching but it fails when fed with sorted data. Is there anything we can do to minimize the height of this tree? Fortunately, there is! The data structure is called a self-balancing binary search tree and there are many of them. We will only take a look at one and it is called AVL tree*.

A self-balancing binary search tree is able to maintain it’s height by keeping track of a variable in each node called the balance factor. The balance factor for a node x is defined as

Height of left subtree of node minus the height of the right subtree.

Below is an example of a binary tree with balance factors indicated inside the circles. The triangles that you see in the diagram are subtrees. They may or may not be present in the node but we show them in the diagram for some reason which will become clear later.

An AVL tree employs mechanisms to limit the difference in height between subtrees to 1, 0 or -1. These mechanisms are called rotations. There are only 4 rotations that are needed to balance an AVL tree. The figure below shows you how rotations work.

The first thing to look for after inserting a node is find a balance factor that is -2 or +2. in the first tree, the balance factor is +2. this means that the left subtree is imbalanced. Next, examine the balance factor of the left child. If the balance factor is -1, then the right subtree is higher than the left. To balance this, the child in yellow replaces its parent (green node) which now becomes its left child. What used to be the right child of the node in yellow is now the right child of the green node.

However, this rotation does not change the balance factor of the root node. Another rotation will establish the balance. It involves replacing the root node ( in red) with the yellow node and making the root node the right child of the yellow node. What used to be the right child of the yellow node now becomes the left child of the red node.

When the balance factor of the root node is -2 it means that the right subtree is imbalanced. If you observe carefully, this is just symmetric to the case above and the rotation can easily be figured out from the figure.

Balancing the Animal Tree

Let us show how we can use these transformations to fix our tree. From the figure below, you can see that the imbalance occurs after we insert the cobra. At this point, the balance factor of the “alligator” node is 2. Rotating this tree by making bat the root node fixes the tree.

We then insert dog and fox before we encounter another imbalance. Replacing cobra with dog and making cobra the left child of dog fixes the imbalance.


Inserting horse to this new configuration creates an imbalance at the root of the tree. Replacing bat with dog and making cobra the right child of bat fixes the tree.

You can follow the rest of the process until we have all ten words inserted into the BST. The result is a tree with a maximum height of 4.


What does this mean? It means that it only takes a maximum of 4 comparisons to find any animal in this structure. It also means it only takes a maximum of 4 comparisons to determine if an animal is not in the structure.

In the next post, we are going to compute the the height of an AVL tree to get an estimate of the complexity of the search.

* AVL is named after Adelson-Velskii and EM Landis.

February 28, 2011

Mathematical Beauty

by Bobby Corpus

It is said that beauty is in the eye of the beholder. As a person is more than the sum of its parts, physical beauty is just one aspect of the total beauty. Physical beauty, however, has some sort of mathematical standard. Why is it that we find movie stars beautiful? What is that standard that defines beauty? That standard is hidden in the sequence of numbers 0, 1, 1, 2, 3, 5, 8…

The careful eye will realize that, beginning on the second term, the terms in this sequence is equal to the sum of the previous two numbers. For example, the second term is just 0+1 = 1, the third term is just 1+2 = 3, and so on. This sequence is called the fibonacci sequence after Leonardo of Pisa, also known as Fibonacci. We can model this sequence as a recurrence relation. If we let F_n be the nth Fibonacci number, then by definition it is equal to the sum of the previous 2 Fibonacci numbers F_{n-1} and F_{n-2}. Mathematically we write this as

\displaystyle F_n = F_{n-1} + F_{n-2}

Looking at the recurrence relation above, we realize that it is an instance of a Homogeneous Linear Recurrence Relation With Constant Coefficients. The good news is we know how to solve this kind of recurrence relation. The characteristic equation is

\displaystyle r^2 - r -1 = 0

Using the quadratic formula, the roots are

\displaystyle r = \frac{ -(-1) \pm \sqrt{ (-1)^2 - 4\cdot 1 \cdot (-1)}}{2\cdot 1}
\displaystyle = \frac{1 \pm \sqrt{5}}{2}

If we define

\displaystyle \phi = \frac{1 + \sqrt{5}}{2}

then we can write negative root as

\displaystyle \frac{1 - \sqrt{5}}{2} = 1-\phi

From the previous post, we know that we can express the solution of F_n as a linear combination of \phi^n and (1-\phi)^n. Therefore, the solution of the Fibonacci recurrence relation is just

\displaystyle F_n = \alpha_1 \phi^n + \alpha_2 (1-\phi)^n

where \alpha_1 and \alpha_2 are constants. Since F_0 = 0 and F_1 = 1, we can solve for \alpha_1 and \alpha_2:

\displaystyle F_0 = 0 = \alpha_1 \phi^0   + \alpha_2 (1-\phi)^0 = \alpha_1 + \alpha_2
\displaystyle F_1 = 1 = \alpha_1 \phi^1   + \alpha_2 (1-\phi)^1 = \alpha_1 \phi+ \alpha_2 (1-\phi)

From the first equation, we solve for \alpha_1:

\displaystyle \alpha_1 = - \alpha_2

Substituting this into the second equation and solving for \alpha_2 we get:

\displaystyle -\alpha_2 \phi + \alpha_2 - \alpha_2 \phi = 1
\displaystyle = -2\alpha_2\phi + \alpha_2 = 1
\displaystyle = \alpha_2 (-2 \phi +1) = 1

Observe that

\displaystyle  1-2\phi = 1 - 2\frac{1+\sqrt{5}}{2} = 1 - (1 + \sqrt{5}) = -\sqrt{5}

This means that

\displaystyle -\sqrt{5} \cdot \alpha_2 = 1
\displaystyle \alpha_2 = - \frac{1}{\sqrt{5}}

and

\displaystyle \alpha_1 = \frac{1}{\sqrt{5}}

Therefore, the solution to the Fibonacci recurrence relation is

\displaystyle F_n = \frac{\phi^n - (1-\phi)^n}{\sqrt{5}}

Beauty and Fibonacci numbers

After all that tedious computations, so what? What does Fibonacci have anything to do with beauty? If you take the successive ratio of the fibonacci numbers

\displaystyle \lim_{n \rightarrow \infty} \frac{F_{n+1}}{F_n} = \phi

Here is a list of the first few fibonacci numbers starting at 1 and the corresponding ratios:

1        1       1        1.000000
2        2       1        2.000000
3        3       2        1.500000
4        5       3        1.666667
5        8       5        1.600000
6       13       8        1.625000
7       21      13        1.615385
8       34      21        1.619048
9       55      34        1.617647
10      89      55        1.618182
11     144      89        1.617978
12     233     144        1.618056
13     377     233        1.618026
14     610     377        1.618037
15     987     610        1.618033
16    1597     987        1.618034
17    2584    1597        1.618034
18    4181    2584        1.618034
19    6765    4181        1.618034
20   10946    6765        1.618034

The first column in the table above is just a line number. The second column is F_{n+1}, the third column is F_n and the last column is F_{n+1}/F_n. You can see that the ratio approaches value of \phi=1.618034.

The constant \phi is called the Golden Ratio by the Greeks. Any structure that follows the golden ratio is structurally beautiful to the eye. Below is an image of a rectangle with labeled sides.



The rectangle is called a Golden Rectangle if

\displaystyle \frac{a+b}{a} = \frac{a}{b} = \phi.

The Golden Ratio can be found in many aesthetic works. Leonardo Da Vinci used this in his Vitruvian Man. This is probably why the fibonacci sequence was featured in the beginning of the movie (book) Da Vinci Code.

February 27, 2011

A Method To The Madness

by Bobby Corpus

In the previous post on recurrence relations, we derived a general method to solve divide and conquer relations. In another article, we derived the formula of the sum of the first n non-negative integers:

\displaystyle \sum_{i=0}^n i = \frac{n(n+1)}{2}

If you think carefully about it, you can express the sum of the first n integers as a recurrence relation. If you denote s_n to be the sum of the first n integers, then it is just equal to n plus the sum of the first n-1 integers. Symbolically, we write this as

\displaystyle s_n = s_{n-1} + n

A good question now comes to mind: Is there also a method to solve this kind of recurrence relation? Surprise! Surprise! The answer is YES! But first, let us handle the easier cases.

A good example of a recurrence relation we frequently encounter is in our financial life, when we compute how much money we will get at the end of one year when we put it in a bank. Suppose, you deposit 1 million bucks in a bank with 10% interest compounded yearly. How much money will you get after 10 years? If we let P_n be the amount of money we have after n years, then we can write the compounding of interest as a recurrence relation:

\displaystyle P_n = P_{n-1} + 0.1\cdot P_{n-1} = 1.1 \cdot P_{n-1}

The recurrence relation above is just an example of a more general form called the Linear Homogeneous Recurrence Relation With Constant Coefficients. That’s a pretty long name for a seemingly simple concept. What this means is that your recurrence relation has the following form:

\displaystyle a_{n} = c_1 a_{n-1} + c_2 a_{n-2} + \ldots + c_k a_{n-k}

To solve this recurrence relation, we invoke the method of “Lucky Guess”, that is, we will guess that the solution of the above recurrence relation is of the form a_n = r^n. If we substitute this to the equation above, we will get

\displaystyle r^n = c_1 r^{n-1} + c_2 r^{n-2} + \ldots + c_k r^{n-k}

Dividing both sides by r^{n - k} we get,

\displaystyle r^k = c_1 r^{k-1} + c_2 r^{k-2} + \ldots + c_k.

This means that a_n  = r^n is a solution to this recurrence relation if r satisfies the following algebraic equation:

\displaystyle r^k - c_1 r^{k-1} - c_2 r^{k-2} - \ldots - c_k = 0

This is called the Characteristic Equation and the roots of this equation are called characteristic roots. To make things look simpler, let’s suppose we have a recurrence relation of degree 2:

\displaystyle a_n = c_1 a_{n-1} + c_2 a_{n-2}

The characteristic equation of this recurrence relation is

\displaystyle r^2 - c_1 r - c_2 = 0

Case of distinct roots

Suppose the roots of the above equation are r_1 and r_2, respectively. Then

\displaystyle a_n = \alpha_1 r_1^n + \alpha_2 r_2^n

is a solution of the recurrence relation, where \alpha_1 and \alpha_2 are constants. To prove this, substitute the solution to the recurrence relation

\displaystyle  c_1 a_{n-1} + c_2 a_{n-2} =  c_1 (\alpha_1 r_1^{n-1} + \alpha_2 r_2^{n-1}) + c_2 (\alpha_1 r_1^{n-2} + \alpha_2 r_2^{n-2})
\displaystyle = c_1\alpha_1 r_1^{n-1} + c_1 \alpha_2 r_2^{n-1} + c_2  \alpha_1 r_1^{n-2} + c_2\alpha_2 r_2^{n-2}

Collecting the \alpha coefficients:

\displaystyle = \alpha_1 ( c_1 r_1^{n-1} + c_2 r_1^{n-2}) + \alpha_2 (c_1 r_2^{n-1} + c_2 r_2^{n-2})
\displaystyle = \alpha_1 r_1^{n-2} (c_1 r_1 + c_2) + \alpha_2 r_2^{n-2} (c_1 r_2 + c_2)
\displaystyle = \alpha_1 r_1^{n-2} r_1^2 + \alpha_2 r_2^{n-2} r_2^2

Using the characteristic equation r^2=c_1 r + c_2, we have

\displaystyle = \alpha_1 r_1^n + \alpha_2 r_2^2
\displaystyle = a_n

as was to be shown.

Case of Repeated Roots

If the roots of the characteristic equation is repeated, that is, the quadratic equation is a perfect square, then if r_1 is the root, the solution to the recurrence relation can be expressed as

\displaystyle a_n = \alpha_1 r_1^n + n\alpha_2 r_1^n.

To see this, substitute this formula to the recurrence relation

\displaystyle c_1 a_{n-1} + c_2 a_{n-2} = c_1 (\alpha_1 r_1^{n-1} + n\alpha_2 r_1^{n-1}) + c_2 (\alpha_1 r_1^{n-2} + n\alpha_2 r_1^{n-2})
\displaystyle = \alpha_1 ( c_1 r_1^{n-1} + c_2 r_1^{n-2}) + n\alpha_2 (c_1 r_1^{n-1} + c_2 r_1^{n-2})
\displaystyle = \alpha_1 r_1^{n-2} ( c_1 r_1 + c_2) + n \alpha_2 r_1^{n-2} (c_1 r_1 + c_2)
\displaystyle = \alpha_1 r_1^n + n \alpha_2 r_1^n
\displaystyle = a_n

Compound Interest

Applying this to our compound interest recurrence relation, we see that the characteristic equation of

\displaystyle P_n = P_{n-1} + 0.1\cdot P_{n-1} = 1.1 \cdot P_{n-1}

is

\displaystyle r - 1.1 = 0

whose characteristic root is

\displaystyle r = 1.1

Therefore, the solution to the compound interest recurrence relation has the form

\displaystyle P_n = \alpha \cdot 1.1^n

If at n=0, the amount of money you have in the bank is P_0, we can solve for \alpha:

\displaystyle P_0 = \alpha \cdot 1.1^0 = \alpha \cdot 1 = \alpha

This means that at the end of period n, the amount of money you have in the bank is

\displaystyle P_n = 1.1^n \cdot P_0

Or substituting 1 million to P_0, we get P_{10} = 2593742.

February 20, 2011

Trees Are Made By Fools Like Me

by Bobby Corpus

We have seen in the previous post that searching a name in a sorted array of 1 million names can be done very efficiently using the binary search. But that was only the second half of the story. What was not told was how we ended up with a sorted array. Was it handed to us already sorted or were we the ones who sorted it?

Now suppose that someone gave us an unsorted list of 10 different names to append to the original 1 million. How do we search now for a name? We cannot just append these 10 names to the array of sorted names because it will destroy the “sorted” property of the original array. What we can do is increase the length of the original array by 10 more slots and reinsert the 1 million and 10 names to this new array and sorting the new array. Reinserting would take linear time and re-sorting on a 99% sorted array takes quadratic time using quicksort. So is there a better way to do it? Yes there is and we are going to build a tree for it.

Suppose we are given a list of animals:

leopard, cobra, shark, horse, alligator, bat, tiger, fox, dog, pig

Our task is to create a structure that will allow adding more entries and still be able to search as fast as a binary search. To do this, let’s construct a tree with the following properties

- each node will contain at most 2 children
- each node is identified by a key. For this example, we will use the name of the animal as key.
- the left child of each node should have a key that is less than the value of the node.
- the right child of each node should have a key that is greater than the value of the node.

We call the result of this construction a Binary Search Tree or BST for short.

Given these rules, we can start constructing our BST starting from the word leopard as the root of the tree. The next word is cobra. Since cobra is less than leopard (lexicographically speaking), we put cobra as the left child of leopard. The next word is shark. Since shark is greater than leopard, we put shark as the right child of leopard. Since leopard already has 2 children, we expect the other words to be children of either cobra or shark.

The next word is horse. Since horse is less than leopard, we compare it with cobra. Since horse is greater than cobra, we make horse the right child of cobra. Below is a step by step visual construction of the BST.



The words in bold orange are the newly inserted words into the BST structure. The resulting BST is shown below.



Searching a word on a BST is almost the same as inserting an entry. You first compare your search term with the key of the root. If they are equal, you have found the entry. If the search term is less than the root key, then search the left subtree. If the search term is greater than the root key, then search the right subtree. If the search term is not equal to the node key and the node has no more children, then the word is not in the list. This algorithm is recursive and is shown below:

boolean search_BST(Node node, String searchTerm):
    if node is null
        return false
    if node key is equal to searchTerm:
        return true
    if node key is less than searchTerm:
        search_BST(leftChild, searchTerm)
    else
        search_BST(rightChild, searchTerm)

Let’s trace how to search for the word bat. We start at the root leopard. Since bat < leopard, we search the left subtree of leopard which is rooted at cobra. We now compare bat and cobra. Since bat < cobra, we search the left subtree of cobra, which is rooted at alligator. Since bat > alligator, we search the right subtree of alligator which is rooted at bat. However, since the search term is equal to bat, we have found our word and return true. It only took us 4 comparison to find the word bat.

If we look at our data structure closely, we observe that the running time of the search algorithm is proportional to the height of the tree. It is therefore in our best interest to make our tree as small as possible. But how small? Given a tree of n nodes, how small can be we build our binary tree? At the root (level 0) we have 1 node. At level 1, we can only attach 2 nodes. At level 2, we can attach 4 nodes. At level h, which we denote as the height of the tree, we can have a maximum of

\displaystyle n \le  2^0 + 2^1 + 2^2 + \ldots + 2^h = \sum_{i=0}^h 2^i = 2^{h+1} -1 < 2^{h+1}

This means that the smallest possible height of the binary tree we can create from n nodes is h = \lfloor \log_2 n \rfloor.

Looking at our example, we can see that the left subtree is 2 levels deeper than the right subtree. This means that our tree is not optimal. Ideally, our tree should have a height of \lfloor \log_2 10 \rfloor = 3. In fact, if we built our BST from a sorted list, say

alligator, bat, cobra, dog, fox, horse, leopard, pig,shark, tiger

we will get this structure:



With the structure above, the worst-case running time is O(n). So how do we optimize our BST? That will be the topic of the next post.

February 11, 2011

A Match Made In Heaven

by Bobby Corpus

There are some things which go naturally with each other. For some, it is a person whom they consider a soulmate. In the case of coffee, it is a creamer. For algorithms, it is the data structure. For example, a recursive algorithms suits very well with a tree data structure and a tree data structure suits well with a recursive algorithm.

I assume that you are a programmer or know how to program and that you know what a data structure is. Examples of data structures we usually encounter in daily programming chore is the array. An array is a collection of objects, usually of the same kind. Arrays go well with iterative algorithms like the for loop or while loop. A more interesting kind of data structure is the tree. You can imagine a tree as a hierarchical structure, much like the what you see in Windows Explorer where a filesystem is displayed as a tree of directories. I have encountered programmers, in my career, who don’t know how to navigate a directory structure in code. They usually come up with a spaghetti-like code but which are not able to go deep into the directory structure. However, a good programmer can navigate a directory tree with a few lines of code.

Let’s have an example. Below is a screenshot of a directory structure. It is composed of files and directories (surprise!). We know that directories can contain other files and directories whereas files don’t have that property. In tree language, we call files the “leaves” while directories are the nodes. Well not exactly, because if a directory is empty, it can appear as a leaf. To avoid confusion, let us treat both files and directories in a generic manner and call them nodes. A directory is a special kind of node in that it can contain other nodes, which in this case are files and directories. To traverse a directory structure, we follow the algorithm below:

algorithm traverseTree(Node node):
    print node name
    if node is a directory
        list the children of the directory
        for each child
            traverseTree(child)


Notice that the algorithm above calls itself and accepts any node as input, whether it’s a file or a directory. This is an example of a recursive algorithm applied to a tree data structure. Simple and elegant!

You might ask “What is the complexity of the traverseTree algorithm?”. Before we can answer that, let’s agree first on some terminology. We define a graph to be a set of nodes connected to each other by edges. An abstract model of the directory structure above is shown below. The circles represent the nodes while the lines connecting them represent the edges. Therefore a graph G is composed of a set of nodes V and a set of edges E and we denote a graph as G(V,E). Using this definition, you can see that a tree is just a special case of a graph. Let v\in V be a node. The degree of a node \deg(v) is the number of edges it has. For example, if a node is connected to 3 other nodes, then the degree of that node is 3.

An interesting property of the degree of a node is that the total sum of the degrees of each node is equal to 2 times the number of edges, that is,

\displaystyle \sum_{v\in V} \deg(v) = 2\cdot |E|.

To see why this is, assume you have two nodes u and v connected by an edge. Now, what is the degree of u and the degree of v? By definition, the degree of a node is the number of edges it has. Since u has one edge, \deg(u) = 1. In the same way, since v has one edge, \deg(v) = 1. Therefore, \deg(u) + \deg(v) = 1 + 1 = 2, which is equal to twice the number of edges of graph G(V,E).

Let’s now go back to our main task of computing the complexity of the algorithm above. For each node v, the number of times “print node name” is executed is equal to 1 (surprise!). The number of times the statement “if node is a directory” is executed is also 1. Finally, the number of times the for loop is executed is equal to the degree of node v, that is, \deg(v). Since we do this for each node of graph G(V,E), the total executions is equal to

\displaystyle \sum_{v\in V} \Big( 1 + 1 + \deg(v) \Big) = \sum_{v\in V} \Big( 2 + \deg(v) \Big)
\displaystyle = \sum_{v\in V} 2 + \sum_{v\in V} \deg(v)
\displaystyle =  2\cdot \sum_{v\in V} 1 + \sum_{v\in V} \deg(v)

Now, the first summation is just equal to the number of nodes of V, while the second summation is just twice the number of edges. This gives us,

\displaystyle 2\cdot |V| + 2\cdot |E| = 2\cdot \big(|V| + |E| \big) = O(|V| + |E|).

Therefore, the complexity of the algorithm above is Big Oh of the sum of the number of nodes and the number of edges of graph G(V,E).

February 2, 2011

101 Ways To Skin A Cat

by Bobby Corpus

Figuratively speaking, of course. I’m against animal cruelty. However, in this article we will explore other ways to compute algorithmic complexity. In a previous post, we computed the running time of the code below:


for(int i=0;i<10;i++){
   for(int j=0;j<i;j++){
     for(int k=0;k<j;k++){
         System.out.println("Hello World");
     }
   }
}

Now that we learned about the Big Oh, we don’t need to know exactly how many times the “Hello World” will execute. We just need to determine the asymptotic behavior of the algorithm. To do this, we will now compute the running time of the above code snippet using elementary calculus.

In the previous post, we came up with an expression of the running time of the code snippet as

\displaystyle C = \frac{1}{2} \Big( \sum_{i=1}^n i^2 + \sum_{i=1}^n i \Big)

We know that the second term above is just

\displaystyle \frac{n(n+1)}{2}

Also, our derivation of the sum of the first term was quite involved so we will try to avoid that as much as possible. If you take a look at the figure below, you can see the graph of f(x) = x^2. I have to mention that this figure was created using OmniGraphSketcher for Ipad.

You can also see rectangles shaded in blue. These rectangles are of width 1 unit and the height of each rectangle is the number above it. Therefore, the area of each rectangle is numerically equal to the height. As you can see from the figure, the total area of the rectangles from 1 to 4 is equal to 1 + 4 + 9 + 16 = 1^2 + 2^2 + 3^2 + 4^2 = \sum_{i=1}^4 i^2. You can also observe that this area is less than the area under the curve, the area indicated in gray. The area under the curve is equal to the proper integral of x^2 evaluated from x=1 to x=5, or symbolically as

\displaystyle \int_1^5 x^2 dx

You might want to ask yourself why the upper limit of integration is x=5 and not x=4.

As can be seen clearly in the figure, the total area of the rectangles is less than the area under the curve, or

\displaystyle \sum_{i=1}^n i^2 \le \int_1^{n+1} x^2 dx

for some finite number n.

Performing the integration and substituting the limits, we get

\displaystyle \sum_{i=1}^n i^2 \le \int_1^{n+1} x^2 dx = \frac{1}{3} x^3\Big|_1^{n+1}
\displaystyle = \frac{1}{3} \Big[ (n+1) ^3 -1 \Big]

Combining this result with C above, we get

\displaystyle C = \frac{1}{2} \Big( \sum_{i=1}^n i^2 + \sum_{i=1}^n i \Big) \le \frac{1}{3} \Big[ (n+1) ^3 -1 \Big] + \frac{n(n+1)}{2}
\displaystyle = \frac{2n^3 + 9 n^2 + 9 n}{6}
\displaystyle < \frac{9n^3 + 9 n^2 + 9 n}{6} < \frac{9}{6}\Big( n^3 + n^3 + n^3\Big) < \frac{9}{2} n^3

By taking M = 9/2, we have

\displaystyle  |C| < M\cdot n^3 = O(n^3)

Therefore, the asymptotic complexity of the code snippet is O(n^3).

January 30, 2011

For the Nth time

by Bobby Corpus

We are sometimes irritated when somebody does something stupid and repeatedly does so. That’s why we use the phrase “for the nth time” to describe the fact that you’ve already lost count of such things. So where does this phrase come from? I would probably think this came from mathematics and especially that technique called mathematical induction.

There are some statements in mathematics which we think to be true but we are not really sure of it. One class of these problems have something to do with statements about sequences of numbers. For example, if somebody told you that the sum the first n natural numbers is equal to \frac{n(n+1)}{2}, would you believe it? You can either derive this result or you can try it out for a few numbers to convince yourself that it is true. But if this statement is true for the first few numbers, there is no guarantee that it will be true for all numbers. To convince ourselves, we use the mathematical induction technique.

The mathematical induction is a two step process. If you wish to prove that f(n) is true for all positive integers n,

  1. Basis Step – First show that it is true for n = 1
  2. Inductive Step – Assume that f(n) is true. Prove that f(n+1) is true.

When you have shown both steps to be true, then you have shown that f(n) is true for all positive integer n.

An Example

Show that

\displaystyle  f(n) = \sum_{i=1}^{n} i = \frac{n(n+1)}{2}

If we substitute n = 1 to the above expression, we get f(1) = \frac{1(1+1)}{2} = \frac{2}{2} = 1

We have just shown that the basis step is true. Now let’s work on the inductive step. Assuming that f(n) is true, we now show that it is also true for f(n+1). How do we do this? We can write f(n+1) as the sum of the first n terms and n+1, that is,

\displaystyle f(n+1) = \sum_{i=1}^{n+1} i = \frac{n(n+1)}{2} + (n+1)

Then we simplify the right hand side:

\displaystyle = \frac{n(n+1) + 2n + 2}{2}
\displaystyle = \frac{n^2 + n + 2n + 2}{2}
\displaystyle = \frac{n^2 + 3n + 2}{2}

Notice that we can factor the quadratic expression of the numerator:

\displaystyle = \frac{(n+1)(n+2)}{2}
\displaystyle = \frac{(n+1)(n + 1 + 1)}{2}
\displaystyle = \frac{(n+1)\Bigg[(n+1) + 1\Bigg]}{2}

Now the last expression is precisely what you will get when you substitute n+1 to f(n):

\displaystyle f(n) = \frac{n(n+1)}{2} \Rightarrow f(n+1) = \frac{(n+1) ( n+1 +1)}{2}

We have just shown, using the mathematical induction technique, that \sum_{i=1}^{n} i = \frac{n(n+1)}{2} is true for all positive integers. Using this technique, how would you show that the binary search algorithm has running time a_n = \lfloor \log_2 n \rfloor + 1?

January 30, 2011

Conquering Divide and Conquer

by Bobby Corpus

In a previous post, we derived an expression for the running time of the binary search using a recurrence relation. We also solved this recurrence relation using the brute force method. In this article, we are going to derive a general expression for the running time of the divide and conquer algorithm.

A divide and conquer algorithm has a recurrence relation in the form

\displaystyle f(n) = af(n/b) + g(n)

where f(n) is an increasing function. Looking at this recurrence relation, the right hand side says that the total executions for an input of size n is equal to the number of executions of a sub-problem of size n/b times some number a plus some function of n. The recurrence relation of the binary search algorithm, which is one type of divide and conquer algorithm, is a_n = a_{\lfloor n/2 \rfloor} + 1. In this case, f(n/b) =  a_{\lfloor n/2 \rfloor} and g(n) = 1.

As usual, we make things simpler by assuming that n = b^k, for some integer k. We can expand the recurrence relation above using the first few terms:

\displaystyle f(n/b) = af(n/b^2) + g(n/b)
\displaystyle f(n/b^2) = af(n/b^3) + g(n/b^2)
\displaystyle f(n/b^3) = af(n/b^4) + g(n/b^3)

Plugging these results back to the expression for f(n), we get

\displaystyle f(n) = a \big[ a f(n/b^2) + g(n/b) \big] + g(n)
\displaystyle = a^2 f(n/b^2) + ag(n/b) + g(n)

Substituting the expression for f(n/b^2) we get

\displaystyle f(n) = a^2 \big[ af(n/b^3) + g(n/b^2) \big] + ag(n/b) + g(n)
\displaystyle = a^3 f(n/b^3) + a^2g(n/b^2) + ag(n/b) + g(n)

And one more time for f(n/b^3),

\displaystyle f(n) = a^3 \big[  af(n/b^4) + g(n/b^3) \big] + a^2g(n/b^2) + ag(n/b) + g(n)
\displaystyle = a^4 f(n/b^4) + a^3g(n/b^3) +  a^2g(n/b^2) + ag(n/b) + g(n)

We are starting to see a pattern here! If we continue this k number of times, we should get

\displaystyle f(n) = a^k f(1) + \sum_{j=0}^{k-1} a^j g(n/b^j)

The reason why we have f(1) is because n/b^k = 1.

Let us simplify a bit and assume that g(n) = c, where c is a constant. In this case, the above expression will simplify to

\displaystyle f(n) = a^k f(1) + c \sum_{j=0}^{k-1}  a^j

Case when a = 1

When a = 1, the second term of the above expression is just \sum_{j=0}^{k-1} 1 = ck. Therefore,

\displaystyle  f(n) = f(1) + ck

Since n = b^k, then by definition of logarithms, we have

\displaystyle f(n) = f(1) + c\cdot \log_b n

In the realistic case where n is not a power of b, then we can find n between powers of b, that is,

\displaystyle b^k < n < b^{k+1}

for some k. Taking the logarithms of the above expression, we get

\displaystyle k < \log_b n < k + 1

Since f(n) is an increasing function, substituting b^{k+1} to f(n) = f(1) + ck, we get

f(n) < f(b^{k+1}) = f(1) + c(k+1) = \big[ f(1) + c \big] + ck \le \big[ f(1) + c \big] + c\cdot \log_b n

Therefore, when a = 1, the function f(n) = a^k f(1) + c \sum_{j=0}^{k-1}  a^j is O(\log_2 n).

Case when a > 1

Let’s digress for a while and review the definition of logarithms. Let a, b, and c be positive real numbers, if a = b^c, then the logarithm of a to the base b is

\displaystyle \log_b a = c

From this definition, we therefore have the identity a = b^{\log_b a}. We can further show that

a^{\log_b c} = c^{\log_b a}

To see this, let

\displaystyle y = a ^{\log_b c}

Taking the logarithms, we get

\displaystyle \log_a y = \log_b c
\displaystyle b^{\log_a y} = c

Raising both sides to the power \log_b a, we get

\displaystyle c^{\log_b a} = \big( b^{\log_b a}\big) ^{\cdot \log_a y}
\displaystyle = a^{\log_a y} = y = a ^{\log_b c}

Therefore,

\displaystyle a^{\log_b c} = c^{\log_b a}

Having taken cared of that, let’s now return to the case where a > 1. Let’s write again our expression for f(n),

\displaystyle f(n) = a^k f(1) + c \sum_{j=0}^{k-1}  a^j

For a > 1, the right term of the above expression is just a geometric progression whose sum is

\displaystyle \sum_{j=0}^{k-1}  a^j = \frac{a^k - 1}{a-1}

Plugging this into our expression for f(n), we get

\displaystyle f(n) = a^k f(1) + c \frac{a^k - 1}{a-1} = a^k f(1) + \frac{ca^k - c}{a-1}
\displaystyle = a^k\big[ f(1) + \frac{c}{a-1}\big] - \frac{c}{a-1}

If we assume n = b^k, we have \log_b n = k and a^k = a^{\log_b n} = n ^{\log_b a}. Therefore,

\displaystyle f(n) = C_1 n^{\log_b a} + C_2

where C_1 = f(1) + c/(a-1) and C_2 = - c/(a-1).

Now, what happens when n is not a power of b? As usual, we have b^k < n <b^{k+ 1} and

\displaystyle f(n) \le f(b^{k+1}) = a^{k+1}\big[ f(1) + \frac{c}{a-1}\big] - \frac{c}{a-1}
\displaystyle = C_1 a^{k+1} + C_2
\displaystyle = (a\cdot C_1) \cdot a^k + C_2
\displaystyle = (a\cdot C_1) \cdot n ^{\log_b a} + C_2

Therefore, for a > 1, f(n) is O(n^{\log_b a}).

Follow

Get every new post delivered to your Inbox.