Archive for ‘Performance’

November 20, 2011

Divided We Compute, United We Reduce

by Bobby Corpus

Once upon a time, in a far away village lay a dying old man. He called his sons to his deathbed and spoke to them one last time. He said “Sons, see that bundle of sticks? Each one of you try to break it. The one who can break it will inherit all my riches”. Each son, being greedy, wanted all the riches for himself. So each one of them tried to break the bundle of sticks but none of them succeeded. The old man asked his servant to untie the bundle and said to his sons, “Each one of you now get one stick and break it”. Without any effort, each son was able to break the stick. The old man said “You see, when you unite, no task will be difficult. The riches that I told you was a lie. We are broke. When i’m dead, make sure  you unite so that you can survive.”

Fast forward to modern times. You can think of the bundle of sticks to be a complex problem that is itself composed of smaller problems. The sons are the processors of your computer. When each processor was given the task to solve the complex problem, it fails to solve it in a reasonable amount of time. When the complex problem is decomposed into smaller problems and given to each processor, each processor is now able to solve the smaller problems quickly thereby solving the big problem quickly as well.

The process of decomposing a problem into smaller problems and solving them in separate processors is called Parallel Computing. In this article, we will compute how fast a certain algorithm will run when parallelized. The problem we want to investigate is sorting an array of a million (2^{20}) integers.

Efficient Sorting

Suppose you have an array \{ a_1, a_2, a_3, ..., a_n \} that you want to sort based on pairwise comparison.  The sorted array is just one of the many permutations of the array \{a_1, a_2, a_3,\ldots, a_n\}. In fact, if you have n different objects to sort, then there are exactly n! ways to arrange these objects, and one of them is the sorted state. You can imagine the sorting process as a decision tree. Say, for example we have array A={ a,b,c }. To sort this, we first compare a with b and there are 2 ways this can go. Either a \le b or a > b. If a \le b, we then compare b and c. This also give either b \le c or b > c. As you can see from the diagram below, this is nothing but a decision tree.



Since the height of this binary tree is lg(n!), then we have

\lg(n!) = \lg\Big[ n \cdot (n - 1) \cdot (n-2) \cdots 1\Big] \le \lg n + \lg (n-1) \cdots \lg1 \le \underbrace{\lg n \cdots \lg n}_\text{ n times}
\lg(n!) \le n\cdot \lg n

There are efficient algorithms that are able to sort of this complexity. For example, the merge sort has this complexity. Therefore, if you have an array of 2^{20} elements, then the complexity is

2^{20} \cdot \lg(2^{20}) = 2^{20} \cdot (20) = 20971520

that is, it takes about 20 million comparisons to sort an array of 1 million. Could we do any better than this? We can either upgrade the cpu of the machine doing the sorting or use two or more machines to divide the work among those machines. In this article, we are going to investigate the impact of dividing the work into smaller chunks and farming it to other processors.

Divide and Conquer

Assume we have an array n=2^{20} elements that we need to sort and suppose we have two identical processors we can use. Divide the array into 2 equal sized arrays. Give the first array to the first processor and the other half to the second processor. Apply an efficient sorting algorithm to the subarrays to produce a sorted array for each processor. We then combine the result of processor 1 and processor 2 to one big array by merging the two sorted arrays. The diagram below illustrates the process of computation:



This is also known as the MapReduce algorithm. Mapping is the process of assigning subsets of the input data to processors where each processor computes the partial result. Reducing is the process of aggregating the results of each processor to the final solution of the problem.

The process of merging is straightforward. Given two sorted arrays, begin by comparing the first element of each array. The smaller of the two will then occupy the first slot in the big array. The second element of the array from which we took the smallest element will now become the first element of that array. Repeat the process until all elements of both arrays have already occupied slots in the big array. The diagram below illustrates the algorithm of merging.




If you count the total number of comparisons that you need to merge two sorted arrays, you will find that it takes n-1 comparisons. Therefore, the complexity of the merging process is O(n).

Since each processor has n/2 sized subarrays, the sorting complexity is therefore n/p \lg (n/p). Furthermore, since the merging process takes O(n) comparisons, the total complexity of the parallel sorting process is therefore

\displaystyle n/p \lg(n/p) + n

In our example, C=2^{20}/2 \lg(2^{20}/2) + 2^{20}=  11010048 comparisons compared to 2^{20} \lg(2^{20}) = 20971520 when run sequentially. For large values of n, n/p \lg(n/p) dominates n, therefore the complexity of the parallel algorithm is O(n/p \lg(n/p)).

Can we do any better?

For a given value of n, what do you think is the value of p that reduces the running time to O(n)? If we take n=2^{20} and plot complexity against p = \{ 2, 4, 8, 16, 32\} we get the diagram below.



In this diagram, we also plotted the horizontal line y=2^{20}. The intersection of this line with the plot of \displaystyle f(p) = \frac{n}{p} \lg(\frac{n}{p}) gives us the value of p such that the total comparisons is already linear, that is,

\displaystyle f( p ) = n
\displaystyle \frac{n}{p} \lg(\frac{n}{p})  = n

To get the value of p numerically, we have to solve the root of the equation

\displaystyle g( p ) = \frac{n}{p} \lg(\frac{n}{p}) - n = 0

Simplifying,

\displaystyle \frac{1}{p} \lg(\frac{n}{p}) - 1 = 0
\displaystyle \lg(\frac{n}{p}) = p
\displaystyle \frac{n}{p} = 2^p
\displaystyle p2^p - n = 0

Since this is a non-linear equation, we can solve this using the Newton's method. It is a method to compute the roots by approximation given an initial value of the solution. Starting from a guess solution p_1, the root can be approximated using the recursive formula

\displaystyle p_{n+1} = p_n - \frac{g( p_n)}{g\prime ( p_n)}

where g\prime ( p ) is the first derivative of g( p ) . Applying the rules of derivatives, we get

\displaystyle g\prime ( p ) = p\cdot 2^p \ln 2 + 2^p

Substituting this to the formula for Newton's method, we get

\displaystyle p_{n+1} = p_n - \frac{p2^p - n}{p2^p \ln 2 - 2^p}

Below is an R code using newton’s method to compute the root of the equation g(p).

g=function(n,p){
	p* 2^p - n
}

gprime=function(n,p){
	p*2^p *log(2) - 2^p
}

newton=function(p,n,iter){
	tmp = p
	for(i in 1:iter){
		p=p-g(n,p)/gprime(n,p)
		
		if(abs(p-tmp)< 0.0001){
			break		
		}

		print(p)
		tmp=p
	}
	print(p)
}

Running this code, we get the value of p = 16:

> newton(15,n,100)
[1] 16.80905
[1] 16.08829
[1] 15.98603
[1] 16.00286
[1] 15.99944
[1] 16.00011
[1] 15.99998
[1] 16

Ignoring network latency, by distributing the input evenly into 16 processors, we get a running time of O(n) time complexity for n=2^{20} array of items. Therefore, instead of doing 20 million comparisons, you only need 1 million comparisons to sort 1 million objects.

In this age of multicore processors, parallel computing is fast becoming the norm than the exception. Learning to harness the power of multicores is becoming an extremely handy skill to have.

February 4, 2007

Performance of Shortest Queue Discipline

by Bobby Corpus

We computed the performance of random line queuing discipline in the previous article. However, in real life, we don’t really throw a coin in order to pick which of the two lines we take. Most of the time, we go to the shortest line for service. Using the techniques we have learned in modeling, we will compute the performance of this queue and compare the performance to the other queueing discplines we have computed so far.

Markov Diagram

markov_shortest_line.png

As usual, the arrival rate of customers is 2 customers per minute. Each customer is given a service at an average of 20 secs, which means that an average of 3 customers are being serviced per minute.

January 21, 2007

Computing Performance of Symmetric Multi-Processing Scheduling

by Bobby Corpus

You might have noticed that when you do a transaction in a bank, you usually see a single line being served by 2 or more tellers. You must have perceived that this technique of lining up customers is better than when each teller has its own line. This will lessen the chance of you being stuck for a long time when the guy before you takes a long time to complete transaction.

In computer systems, you can view the customers as tasks to be executed by 2 or more processors. Each processor will pick from a common line, a task that it will execute. In this article, we will analyze the performance of symmetric multiprocessing using continuous markov chain analysis. We will also be using the tool available from www.extremecomputing.org to help us compute the probabilities.

January 4, 2007

Computing Continuous Markov Chains Using Extreme Optimal Solver

by Bobby Corpus

In the last article, we computed the probabilities associated with the states in a continuous markov chain model using balance equations. The computation, although elementary, was tedious and distracts us from the real problem, which is understanding the performance of a queueing model. Fortunately, there is an online tool we can use in order to speed up the computation of continuous markov chains. This tool can be found in the site http://www.extremecomputing.org. This site contains numerous solvers that can be of interest to the computational scientists. However, the tool we are interested in is the Continuous Markov Chain solver which can be accessed in this link.

December 10, 2006

Modeling Queuing Systems Using Markov Chains

by Bobby Corpus

Have you ever notice how various establishments make people line in a certain way? Have you gone to a bank and noticed a number of tellers servicing each client but there is only one line? Wouldn’t it be more natural to let people line up before each teller like you see in a grocery? The way customers are lined up in a server or system of servers is called Queueing Discipline.
In this article, we will learn how various queuing disciplines will affect the values of the performance metrics. We will use continuous markov chain analysis that we learned in the last article.

Motivating Scenario

Assume that the counter of a certain fast food store is arrange in a 2-stage way. The customers first place their order in counter 1, pays the cashier and proceeds to counter 2 to wait for their order to be picked up. Let a be the rate at which customers arrive, b the rate at which customers places their order and also the rate at which they leave the second counter. Let us draw the markov chain diagram of this system.

Each state is described by two numbers. The first number refers to the number of customers in the first counter. The second number refers to the number of customers waiting at the second counter. For example, state 00 mean that there are no customers being serviced. State “12″ mean that one customer is in counter 1 and 2 customers in counter 2. The blue arrows represent arrivals at rate a and red arrows are represent the rate of customers going to counter 2 and those leaving the counter 2.

markov1.png

Let us examine the four states in the diagram below. There is a blue arrow from state “00″ to state “10″ because when the first customer arrives, the first station will have 1 customer but no one in line yet in counter 2. This means that when the first customer arrives, the system will be in state “10″. There is an arrow from state “10″ to “01″. The means that the first customer is already in counter 2 to pickup his/her order but no new customer has arrived yet. There is a red arrow from state “01″ to “00″. This means that the single customer has already picked up his/her food and has left leaving the system with no customers at both stations. Notice that there is no arrow from state “10″ to “00″ because the customer who is in counter 1 cannot just leave the system but has to go to counter 2 before leaving.

markov2.png
Balance Equations
Now that we have understood why the arrows point from one state to the next, we now setup the balance equations. At steady state, the sum of flows going to one state must equal the sum of flows going out of that state. By definition, the flow along an arrow is equal to the rate along the arrow multiplied by the probability of the system being in the current state.Let us do the balance equation for state “00″. There is one arrow going to state “00″ and one arrow going out of state “00″. Therefore the flow going to state “00″ is equal to the rate going from state “01″ to “00″ multiplied by the probability of the system being in state “01″ : bP_{01}. This is equal to the flow going out of state “00″ to state “10″: a*P00. Therefore,we have,

\displaystyle a P_{00} = bP_{01}

Let us now compute the balance equation for state “01″. As you can see, there are 2 arrows out of state “01″ and one arrow going into state “01″. The balance equation is therefore:

a P_{01} + b P_{01} = b P_{10}

Doing this for all states, we get the following balance equations:

\displaystyle aP_{02} + bP_{02} = b P_{11} + b P_{03}
\displaystyle b P_{03} = bP_{12}
\displaystyle aP_{00} + bP_{11} = bP_{10} + aP_{10}
\displaystyle aP_{01} + bP_{12} + bP_{20} = bP_{11} + bP_{11} + aP_{11}
\displaystyle bP_{12} + bP_{12} = aP_{02} + bP_{21}
\displaystyle bP_{20} + aP_{20} = aP_{10} + bP_{21}
\displaystyle bP_{21} + bP_{21} = aP_{11} + bP_{30}
\displaystyle b P_{30} = a P_{20}

We add to these equations the condition that the sum of all probabilities is equal to 1:

\displaystyle P_{00} + P_{01} + P_{02} + P_{03} + P_{10} + P_{11} + P_{12} + P_{20} + P_{21} + P_{30} = 1

Solving for the probabilities using elementary algebra, we get the following:

\displaystyle P_{01} =\frac{ab^2}{b^3+2ab^2+3a^2b + 4a^3}
\displaystyle P_{02} =\frac{a^2b}{b^3+2ab^2+3a^2b + 4a^3}
\displaystyle P_{03} =\frac{a^3}{b^3+2ab^2+3a^2b + 4a^3}
\displaystyle P_{10}  =P_{01}
\displaystyle P_{11} =P_{02}
\displaystyle P_{12} =P_{03}
\displaystyle P_{20} =P_{02}
\displaystyle P_{21} =P_{03}
\displaystyle P_{03} =P_{30}
\displaystyle P_{00} =(b/a)P_{01}

A simple Example

Suppose that customers arrive on the average of 1 customer per minute and can place and pay for their order in 2 minutes. After that they wait on the average of 2 minutes at the second counter to take out their order. Using the formula above, we have a = 1 and b=2. We can now solve for the probabilities by substituting these values to the above formula.

\displaystyle P_{01} = \frac{ab^2}{b^3+2ab^2+3a^2b + 4a^3} = \frac{1\cdot 2^2}{1^3+ 2\cdot 1\cdot 2^2 + 3\cdot 1^2\cdot 2 + 4\cdot 1^3}
\displaystyle =\frac{2}{13}

Computing the rest of the values, we get the following table:

P00 4/13
P01,P10 2/13
P02,P11,P20 1/13
P03,P12,P21,P30 1/26

We can compute for the Utilization of the first counter by using it’s idle time. Notice that the first counter does not have anything to do when there are no customers lining in front of it. This happens in states P00, P01, P02 and P03. Notice that these states have the first number equal to 0 (which means: no customer in that counter). Therefore, the utilization is equal to 1 minus the idle time of counter 1, i.e,

\displaystyle U_\text{counter 1} = 1 - P_{00} - P_{01} - P_{02} - P_{03}
\displaystyle =\frac{11}{26}

Using the Utilization law we learned in a previous article, we can compute the throughput of counter 1 to be:

\displaystyle X_\text{counter 1} = \frac{U_\text{counter 1}}{S_\text{counter 1}}
\displaystyle =\frac{11/26}{2}
\displaystyle =\frac{11}{52}

This happens to be the system throughput also since each customer will visit a counter only once per transaction.

Now that we know how to model using markov chains, in the next article, we will analyze other queueing disciplines and compare them according to throughput and response time.

November 15, 2006

Performance Arithmetic 2

by Bobby Corpus

In the previous article, we learned the various performance laws that we can use to do quick and dirty calculations related to performance. In this article, we will learn how to compute the lower bound on response time. Let us start with a little known law.

Suppose you are a manager of a restaurant and you want to know how many people are dining in your restaurant during a certain time period of the day. Suppose you observe that on the average 3 people leave the restaurant every 30 minutes. Furthermore, the average time a customer spends on your restaurant is 1 hour. Let us denote this time as R. We can compute the throughput of the restaurant as

\displaystyle X =3/30
\displaystyle =0.1

person per minute. If the customers will remain for 1 hour (or 60 minutes) on the average, then we can expect to have 0.1 *60 = 6 customers at any time in the restaurant.
This result is very intuitive and is known as Little’s Law:

\displaystyle N = XR

where N is the number of concurrent transactions on the system and R is the response time.

Example 1: A financial institution has a server that computes the financial indicators for the day for all stocks in a certain stock exchange. Suppose there is 5 jobs are always running for this task. If a job is finished for a company, another job is launched to compute the indicators for another company. In other words, there are always 5 jobs doing this task at any given time. If each computation takes about 15 minutes to complete, how may stocks can be processed in a day?

Using little’s law:

\displaystyle N = X_0 * R
\displaystyle X_0 = \frac{N}{R}
\displaystyle = \frac{5}{15 * 60}
\displaystyle = 0.0055

jobs/s

In one day, only \displaystyle 0.055\times 24\times 3600 = 480 stocks can be computed.

In a call center there are 50 operators that each receive a query from a customer. There is a backend database that supports the query of the operator. Let us call the duration of time where the operator receives a call and composes a query to the database as the think time \displaystyle Z. The time that elapses from the time the operator presses the submit button to the time it receives the response from the database as the response time \displaystyle R. We can view the operators as alternating between two states namely “thinking” and “waiting” for response. Let \displaystyle \bar M be the average number of operators in “thinking” state and \bar N be the average number of operators “waiting” for response. Then since an operator can either be “thinking” or “waiting”, we have \bar M + \bar N = M, where M is the number of operators.

Applying little’s law to the operators in “thinking” state, wehave:

\displaystyle \bar M = X_0 Z
and
\displaystyle \bar N = X_0 R

Adding the two equations we get:

\displaystyle \bar M + \bar N = M
\displaystyle = X_0 Z + X_0 R
\displaystyle =X_0 (Z+R)
Solving for R we get

\displaystyle R=\frac{M}{X0} - Z

This is known as the interactive response time Law.

Example 2: Suppose we were to design a system in which the response time is not to exceed 2 secs and we have 50 operators. If the average think time is 5 minutes, what is the throughput of the system?

From the interactive response law we have:
\displaystyle X_0 = \frac{M}{z+R}
\displaystyle =\frac{50}{5*60 + 2}
\displaystyle =0.1655

transactions/s
or 0.1655 * 24 * 3600 = 14304.64 transactions per day.
We now come to an important consequence of thelittle’s law, that of estimating the lower bound of
the response time of a system.

From Little’s Law, the response time is give by

\displaystyle N =X_0 R
\displaystyle R = \frac{N}{X_0}

The lower bound for R occurs when the throughput is maximum, or when

\displaystyle R \ge \frac{N}{1/\text{max}\{D_i\}} = N \times \text{max}\{D_i\}

Example 3: A webserver serves images and other static files in disk1 with a service time of 15msecs and its dynamic content is put it disk2 which is faster with a service time of 5msecs. On a peak period of 3 hours, there is an average of 50 concurrent users and the CPU utilization is observed to be 35%. A transaction visits disk1 3 times on the average and disk2 2 times. The webserver completes 50000 transactions in 3 hours. Compute the lower bound of the response time of this system.
\displaystyle D_\text{disk1} = S_\text{disk1} V_\text{disk1}
\displaystyle = 0.015 \times 3
\displaystyle =0.045
\displaystyle D_\text{disk2} = S_\text{disk2} V_\text{disk2}
\displaystyle = 0.005 \times 2
\displaystyle =0.01

\displaystyle D_\text{cpu} = \frac{U_\text{cpu}}{X_0}
\displaystyle = \frac{0.35}{50000/(3*3600)}
\displaystyle =0.0756

Since the CPU has the greater service demand,it is the system bottleneck. The maximum throughput this system can attain is 1/0.0756 = 13.22 transactions per second and the response time this throughput is 50 * 0.0756 = 3.78 seconds.

November 9, 2006

Performance Arithmetic

by Bobby Corpus

When analyzing performance issues in computer systems, it is always a good thing to have a feel for the overall picture and be able to make back-of-the-envelope calculations. In this article, we are going to familiarize ourselves with the various performance metrics and how they relate to one another. We then apply these concepts to be able to compute the maximum throughput and minimum response time of a given system.
Performance can be viewed from different perspectives. For a manager, the most important metric is the throughput. How many transactions can be completed in an hour. The greater the throughput, the more money for the business. On the other hand, for a customer, the most important metric is the response time. How fast can I get the response? Do i have to wait forever to get the next page?

Throughput is defined as the number of completed transactions per unit of time. This unit of time is usually in “seconds”. For a website, throughput can be the number of requests in a day. If a site can complete one million transactions per day, then the thoughput is 1,000,000/(24*3600), since there are 24 hours in a day and 3600 seconds in an hour. We denote throughput by X and define it as:

X_i = C_i/T

where C_i is the number of completed transactions of resource i and T is the observation time.

A system can be composed of more than one resource connected with one another. The overall system throughput is denoted by X_0 and defined similarly as above:

X_0 = \frac{C_0}{T}

where C_0 is the number of transactions completed by the system.

Throughput depends on how fast a resource can complete a transaction. The time it takes for a server to complete a transaction is called the service time S_i. If a CPU takes on the average 30 msecs to complete a computation, then the service time for that type of transaction is 30 msecs.

In a typical 8 hour day, an employee does not really consume the whole 8 hours working. There are periods where the employee is idle, or is not doing useful work. The number of hours he/she is working on the average is less than 8 hours. If we denote the number of hours busy as B, then the Utilization is defined as

U=B/T,

where T is the total working hours. If in a one hour observation period, the CPU is utilization is 40%, then it was busy 0.40*(3600 s) = 1440 seconds and idle the rest of the time.

There is an interesting relationship between the utilization, service demand, and throughput. This is known as the Utilization Law:

U_i=X_iS_i

for a given resource i. This can be seen easily by dividing the numerator and denominator of the utilization by \displaystyle C_i:

\displaystyle U_i =\frac{B_i}{T}
\displaystyle =\frac{B_i/C_i}{T/C_i}

But B_i/C_i is average time per transaction completed, which is precisely the service time. Therefore

\displaystyle U_i =\frac{S_i}{T/C_i}
\displaystyle =\frac{S_i}{1/X_i}
\displaystyle = X_iS_i

If a website should process 1,000,000 requests in a day, how fast should the CPU process each request in order to maintain a utilization of 15%? Using the utilization law we have,
\displaystyle S_i =U_i/X_i
\displaystyle =15/(1000000/(24*3600))
\displaystyle = 0.01296
\displaystyle = 12.96\text{msec}

Consider a web server that is composed of a CPU and 2 disks. A typical transaction may use the CPU and both disks one or more times before completing. Let us denote the number of times a transaction visits a resource i as V_i. Then the total time a transaction spends on a resource i is V_iS_i where $S_i$ is the service time of resource i. This is called the Service Demand D_i of resource i:

\displaystyle D_i = V_i S_i.

There is an interesting relationship between the number of times a transaction visits a resource i and the system throughput. If V_i is the number of times a transaction visits resource i, then

\displaystyle X_i = V_i X_0
\displaystyle V_i = \frac{X_i}{X_0}
\displaystyle = \frac{C_i/T}{C_0/T}
\displaystyle = \frac{C_i}{C_0}

By definition of the service demand:

\displaystyle D_i =V_i S_i
\displaystyle = \frac{C_i}{C_0}S_i
\displaystyle = \frac{C_iSi/T}{C_0/T}
\displaystyle =\frac{U_i}{X_0}

The above result is called the Service Demand Law:

\displaystyle D_i= \frac{U_i}{X_0}

The service demand is a very important quantity. It enables us the compute the maximum throughput of the system. Since X_0=U_i/D_i, as the utilization increase to 100%, the throughput also increases. However, when U_i=100%, the throughput cannot be increased any further. Therefore, the throughput is bounded above by

X_0\le\frac{1}{D_i}.

Therefore, in a given system, the resource with the highest service demand limits the throughput and is therefore the bottleneck.

Let’s have an example. Suppose that the utilizations of the CPU,disk1 and disk2 are measured and given as 15%, 35% and 30% respectively. Furthermore, suppose that the current throughput is 500,000 transactions per day. Calculate the maximum throughput of this configuration.

The system throughput is X_0=500000/(24*3600)=5.79. The service demand of the CPU is

\displaystyle D_\text{CPU} = \frac{U_\text{CPU}}{X_0}
\displaystyle = \frac{0.15}{5.79}
\displaystyle =0.026

Similar computations for disk1 and disk2 yeild:

\displaystyle D_\text{disk1} = 0.06
\displaystyle D_\text{disk2}=0.052

Since disk1 has the highest service demand, this resource limits the throughput. The maximum throughput this configuration can attain is:

\displaystyle X_0 \le \frac{1}{D_\text{disk1}}
\displaystyle = \frac{1}{0.06}
\displaystyle = 16.67

In one day, the maximum number of transactions is therefore

\displaystyle 16.67\times 24\times 3600=1,440,000

In the next article, we are going to learn the minimum response time this system will attain. It can never do faster than that minimum.

October 25, 2006

Performance 101

by Bobby Corpus

JAVAERO – Performance spells the difference between success and failure of a software
project. Since the internet age, the average attention span of
internet users has been decreasing. A web site that does not deliver
its pages within 10 seconds is doomed to be forgotten by a potential
client.

In this article, we are going to introduce the basics of performance
theory using a simple example.

The Ice Cream Stand

Suppose you are manager of an ice cream stand and you want to hire a
ice cream vendor. Two persons applied for the position namely Oliver
and Gemma. However, since Gemma is more experienced in ice cream
vending, she is asking for a higher pay then Oliver. You discovered
that Gemma can complete a transaction ( take the order, prepare the
ice cream, take payment and make change ) in 20 secs. Here is a table
of the performance and salary of the vendors:

Name Speed Salary
Gemma 20 50
Oliver 30 25

As you can see from the table, the speed of oliver is 30
secs/transaction and is only asking half of what Gemma is
asking.

Market study

As a manager, you have conducted market research and have shown that,
on the average, you can expect that one customer will come per minute.
Since you are not the only ice cream stand in the area, customers are
only willing to go to you when the number of people in line is 3 or
less. Otherwise, they will go the the nearest ice cream stand operated
by Clint.

Which of them will you hire?

Analogy to computer systems.

You can think of this example as having a choice between a faster but
more expensive processor and a slower but more economical
processor. As a manager, your task is to maximize your profit, which
is the difference between extra money earned by having a faster
processor and the cost of having it. A little back of the envelope
calculation will show that Gemma
is 33% faster than Oliver but to employ Gemma would take 100% more
than to employ Oliver. The amount of money made while Gemma is on the
stand minus the her pay is less than the amount of money made while
Oliver operates the stand minus his pay. Therefore, Oliver is the
better choice!

In operating systems theory, the improved throughput with a faster
processor does not offset the extra cost of that processor.

If exactly one customer arrives per minute, on the minute, and Oliver
can service each customer in exactly 30 seconds, then the line will
never grow longer than one person. Therefore, Oliver is the better
choice.

With these assumptions, the throughput using either CPU would be the
same. That is, the CPU is not the bottleneck and it would be best to
select the cheaper and slower CPU.

However, real life is not a simple as this. It is very unlikely that
the amount of time between when one customer walks up and when the
next customer walks in is exactly the same for all customers.

Also it is highly unlikely that the attendant takes exactly the same
amount of time to prepare the ice cream for each customer.

There are variations between the arrival of one customer and the next
and the service time is not the same in all customers. In fact, the
amount of time between customer arrivals is random. What we are doing
is just taking the average of all those random times.

There is a big difference between what is average and what is usual.

Exponential distribution

The exponential distribution is a statistical distribution that does a
good job in describing phenomenon where observed times are shorter
than the average time.

The exponential distribution is given by the expression

exponential_dist_eq.gif

Performance Metrics

Let us define the following:

\lambda = \text{arrival rate}

D=\text{service demand}

\mu=\text{service rate} = 1/D

The throughput is the average number of customers processed per minute. At any minute, there can be zero, one ,two, or three people at the stand. We can think of these as four states of the ice cream stand operation.

ice_cream_state_diagram.jpg

Let P_1 be the probability of the system in state 1, P_2 be the probability of the system in state 2, etc. The quantity \mu P_1 is called the flow of the system from state 1 to state 0, likewise, the quantity \lambda P_0 is the flow of the system out of state 0 to state 1. At steady state, these flows are equals

\mu P_1 = \lambda P_2

This is called the balance equation for State 0. Likewise, the balance equation for State 1 is

\lambda P_0 + \mu P_2 = \lambda P_1 + \mu P_1

Continuing in this manner, we get the balance equation for the entire system:

ice_cream_system_eq.jpg

It only needs elementary algebra to solve for the probabilities:

ice_cream_system_eq_sol.jpg

After substituting all values of the parameters, we get the results tabulated below:

ice_cream_system_results.jpg

The rest follows.

Follow

Get every new post delivered to your Inbox.