Queuing Theory as it relates to storage I/O

Alan Johnson

Queuing Theory is closely related to statistics in that there is an element of uncertainty between the various elements such as the arrival frequency. In disk storage terms the customer is the I/O request and the disk, is the server. Queuing Theory is a particularly complex area where other factors such as random arrival times and service priorities need to be considered. There will always be a tradeoff between the number of servers and the cost of implementing those servers. The designer must balance the wait time for customers against the wait time for servers. Servers should tend towards 100% utilization but this should not result in customers waiting excessively long times for service.

Servicing disk I/O requests is an area where queuing theory can be readily applied. I/O requests form a queue that must be serviced by the storage sub-system. When requests are not serviced fast enough, subsequent requests will have to wait for service. Clearly, the goal is to reduce wait times to as small a figure that is practically possible. Factors, which determine how individual requests are serviced, relate to:

  1. The number of servers
  2. The queuing time
  3. The service time
  4. Distribution (by time) of requests

     

Queuing Theory models

The manner in which customers are served is termed the queue discipline. A common discipline is the First Come First Served (FCFS) discipline. Queuing systems can be as simple as Single Server/Single Queue as shown in Figure 1, Multiple Server/Multiple Queue shown in Figure 2 or Multiple Server/Single Queue shown in Figure 3.

The latter is being adopted by many banks and airlines and gives better utilization and load balancing.

Figure 1 Single Server/Single Queue

Figure 2 Multiple Server/Multiple Queue

Figure 3 Multiple Server/Single Queue

Most real life examples deal with a random arrival of requests. This is modeled as a Poisson Probability Distribution. With this distribution the probability of x arrivals within a given time frame is calculated as P(x) = (Eq.1) where x = 0,1,2..n. For the case with P(0) the equation is simply .

Service times are defined as the time from when the request has been placed until the time that it has been satisfied. An Exponential Probability Distribution provides a good approximation for I/O storage systems running transaction-processing applications.

Queuing Theory variables

Important parameters used within single server queuing systems are the mean arrival rate (l) and the mean service rate (m). It is important that the service rate is greater than the arrival rate otherwise the queue will keep growing without limit. The average number of customers in the queue is Lq and is described by the equation (Eq.2).

The average number of customers in the system is described by (Eq.3).

The length of time spent in the queue is given by Wq and is described by the equation Wq = (Eq.4).

The average wait time in the system is given by (Eq.5).

A summary is given in Table 1.

Table 1 Queuing Theory variables for Single queue single server systems.

Mean arrival rate l
Mean service rate m
Probability of x arrivals within a given period P(x) =
Average number of customers in the queue
Average number of customers in the system
Average length of time spent in the queue Wq =
Average wait time in the system

To take an example of a database application, which generated, on average, I/O read requests (queries) at the rate of 50 per second being sent to an I/O storage subsystem with the capability of servicing 75 requests per second.

  • The average queue length of I/O requests is 1.33.
  • The average number of I/O requests outstanding would be 2.0.
  • The average time waiting in the queue would be 1.33/50 = 0.0267 seconds
  • = 27 milliseconds.
  • The average wait time in the system is .0267+1/75 = 40 milliseconds.

With this application the I/O sub-system can cope but it is showing signs of strain. Firstly, a wait time of 40 milliseconds is long and if more users are added to the system generating an extra ten more requests per second, the system will demonstrate significant delays. Figure 4 shows how the queue builds up as the arrival rate increases. Clearly, the System Administrator needs to address this issue quite soon.

Now, if the administrator substituted a more powerful storage system with the capability of servicing 150 requests per second (m=150), we obtain values of:

  • Lq = 0.5
  • L = 1
  • Wq = 6.6 milliseconds
  • W = 13.3 milliseconds

Figure 4 Queue length with μ = 75

Plotting these values again with μ =150, Figure 5 shows that the system can easily service 100 requests per second.

Figure 5 Queue length with μ = 150

However, providing a faster server is only one way of addressing the issue, an alternative method is to implement multiple servers similar to that shown in Figure 2 and Figure 3. The section “Queuing Theory as applied to RAID systems” on page 6 discusses multiple servers.

I/O Patterns

Typically within a computer system, applications exhibit either High bandwidth/low I/O request rate or Low bandwidth/high I/O request rate characteristics.

An application with a low request rate is normally a sequential application involving large data transfers. In this situation, the arrival rate l will be quite low. Typical applications, which fall into this category, are Imaging and Video.

Transaction processing applications exhibit a high frequency of requests for small quantities of data, hence l will be high. Queuing Theory models allow us to predict how applications perform.

Queuing Theory – practical applications

A RAID controller provides a good example where queuing theory is applicable. The RAID controller is typically connected to multiple disk channels and one or more host channels. To analyze the system it is convenient to think of it as two separate entities, one side is from the controller to the host and the other side is from the controller to the drive channels. In the case of read requests from the host’s perspective, the controller is the server and the Host Bus Adapter (HBA) is the client, of course, multiple adapters may be present on the host side but over a single interface, this translates into a busier queue. As far as the RAID controller is concerned, the drive channels are serving the data and the RAID controller requests are the consumers.

Figure 6 RAID System with multiple drive interfaces

Multiple drive interfaces versus single host interface

Providing multiple drive channels allows better utilization of the RAID controller. Of course, each channel will have multiple disks attached but here we will treat each channel as being the lowest level. In a host read situation, the controller (customer) will provide a stream of I/O requests, which are directed, to multiple channels (servers). In this instance, the model is similar to that shown in Figure 3. A number of assumptions will be made with the model in that channels are similar in terms of capability.

Relevant equations for the multiple server model are shown in Table 2.

Table 2 Queuing Theory variables for Single queue multiple server systems.

Mean arrival rate l
Mean service rate m
Number of channels k
Probability that there are no customers in the system. P(0) = []Eq. 6
Average number of customers in the queue . (Eq. 7)
Average number of customers in the system
Average length of time spent in the queue Wq =
Average wait time in the system

To take a specific example, assume that there is an arrival rate of 100 I/O requests per second and that this is serviced by a single channel with a service rate of 105. Using equation 2 on page 3 : Lq = 10000/525 = 19.

The average wait time in the queue (Wq) is 19/105 = 181 milliseconds. This is an area of concern for an administrator, spreading the load over multiple channels will ease the situation, but if there is a cost tradeoff then it is necessary to see how performance increases as the channels are added. One strategy is to apply the model by adding channels until the performance is within acceptable limits.

Firstly applying the same system with two service channels each capable of servicing 105 requests per second, using equation 7:

From equation 7 à     Lq = 9524/12100 P0 = 0.787 P0.

From equation 6à     P0 = 0.3548

Lq = (0.787) (0.3548) = 0.279

Wq = 0.279/100 = 0.0028.

By adding a second channel the queue length has been reduced from 19 to less than one and the wait time has been reduced from 181 milliseconds to 2.8 milliseconds.

To take another example, the strategy employed is to replace the server with one that is twice as fast i.e. m = 210.

Using equation 2 (from the single server model) Lq = 1002 / (210)(110) = 0.433. The wait time is now 4.3 milliseconds. In this example, a significant difference in performance is obtained by using multiple channels or increasing the performance of the server The equations used in queuing theory are non linear and can exhibit counter intuitive results.

The single server model described in this chapter is known as M/M/1 and the multiple server model is known as M/M/c. Even though only these two queuing models are looked at with some rather simple assumptions, they are reasonably representative of a typical random access I/O pattern. Examples that are more complex would involve priorities and finite queues.

Comments and suggestions for future articles welcome!

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s