2.4.5.2 Technique Tutorial



DASD skew is a significant problem because it allows one busy
device on a string to adversely affect the performance of the
remainder of the devices on the string.  To understand the
nature of the problem, it is first necessary to understand
the concepts of rotational position sensing (RPS) devices and
RPS miss probabilities.

With the introduction of the 3330 device in the early 1970s,
IBM changed the channel protocol for disks from a
selector-like channel protocol, where the channel was left
busy for much of the time while the disk arm moved (seek) and
while the disk rotated to the desired physical record, to a
rotational position sensing protocol.  Under the RPS
protocol, the
device disconnects from the channel while it positions itself
to read the desired record.  Since data transfer represented
only a small percentage of the time that pre-RPS channels
were busy, there was opportunity for significant performance
improvements by allowing the channel to service other
requests while the device moved to the desired record.

In theory, the device senses (via a value provided in a Set
Sector channel command word (CCW)) that the head is about to
encounter the desired record, and reconnects to the channel.
Unfortunately, if the channel is busy when the device
encounters the record, an RPS miss occurs and the device must
wait a full rotation (16.7 milliseconds for most device
types) until the head encounters the record again.  It is
easy to conclude that even an occasional RPS miss could
significantly degrade the performance of a device, since an
RPS miss introduces a delay equal to about one half of the
average device service time for 3330, 3350, and 3380 device
types.

Fortunately, the probability of an RPS miss is a simple
function of the utilization of the channel that supports the
device.  You are probably familiar with the rule of thumb
that says that block multiplexor channels should never be
more than 30 percent busy.  Although its derivation is beyond
the scope of this tutorial, the following equation for the
RPS miss probability shows where this rule of thumb comes
from:

              u
    p = ---------                                (Equation 7)
          (1-u)

    where: p - is the probability of an RPS miss.

           u - is the utilization of the path, i.e., channel.

The behavior of this equation is best demonstrated by the
following table of path utilizations that vary from 5 to 50
percent:

                    u (%)  |     p
               ------------|------------
                     5     |    0.05
                    10     |    0.11
                    15     |    0.18
                    20     |    0.25
                    25     |    0.33
                    30     |    0.43
                    35     |    0.54
                    40     |    0.67
                    45     |    0.82
                    50     |    1.00
                           |

This equation is an example of a nonlinear relationship.
That is, as u grows, small changes in the independent
variable u have increasingly large effects on the dependent
variable p.
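
The table above can be reproduced with a few lines of Python.
This is only a minimal sketch of Equation 7; the function
names rps_miss_probability and miss_table are illustrative
and are not part of any product routine.

```python
def rps_miss_probability(u):
    """Equation 7: p = u / (1 - u), with u given as a fraction."""
    if not 0.0 <= u < 1.0:
        raise ValueError("path utilization must be in [0, 1)")
    return u / (1.0 - u)


def miss_table(percents):
    """Return (percent busy, p rounded to two places) pairs."""
    return [(pct, round(rps_miss_probability(pct / 100.0), 2))
            for pct in percents]


# Path utilizations from 5 to 50 percent in steps of 5.
for pct, p in miss_table(range(5, 55, 5)):
    print(f"{pct:3d}   {p:.2f}")
```

Note how the step from 45 to 50 percent changes p far more
than the step from 5 to 10 percent does.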

Now that you understand RPS reconnect probabilities, we can
discuss the problems that are introduced by DASD skew.  DASD
skew describes a condition where one or two of the devices on
a string are very heavily utilized.  For example, consider
the following table of eight disk devices and the percent of
the path (i.e., channel) used by each of the devices.

                  DEVICE   | % PATH (u)
               ------------|------------
                     1     |      3
                     2     |     12
                     3     |      1
                     4     |      0
                     5     |      2
                     6     |      0
                     7     |      3
                     8     |      1
                           |

To simplify the calculation of the RPS miss probabilities, we
assume that the string of devices is on a dedicated channel.
The probability of an RPS miss for device 2 is dependent on
the total utilization of the path by the other devices on the
string.  (After thinking for a moment, it is clear that a
device cannot contend with itself for the channel.  This
insight is the key to understanding the problem caused by
DASD skew.)  Therefore, the probability of an RPS
miss for device 2 is a function of the sum of the path
utilizations of the other devices, i.e., 3+1+0+2+0+3+1=10%.
Thus, the RPS miss probability is:

                      .1
              p = ---------- = 0.11
                    (1-.1)

The impact of DASD skew can be most easily seen by
calculating the RPS miss probability for a lightly loaded
device like device 8.  The probability of an RPS miss for
device 8 is a function of the sum of the path utilizations of
the other devices, i.e., 3+12+1+0+2+0+3=21%.  Thus, the RPS
miss probability is:

                      .21
              p = ---------- = 0.27
                    (1-.21)

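The lightly used device 8 thus has an RPS miss probability
more than twice that of the busy device 2 (0.27 versus 0.11),
because it must contend with device 2's heavy use of the
path.  From this simple exercise, we can conclude that the
ideal string would have the path utilizations of all of its
devices approximately equal.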
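
The same calculation can be carried out for every device on
the example string.  The following Python sketch assumes, as
the text does, a dedicated channel, and applies Equation 7 to
the path utilization of the other devices only.  The names
path_pct and miss_probability_by_device are illustrative.

```python
# Percent of the path used by each device (example table above).
path_pct = {1: 3, 2: 12, 3: 1, 4: 0, 5: 2, 6: 0, 7: 3, 8: 1}


def miss_probability_by_device(path_pct):
    """RPS miss probability per device, based on the path
    utilization of all OTHER devices on the string."""
    total = sum(path_pct.values())
    result = {}
    for device, pct in path_pct.items():
        u_others = (total - pct) / 100.0   # exclude the device itself
        result[device] = u_others / (1.0 - u_others)
    return result


probabilities = miss_probability_by_device(path_pct)
for device in sorted(probabilities):
    print(f"device {device}: p = {probabilities[device]:.2f}")
```

The busiest device (device 2) gets the lowest miss
probability, and every other device gets a higher one.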

In an MVS/370 environment, this argument raises a question:
how can we calculate the utilization of the path for each
device?  The path utilization is calculated by multiplying
the average service time of the logical channel that serves
the device by the SIO rate for the device.

Although logical channel service time is not reported by
standard RMF reports, it can be approximated easily for
MVS/370 by using the service times of the physical channels
that make up the logical channel.  For example, consider
logical channel 7, which consists of physical channels 4
and 8.  The average logical channel service time is computed
by taking the weighted average (weighted by the number of
SIOs processed by each of the physical channels) of the
average physical channel service times for channels 4 and 8.
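
The MVS/370 approximation described above can be sketched in
Python as follows.  The SIO counts, service times, and device
SIO rate are hypothetical values chosen only to illustrate
the arithmetic, and the function names are illustrative.

```python
def logical_channel_service_time(physical_channels):
    """Weighted average of physical channel service times.

    physical_channels: list of (sio_count, avg_service_time_sec)
    pairs, one per physical channel in the logical channel.
    The weights are the SIO counts.
    """
    total_sios = sum(count for count, _ in physical_channels)
    if total_sios == 0:
        return 0.0
    weighted = sum(count * time for count, time in physical_channels)
    return weighted / total_sios


def device_path_utilization(service_time_sec, sio_rate_per_sec):
    """Path utilization = logical channel service time x SIO rate."""
    return service_time_sec * sio_rate_per_sec


# Hypothetical data for logical channel 7:
#   physical channel 4: 6000 SIOs, 0.010 sec average service time
#   physical channel 8: 4000 SIOs, 0.015 sec average service time
service_time = logical_channel_service_time([(6000, 0.010),
                                             (4000, 0.015)])
# Hypothetical device issuing 10 SIOs per second.
utilization = device_path_utilization(service_time, 10.0)
print(f"service time = {service_time:.4f} sec")
print(f"path utilization = {utilization:.0%}")
```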

In MVS/XA this is not necessary, since the device connect
time is reported by the hardware and the average value is
maintained in CA MICS data element DVAAVCNN.

Once the path utilizations of the devices on a string are
known, statistical techniques can be applied to quantify the
skewness of the devices on the string.  Two statistical
coefficients indicate the skewness of a distribution:

    o The coefficient of variation, CV
    o The coefficient of skewness, SK

The coefficient of variation is computed from the mean and
standard deviation of the distribution of the device path
utilization values.  The coefficient of variation is computed
as:

         standard deviation
    CV = ------------------                      (Equation 8)
               mean

The larger the value of the coefficient of variation, the
more significantly skewed the distribution.

The second statistical technique for evaluating the skewness
of a distribution is the coefficient of skewness.  The
coefficient of skewness is computed from the mean, median,
and standard deviation.  The coefficient of skewness is
computed as:

         3 * (mean - median)
    SK = -------------------                     (Equation 9)
         standard deviation

The most important consideration when computing these
coefficients is whether or not idle devices should be
included in the computations.  Clearly, a string with seven
idle devices and one busy device will have very high
coefficients of variation and skewness.  However, there is no
RPS miss problem, since the other devices are unused.
Therefore, the algorithms implemented in the I/O Analysis
routines exclude idle devices from the calculation of the
coefficients.
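
Equations 8 and 9 can be sketched in Python as shown below,
using the device path utilizations from the earlier example
table.  This is not the code of the I/O Analysis routines.
The function name is illustrative, and the use of the
population standard deviation (rather than the sample
standard deviation) is an assumption, since the text does not
say which one is used.

```python
import statistics


def skew_coefficients(path_pct):
    """Return (CV, SK) for the non-idle devices, or None when
    fewer than two devices are active."""
    active = [u for u in path_pct if u > 0]   # exclude idle devices
    if len(active) < 2:
        return None
    mean = statistics.mean(active)
    median = statistics.median(active)
    # Assumption: population standard deviation.
    std_dev = statistics.pstdev(active)
    if std_dev == 0:
        return (0.0, 0.0)                     # perfectly balanced
    cv = std_dev / mean                       # Equation 8
    sk = 3 * (mean - median) / std_dev        # Equation 9
    return (cv, sk)


# Percent of path used by devices 1 through 8 in the example.
cv, sk = skew_coefficients([3, 12, 1, 0, 2, 0, 3, 1])
print(f"CV = {cv:.2f}, SK = {sk:.2f}")
```

A string whose active devices all use the path equally yields
zero for both coefficients, and a string with a single active
device is not evaluated at all.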

Of the two coefficients for evaluating the statistical
behavior of the devices on the string, the coefficient of
skewness is probably the more reliable indicator.  Also, you
should note that the skewness calculations are made on a
system-by-system basis, i.e., the utilization of the control
unit and head of string (both parts of the I/O path) by other
systems is not accounted for by the current implementation of
the algorithm.