

9.4 Analytic Technique Tutorial


Within the context of the CA MICS Capacity Planner, the term
multivariate regression refers to multivariate linear
regression.  Non-linear techniques of multivariate regression
exist, but a discussion of these techniques goes beyond the
scope of this text.  In data processing, non-linear models
are primarily useful in formulating analytical queuing models
and studying response time as the dependent variable.  If
necessary, you can evaluate non-linear models by changing the
functional form of the independent variables, using a
multivariate analog to the quadratic and cubic regression
models described in Section 7.4.1.

The technique of multivariate regression is identical to that
of univariate linear regression, except that more variables
are involved than the x and y variables (a tutorial on
univariate linear regression is presented in Section 7.4).
In multivariate regression, the hypothesis is that the
dependent variable y is a linear function of several
independent variables: x1, x2, ...., xn.

If you recall the introduction to linear regression in
Section 7.4, you know that the equation for a simple linear
regression model is:

              y = b + m * x                           (Eqn 4)

    where b   is a constant that defines the mean value of
              y when x is zero

           m  is the slope of the line that relates y and x

           y  is the value to be predicted (dependent
              variable)

           x  is the value on which the prediction is based
              (independent variable)
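
Outside the product, you can see Eqn 4 at work with a short least-squares fit. The following Python sketch is purely illustrative (NumPy, the data values, and the variable names are our assumptions, not part of CA MICS); it recovers a known slope and intercept from noiseless data:

```python
import numpy as np

# Synthetic data generated from a known line, y = b + m * x,
# with b = 2.0 and m = 0.5 (illustrative values only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 0.5 * x

# np.polyfit with degree 1 returns the coefficients [m, b]
# of the best-fit line
m, b = np.polyfit(x, y, 1)
```

Because the data are noiseless, the fitted m and b match the generating values to machine precision.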

You can expand this equation to be a multiple regression
equation by replacing the m and x terms in the previous
equation by a set of terms as shown in the following
equation:
                            _         _
                         n |           |
              y = b + SUM  |  mj * xj  |              (Eqn 5)
                       j=1 |_         _|


      where b is a constant that defines the mean value of
               y when all xj are zero

            mj is the slope of the line that relates y and xj


            xj is the j-th independent variable

Recall the example shown in Figure 9-1 in which the y and xj
terms of Equation 5 are:

      y  - TOTCPUTM     x1 - TSOCPUTM      x2 - BATCPUTM
                        x3 - IMSCPUTM      x4 - CICCPUTM
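
The fit of Eqn 5 amounts to stacking the independent variables into a matrix and solving a least-squares problem. The sketch below (synthetic data; the slopes, intercept, and use of NumPy are our own illustrative choices) fits a TOTCPUTM-style total against four workload variables named after Figure 9-1:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Hypothetical per-workload CPU times; the names follow Figure 9-1,
# but the values are made up for illustration
tso = rng.uniform(0.0, 10.0, n)   # x1 - TSOCPUTM
bat = rng.uniform(0.0, 10.0, n)   # x2 - BATCPUTM
ims = rng.uniform(0.0, 10.0, n)   # x3 - IMSCPUTM
cic = rng.uniform(0.0, 10.0, n)   # x4 - CICCPUTM

# Total CPU time built from known slopes plus a constant term b = 5.0
tot = 5.0 + 1.0 * tso + 0.8 * bat + 1.2 * ims + 0.9 * cic

# Design matrix: a leading column of ones carries the intercept b
X = np.column_stack([np.ones(n), tso, bat, ims, cic])

# Least-squares solution returns [b, m1, m2, m3, m4]
coef, *_ = np.linalg.lstsq(X, tot, rcond=None)
```

With noiseless data, the solver recovers the generating intercept and slopes exactly.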


You may solve multiple regression equations for any number of
x terms, but you must select some subset of these terms to
build models for each of the resource consumption values.
While you can make these selections arbitrarily, stepwise
multiple regression provides an ideal technique for
investigating potential relationships between resource
variables.

Although numerous stepwise multiple regression algorithms are
available, this program uses Goodnight's MAXR algorithm,
which is implemented in the SAS REG procedure through its
model selection option (SELECTION=MAXR).

For example, consider the development of a statistical model
that attempts to relate the CPU utilization of a processor to
three potential resource elements named A, B, and C.  In this
example, A, B, and C are the independent (or x) resource
variables, while CPU utilization is the dependent (y)
resource variable.  The name stepwise multiple regression
describes the function of the algorithm.  For our small
example, the algorithm attempts to solve the problem using
three steps:

    o  The algorithm evaluates three linear one-term models
       (that is, one based on A, one on B, and one on C) to
       determine which of the three potential elements has
       the greatest correlation to the resource being
       modeled.  This correlation is judged by the
       r-squared value of each model; the algorithm
       attempts to maximize r-squared.  For the purposes of
       this example, assume that the B element has the
       highest correlation.

    o  The algorithm then evaluates two linear two-term
       models based on B and the remaining two elements
       (that is, B and A, and B and C).  Once again, the
       algorithm chooses the best model based on the maximum
       r-squared.  For the purposes of this example, assume
       that the pair of elements that results in the maximum
       r-squared value is B and C.

    o  Finally, the algorithm evaluates one linear
       three-term model based on A, B, and C.

The algorithm stops at any step where adding a term to the
model does not increase the r-squared value by at least a
specified amount.  Thus, you can employ stepwise multiple
regression to investigate the relationship between several
resource consumption variables, letting the algorithm choose
the most likely combinations and eliminate the least likely.
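
The stepping procedure can be sketched as a greedy forward selection that maximizes r-squared and stops when the gain falls below a threshold. This is a simplification of Goodnight's MAXR (which also considers swapping variables already in the model); the data, threshold, and function names below are our own illustrative choices:

```python
import numpy as np

def r_squared(X, y):
    """R-squared of an ordinary least-squares fit of y on X
    (intercept included)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

def forward_select(cand, y, min_gain=0.01):
    """At each step, add the candidate term that most improves
    r-squared; stop when the improvement falls below min_gain."""
    chosen, best_r2 = [], 0.0
    remaining = set(cand)
    while remaining:
        scores = {name: r_squared(np.column_stack(
                      [cand[k] for k in chosen] + [cand[name]]), y)
                  for name in remaining}
        best = max(scores, key=scores.get)
        if scores[best] - best_r2 < min_gain:
            break
        chosen.append(best)
        best_r2 = scores[best]
        remaining.remove(best)
    return chosen, best_r2

# Three potential resource elements A, B, and C, as in the worked
# example; here y is constructed to depend on B and C only
rng = np.random.default_rng(1)
a, b, c = (rng.uniform(0.0, 1.0, 200) for _ in range(3))
y = 3.0 * b + 2.0 * c + rng.normal(0.0, 0.01, 200)

chosen, r2 = forward_select({"A": a, "B": b, "C": c}, y)
```

As in the worked example, B enters the model first, C second, and A is rejected because it adds almost nothing to r-squared.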