

9.4 Analytic Technique Tutorial
Within the context of the CA MICS Capacity Planner, the term
multivariate regression refers to multivariate linear
regression. Non-linear techniques of multivariate regression
exist, but a discussion of these techniques goes beyond the
scope of this text. In data processing, non-linear models
are primarily useful in formulating analytical queuing models
and studying response time as the dependent variable. If
necessary, you can evaluate non-linear models by changing the
functional form of the independent variables, using a
multivariate analog to the quadratic and cubic regression
models described in Section 7.4.1.
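As a minimal sketch of changing the functional form, the
following Python fragment derives squared (and, optionally,
cubed) terms from each independent variable so that they can
enter a linear model as additional x terms. Python is used
here only for illustration and is not part of the product.
The function name add_power_terms and the sample values are
hypothetical.

```python
def add_power_terms(columns, max_power=3):
    """Return a new dict holding each variable plus its higher powers.

    columns maps a variable name to its list of observations. Each
    derived column (for example "x1^2") can then be treated as one
    more independent variable in a linear regression.
    """
    expanded = {}
    for name, values in columns.items():
        for power in range(1, max_power + 1):
            label = name if power == 1 else f"{name}^{power}"
            expanded[label] = [v ** power for v in values]
    return expanded


# Hypothetical observations of two independent variables.
terms = add_power_terms({"x1": [1.0, 2.0, 3.0], "x2": [2.0, 4.0, 6.0]},
                        max_power=2)
print(list(terms))  # ['x1', 'x1^2', 'x2', 'x2^2']
```

The model remains linear in its coefficients; only the inputs
are transformed before the regression is solved.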
The technique of multivariate regression is the same as that
of univariate linear regression, except that it involves more
than one independent variable (a tutorial on univariate
linear regression is presented in Section 7.4). In
multivariate regression, the hypothesis is that the dependent
variable y is a linear function of several independent
variables: x1, x2, ..., xn.
If you recall the introduction to linear regression in
Section 7.4, you know that the equation for a simple linear
regression model is:
     y = b + m * x                                   (Eqn 4)

where  b  is a constant that defines the mean value of y
          when x is zero
       m  is the slope of the line that relates y and x
       y  is the value to be predicted (dependent variable)
       x  is the value on which the prediction is based
          (independent variable)
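The following Python sketch shows one way to estimate b and
m of Equation 4 from observed data. It assumes ordinary least
squares, the usual fitting method for linear regression. The
function name fit_line and the sample observations are
hypothetical and chosen so that the result is easy to check
by hand.

```python
def fit_line(xs, ys):
    """Least-squares estimates (b, m) for the model y = b + m * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The fitted line passes through the point (mean_x, mean_y).
    b = mean_y - m * mean_x
    return b, m


# Hypothetical observations that lie exactly on y = 3 + 2 * x.
b, m = fit_line([1.0, 2.0, 3.0, 4.0], [5.0, 7.0, 9.0, 11.0])
print(b, m)  # 3.0 2.0
```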
You can expand this equation to be a multiple regression
equation by replacing the m and x terms in the previous
equation by a set of terms as shown in the following
equation:
     y = b + SUM(j=1 to n) [ mj * xj ]               (Eqn 5)

where  b   is a constant that defines the mean value of y
           when all xj are zero
       mj  is the slope of the line that relates y and xj
       xj  is the j-th independent variable
Recall the example shown in Figure 9-1 in which the y and xj
terms of Equation 5 are:
     y  - TOTCPUTM
     x1 - TSOCPUTM
     x2 - BATCPUTM
     x3 - IMSCPUTM
     x4 - CICCPUTM
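To make Equation 5 concrete, the following Python sketch
evaluates it for the four independent variables listed above.
The intercept, the slopes, and the input values are invented
for illustration only; they are not taken from Figure 9-1 or
from any measured data, and the function name predict is
hypothetical.

```python
def predict(b, slopes, values):
    """Evaluate Eqn 5: y = b + SUM over j of (mj * xj)."""
    return b + sum(slopes[name] * values[name] for name in slopes)


# Hypothetical intercept and slopes (mj), one per independent variable.
b = 100.0
slopes = {"TSOCPUTM": 1.0, "BATCPUTM": 1.5,
          "IMSCPUTM": 2.0, "CICCPUTM": 0.5}

# Hypothetical values of the independent variables (xj).
values = {"TSOCPUTM": 200.0, "BATCPUTM": 400.0,
          "IMSCPUTM": 100.0, "CICCPUTM": 300.0}

totcputm = predict(b, slopes, values)
print(totcputm)  # 1250.0
```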
You may solve multiple regression equations for any number of
x terms, but you must select some subset of these terms to
build models for each of the resource consumption values.
While you can make these selections arbitrarily, stepwise
multiple regression provides an ideal technique for
investigating potential relationships between resource
variables.
Although numerous stepwise multiple regression algorithms are
available, this program uses Goodnight's MAXR algorithm,
which is implemented in the SAS REG procedure using the
stepwise option.
For example, consider the development of a statistical model
that attempts to relate the CPU utilization of a processor to
three potential resource elements named A, B, and C. In this
example, A, B, and C are the independent (or x) resource
variables, while CPU utilization is the dependent (y)
resource variable. The name stepwise multiple regression
describes the function of the algorithm. For our small
example, the algorithm attempts to solve the problem using
three steps:
o  The algorithm evaluates three linear one-term models
   (that is, one based on A, one on B, and one on C) to
   determine which of the three potential elements has
   the greatest correlation to the resource being
   modeled. The correlation is judged by the r-squared
   value of each model, which the algorithm attempts to
   maximize. For the purposes of this example, assume
   that the B element has the highest correlation.
o  The algorithm then evaluates two linear two-term
   models based on B and each of the remaining two
   elements (that is, B and A, and B and C). Once again,
   the algorithm chooses the model with the maximum
   r-squared. For the purposes of this example, assume
   that the pair of elements with the maximum r-squared
   value is B and C.
o  Finally, the algorithm evaluates one linear
   three-term model based on A, B, and C.
The algorithm stops at any step where adding a term to the
model fails to increase the r-squared value by at least a
minimum amount. Thus, you can employ stepwise multiple
regression to investigate the relationship between several
resource consumption variables, letting the algorithm choose
the most likely combinations and eliminate the least likely.
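The three steps and the stopping rule described above can be
sketched in Python as a simple forward selection driven by
r-squared. This is an illustration of the steps as described
in this section only; it does not attempt to reproduce the
MAXR algorithm or the SAS REG procedure. The function names,
the min_gain threshold of 0.01, and the data for A, B, C, and
cpu are all hypothetical.

```python
def fit_ols(columns, y):
    """Least-squares fit of y = b + SUM(mj * xj); returns [b, m1, ...]."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations (X'X) beta = X'y, stored as an augmented matrix.
    a = [[sum(r[p] * r[q] for r in rows) for q in range(k)]
         + [sum(r[p] * yi for r, yi in zip(rows, y))] for p in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda i: abs(a[i][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


def r_squared(columns, y):
    """Fraction of the variance of y explained by the fitted model."""
    beta = fit_ols(columns, y)
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = 0.0
    for i, yi in enumerate(y):
        fitted = beta[0] + sum(m * col[i] for m, col in zip(beta[1:], columns))
        ss_res += (yi - fitted) ** 2
    return 1.0 - ss_res / ss_tot


def forward_select(candidates, y, min_gain=0.01):
    """Add one term per step; stop when r-squared gains less than min_gain."""
    chosen, best_r2 = [], 0.0
    remaining = dict(candidates)
    while remaining:
        # Try each remaining element together with those already chosen.
        scores = {name: r_squared([candidates[c] for c in chosen] + [col], y)
                  for name, col in remaining.items()}
        name = max(scores, key=scores.get)
        if scores[name] - best_r2 < min_gain:
            break
        chosen.append(name)
        best_r2 = scores[name]
        del remaining[name]
    return chosen, best_r2


# Hypothetical data: cpu depends mostly on B, somewhat on C, not on A.
A = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
B = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
C = [5.0, 3.0, 8.0, 1.0, 7.0, 2.0, 6.0, 4.0]
cpu = [23.3, 16.8, 34.1, 22.6, 41.2, 31.9, 48.4, 41.7]

chosen, r2 = forward_select({"A": A, "B": B, "C": C}, cpu)
print(chosen)  # ['B', 'C']
```

With this data, the sketch selects B first and C second, then
stops because adding A raises r-squared by less than the
threshold, which mirrors the outcome assumed in the example.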
Copyright © 2014 CA.
All rights reserved.
 