

9.2.1 Stepwise Multiple Regression Models
Within the context of the CA MICS Capacity Planning
component, the term "multivariate regression" refers to
multivariate linear regression.
A multivariate regression model is a linear model which
explains one variable, known as the dependent variable, in
terms of a linear combination of several other variables,
known as independent variables. The general form of the
model can be written as follows:
y = b + m1 * x1 + m2 * x2 + m3 * x3 + ... + mj * xj (Eqn 1)
In this form the dependent variable, denoted by y, is shown
as a combination of a constant term, b, added to a group of j
independent variables, denoted by x1, x2, ... xj.
Each of the x variables is multiplied by an m coefficient
which represents the slope of the regression surface along
that variable's axis. With j independent variables, the
surface lies in a space of j+1 dimensions. When only one
independent variable is used, the equation for the model is:
y = b + m1 * x1 (Eqn 2)
This is the equation for a univariate regression model, which
is described in Section 7.4. The m coefficient is the slope
of a line in a two-dimensional space, defined by a y-axis and
an x-axis.
If the model is extended to two independent variables, it
expands to:
y = b + m1 * x1 + m2 * x2 (Eqn 3)
This model occupies a three-dimensional space, defined by a
y-axis, an x1-axis, and an x2-axis. The m1 and m2
coefficients represent the slopes of a plane, or flat
surface, within the space. The regression surface for models
with more than two independent variables is very difficult to
visualize.
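The three equations above differ only in the number of terms, which
a short Python sketch can make concrete. The function name predict
and the coefficient values are hypothetical and chosen only for
illustration; they do not come from CA MICS.

```python
def predict(b, m, x):
    """Evaluate Eqn 1: y = b + m1*x1 + m2*x2 + ... + mj*xj."""
    if len(m) != len(x):
        raise ValueError("need one m coefficient per independent variable")
    return b + sum(mi * xi for mi, xi in zip(m, x))


# Eqn 2: one independent variable (a line in two dimensions).
y_line = predict(2.0, [0.5], [10.0])

# Eqn 3: two independent variables (a plane in three dimensions).
y_plane = predict(2.0, [0.5, 1.5], [10.0, 4.0])

print(y_line, y_plane)
```

The same function handles any number of independent variables, even
though the surface can no longer be pictured beyond two.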
In the case study presented in Section 9.6.1, we construct a
multivariate regression model which relates total CPU time to
the TCB CPU time used by key system workloads. For this
model the y and xj terms are:
y  - TOTCPUTM
x1 - TSOCPUTM     x2 - BATCPUTM
x3 - IMSCPUTM     x4 - CICCPUTM
To use Multivariate Regression Forecasting, you must supply
observed values of the dependent and independent variables.
The procedure generates estimates of m1, m2, ... mj, and
various test statistics, such as r-squared, F, and p, which
are the multivariate analogs to the statistics for univariate
modeling described in Section 7.4.
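As a rough illustration of what such a procedure computes, the
following Python sketch fits Eqn 1 by ordinary least squares and
derives r-squared and F. It is not the CA MICS or SAS implementation.
The function names and the six observations are made up for the
example, and the p value is omitted because it requires the F
distribution.

```python
def solve(a, rhs):
    """Solve the square system a * z = rhs by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit(xs, ys):
    """Least-squares fit of y = b + m1*x1 + ... + mj*xj.

    xs holds one list of j independent values per observation.
    Returns (b, [m1, ..., mj], r_squared, f_statistic).
    """
    rows = [[1.0] + list(x) for x in xs]   # leading 1 carries the constant b
    n, k = len(rows), len(rows[0])
    # Normal equations: (X'X) coef = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(ys) / n
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_error = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    r_squared = 1.0 - ss_error / ss_total
    j = k - 1                               # number of independent variables
    f_stat = ((ss_total - ss_error) / j) / (ss_error / (n - j - 1))
    return coef[0], coef[1:], r_squared, f_stat


# Made-up observations: roughly y = 3 + 2*x1 + 1*x2 plus small noise.
xs = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
ys = [7.2, 7.9, 12.8, 14.3, 19.1, 19.7]

b, m, r_squared, f_stat = fit(xs, ys)
print(b, m, r_squared, f_stat)
```

The sketch assumes that the independent variables are not exact
linear combinations of one another and that the fit is not perfect.
Otherwise a division by zero occurs.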
In a general multivariate model, you choose the independent
variables yourself, which assumes that you already know which
variables belong in the model and how many there are. At
the start of an exploratory analysis, however, you do not
know which variables are related or how many variables
should be analyzed.
To solve this problem, Multivariate Regression Forecasting
uses what is known as a stepwise process to determine which
independent variables have the most predictive power for the
dependent variable. This process, which is based on the SAS
REG procedure with the stepwise option, attempts to find
the subset of a group of variables that forms the best
predictive model. For example, you may supply ten variable
names. The process first tries each of the ten variables
individually and retains the best predictor. It then tries
that variable with each of the remaining nine to select the
best two-element model. The process continues in the same
way, finding the best three-element model, four-element
model, and so on. It stops when no variables remain to be
tested or when adding a variable no longer improves the
statistical quality of the model.
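The selection loop described above can be sketched in Python as
follows. This is a simplified forward selection, not the SAS REG
procedure itself. The stopping rule (a minimum gain in r-squared,
here the hypothetical parameter min_gain) is an assumption standing
in for "no greater statistical quality", and the variable names and
data are made up.

```python
def sse(columns, ys):
    """Residual sum of squares when ys is fit on the columns plus a constant."""
    n = len(ys)
    design = [[1.0] * n] + [list(c) for c in columns]
    k = len(design)
    # Normal equations, augmented with the right-hand side.
    aug = [[sum(p * q for p, q in zip(design[i], design[j])) for j in range(k)]
           + [sum(p * y for p, y in zip(design[i], ys))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    coef = [aug[i][k] / aug[i][i] for i in range(k)]
    fitted = [sum(c * design[i][t] for i, c in enumerate(coef)) for t in range(n)]
    return sum((y - f) ** 2 for y, f in zip(ys, fitted))


def forward_select(candidates, ys, min_gain=0.01):
    """Pick variables one at a time; candidates maps name -> observed values."""
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    chosen, best_r2 = [], 0.0
    remaining = list(candidates)
    while remaining:                        # stop: no more variables to test
        # Try each remaining variable together with those already chosen.
        scored = [(1.0 - sse([candidates[c] for c in chosen + [name]], ys) / ss_total,
                   name) for name in remaining]
        r2, name = max(scored)
        if r2 - best_r2 < min_gain:         # stop: no meaningful improvement
            break
        chosen.append(name)
        remaining.remove(name)
        best_r2 = r2
    return chosen


# Made-up data: y depends on x1 and x2, while x3 is unrelated.
candidates = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [3, 1, 4, 1, 5, 9, 2, 6],
    "x3": [2, 7, 1, 8, 2, 8, 1, 8],
}
ys = [21.3, 16.8, 28.1, 20.6, 35.2, 48.9, 30.3, 43.8]

selected = forward_select(candidates, ys)
print(selected)
```

With this data the loop picks x2 first, then x1, and rejects x3
because adding it barely changes r-squared.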
Copyright © 2014 CA.
All rights reserved.
 