

10.4.2 Stepwise Multiple Regression Concepts
The stepwise multiple regression procedures used by Business
Element Forecasting are an extension of the linear regression
models presented in Chapters 6 and 7. In the introduction to
linear regression provided in Section 6.4.1, we presented the
equation for a simple linear regression model as:
     y = b + m * x                                   (Eqn 1)

where b is a constant that defines the mean value of y
        when x is zero
      m is the slope of the line that relates y and x
      y is the value to be predicted
      x is the value on which the prediction is based
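
As an illustration of Equation 1, the following minimal Python sketch estimates b and m by ordinary least squares. The function name fit_simple and the data values are hypothetical and were chosen to lie exactly on the line y = 5 + 2x; this is not the code the product runs.

```python
def fit_simple(xs, ys):
    """Return (b, m) for y = b + m * x, estimated by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx              # slope
    b = mean_y - m * mean_x    # intercept: fitted value of y when x is zero
    return b, m


# Hypothetical observations that lie exactly on y = 5 + 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [7.0, 9.0, 11.0, 13.0]
b, m = fit_simple(xs, ys)
print(b, m)
```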
You can expand this simple equation to be a multiple
regression equation by replacing the m and x terms in the
previous equation by a set of terms as shown in the following
equation:
     y = b + SUM(j=1..n) [ m_j * x_j ]               (Eqn 2)
where b   is a constant that defines the mean value of y
          when every x_j is zero
      m_j is the slope that relates y to x_j when the other
          business elements are held constant
      x_j is the j-th business element
      n   is the number of business elements in the model
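
To make Equation 2 concrete, the following minimal Python sketch evaluates a multiple regression model for one set of business element values. The function name predict, the intercept, the slopes, and the element values are all hypothetical and serve only to show the arithmetic.

```python
def predict(b, m, x):
    """Evaluate y = b + sum(m_j * x_j) for one observation."""
    if len(m) != len(x):
        raise ValueError("need one slope per business element")
    return b + sum(m_j * x_j for m_j, x_j in zip(m, x))


# Hypothetical three-element model and one month of element volumes
b = 100.0
m = [0.5, -0.25, 2.0]
x = [10.0, 20.0, 3.0]
y = predict(b, m, x)   # 100 + 5 - 5 + 6
print(y)
```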
Although you can easily solve multiple regression equations
of this type for any set of x terms (that is, business
elements), you are often presented with numerous potential
business elements by your application's users. Therefore,
you must select some subset of these business elements to
build models for each of your application's resource
consumption values. While you could make these selections
arbitrarily, stepwise multiple regression provides an ideal
technique for investigating potential relationships between
business elements and resource consumption.
Although numerous stepwise multiple regression algorithms are
available, this program uses Goodnight's MAXR algorithm,
which is implemented in the SAS REG procedure with the
stepwise option. For example, consider the development of a
statistical model that attempts to relate the CPU consumption
of an application to three potential business elements called
A, B, and C. The name, stepwise multiple regression,
describes the function of the algorithm. For our example,
the algorithm would attempt to solve the problem using three
steps:
1. The algorithm would evaluate three linear one-term
models (that is, one based on A, one on B, and one on C)
to determine which of the three business elements had
the greatest correlation to the resource being modeled.
This correlation is judged based on the r-squared value
for each of the models. The algorithm attempts to
maximize r-squared. For the purposes of this example,
assume that the B business element had the highest
correlation.
2. The algorithm would evaluate two linear two-term models
based on B and the remaining two business elements (that
is, B and A, and B and C). Once again, the algorithm
chooses the best model based on the maximum r-squared.
For the purposes of this example, assume that the pair
of business elements that resulted in the maximum
r-squared value was B and C.
3. The algorithm would evaluate one linear model based on
A, B, and C.
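
The three steps above can be sketched in Python as follows. This is a minimal forward-selection sketch that, at each step, adds the business element giving the highest r-squared. It is not Goodnight's MAXR algorithm and not the code this product runs. The element values for A, B, and C, the CPU figures, and all function names are hypothetical.

```python
def solve(a, b):
    """Solve a * z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def r_squared(columns, y):
    """Fit y = b + sum(m_j * x_j) by least squares; return r-squared."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations: (X'X) coef = X'y
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)]
           for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def forward_steps(elements, y):
    """At each step, add the element that yields the highest r-squared."""
    chosen, steps = [], []
    remaining = list(elements)
    while remaining:
        scores = {
            name: r_squared([elements[e] for e in chosen + [name]], y)
            for name in remaining
        }
        best = max(scores, key=scores.get)
        chosen.append(best)
        remaining.remove(best)
        steps.append((list(chosen), scores[best]))
    return steps


# Hypothetical history: six observations of three business elements
elements = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "B": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "C": [1.0, 0.0, 0.0, 1.0, 1.0, 0.0],
}
cpu = [18.2, 12.9, 22.1, 20.8, 30.1, 24.9]

steps = forward_steps(elements, cpu)
for names, r2 in steps:
    print(names, round(r2, 3))
```

With this hypothetical data the sketch picks B first, then B and C, and finally all three elements, which mirrors the sequence assumed in the steps above.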
The algorithm stops at the step in which it finds the best
r-squared value. After this point, adding a term to the model
does not result in an increase of the r-squared value. Thus,
you can employ stepwise multiple regression to investigate
the relationship between application resource consumption and
a potential business element. Although a discussion of
Goodnight's MAXR algorithm is far beyond the scope of this
document, we can illustrate the function of the algorithm by
considering the following example.
This procedure was used to predict the CPU and EXCP resource
requirements of an accounts receivable application (SAR79).
The application is characterized by the monthly volumes of
invoices, updates, and open items that the installation
processed. In this example, the model to be investigated is:
     y   = BILCPUTM
     x_1 = INVOICES
     x_2 = UPDATES
     x_3 = OPN_ITEM
A stepwise multiple regression is used to evaluate a model of
CPU time consumption. First, the software evaluates three
models of CPU time consumption based on invoices, updates,
and open items. The results of these models are shown in the
following table:
Model   Business             Intercept
Number  Elements     Coef.    CPU Mins   R**2     F      p
======  ==========  =======  =========  ======  =====  =====
   1    Invoices     -0.011      520.2    0.72   12.7   0.02
   2    Updates       0.030      180.2    0.36    2.9   0.15
   3    Open Items   -0.003      419.0    0.26    1.8   0.24
The coefficient and intercept terms shown in the table
correspond to the m and b values discussed in Equation 1.
The values of r-squared, F, and p are interpreted below:
o r-squared is the coefficient of determination. This value
indicates what fraction of the variability in the
historical data is explained by the proposed model (for
example, 0.72 means 72 percent). Typically, we are seeking
r-squared values greater than or equal to 0.7.
o F is a measure of statistical quality calculated from the
ratio of sums of squares terms for the models. In
general, larger F values indicate more reliable models.
Note that the F value carries very little weight when
fewer than 30 historical data points are used to develop
the model.
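o p is the probability of obtaining an F value at least this
large if the business elements had no real relationship to
the resource. Smaller p values indicate a more significant
model. A value less than or equal to 0.01 is recommended
for computer measurement applications (SAR79). Note that
you can discount the p value if you use only a few
historical observations to build the model.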
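
The relationship between r-squared and F can be illustrated with the standard formula for the overall F statistic of a least-squares model with an intercept. In the following minimal Python sketch, the function name f_statistic is hypothetical, and the observation count of 7 is an assumption, because this section does not state how many monthly observations were used.

```python
def f_statistic(r2, n_obs, n_terms):
    """Overall F for a least-squares model that includes an intercept.

    F = (r2 / n_terms) / ((1 - r2) / (n_obs - n_terms - 1))
    """
    df_model = n_terms
    df_error = n_obs - n_terms - 1
    if df_error <= 0:
        raise ValueError("not enough observations for this many terms")
    return (r2 / df_model) / ((1.0 - r2) / df_error)


# r-squared of the one-term invoices model from the table above.
# n_obs = 7 is an ASSUMPTION, not a figure given in this section.
f_invoices = f_statistic(0.72, n_obs=7, n_terms=1)
print(round(f_invoices, 1))
```

Under that assumption the result is roughly 12.9, close to the 12.7 shown in the table; the table's r-squared value is rounded to two decimals, so an exact match is not expected.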
Since the maximum value of r-squared resulted from a model
based on invoices, you would select it as the best one-term
model that could be developed from the proposed business
elements.
The second step of the process is evaluating the potential
two-term models that extend the one-term model selected
above. The possible models in this case are based on invoices
and updates, and on invoices and open items. The results for
these models are shown in the following table:
Model   Business             Intercept
Number  Elements     Coef.    CPU Mins   R**2     F      p
======  ==========  =======  =========  ======  =====  =====
   1    Invoices     -0.010      448.5    0.77    6.8   0.05
        Updates       0.013
   2    Invoices     -0.010      611.7    0.84   10.6   0.03
        Open Items   -0.002
Since the maximum value of r-squared resulted from a model
based on invoices and open items, you would select it as the
best two-term model that could be developed from the proposed
business elements.
The final step of the process is evaluating a three-term
model based on all of the business elements supplied by the
user. This model is shown in the following table.
Model   Business             Intercept
Number  Elements     Coef.    CPU Mins   R**2     F      p
======  ==========  =======  =========  ======  =====  =====
   1    Invoices     -0.009      544.1    0.88    7.6   0.07
        Updates       0.012
        Open Items   -0.002
Since the maximum value of r-squared across all three steps
(0.88) resulted from the model based on invoices, updates,
and open items, you would select it as the best model that
you could develop from the proposed business elements.
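
The selection walked through in this example can be retraced with a short Python sketch. The r-squared values are copied from the three tables above; the variable names and the data layout are hypothetical.

```python
# Candidate models at each step, keyed by business elements, with the
# r-squared values reported in the three tables of this example.
steps = [
    {("Invoices",): 0.72, ("Updates",): 0.36, ("Open Items",): 0.26},
    {("Invoices", "Updates"): 0.77, ("Invoices", "Open Items"): 0.84},
    {("Invoices", "Updates", "Open Items"): 0.88},
]

# At each step, keep the candidate with the maximum r-squared.
winners = [max(candidates, key=candidates.get) for candidates in steps]
best_r2 = [candidates[w] for candidates, w in zip(steps, winners)]

for number, (model, r2) in enumerate(zip(winners, best_r2), start=1):
    print(f"Step {number}: {' + '.join(model)} (r-squared {r2})")

# The overall choice is the step winner with the highest r-squared.
overall = winners[best_r2.index(max(best_r2))]
print("Selected:", " + ".join(overall))
```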
Copyright © 2014 CA.
All rights reserved.
 