

10.4.2 Stepwise Multiple Regression Concepts


The stepwise multiple regression procedures used by Business
Element Forecasting are an extension of the linear regression
models presented in Chapters 6 and 7.  In the introduction to
linear regression provided in Section 6.4.1, we presented the
equation for a simple linear regression model as:

              y = b + m * x                           (Eqn 1)

    where  b  is a constant that defines the mean value of
              y when x is zero

           m  is the slope of the line that relates y and x

           y  is the value to be predicted

           x  is the value on which the prediction is based
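
As an illustration of Equation 1, the following Python sketch
estimates b and m by ordinary least squares.  The function
name fit_line and the sample data are hypothetical and are
not part of the product; this is a minimal sketch, not the
product's implementation.

```python
def fit_line(xs, ys):
    """Return (b, m) for the least-squares line y = b + m * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b = mean_y - m * mean_x
    return b, m


# Hypothetical data that lies exactly on the line y = 1 + 2 * x.
b, m = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(b, m)
```

Here b is the predicted value of y when x is zero, and m is
the change in y for each unit change in x.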

You can expand this simple equation to be a multiple
regression equation by replacing the m and x terms in the
previous equation by a set of terms as shown in the following
equation:
                            _         _
                         n |           |
              y = b + SUM  |  m  * x   |              (Eqn 2)
                       j=1 |_  j    j _|


     where  b  is a constant that defines the mean value of
               y when all of the x terms are zero

            m  is the slope of the line that relates y and x
             j                                              j

            x  is the j-th business element
             j
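
Equation 2 can be evaluated directly once b and the m terms
are known.  The following Python sketch shows the arithmetic;
the function name predict and all of the numbers are
hypothetical values chosen for illustration.

```python
def predict(b, slopes, elements):
    """Evaluate y = b + sum of m_j * x_j (Equation 2)."""
    if len(slopes) != len(elements):
        raise ValueError("need one slope per business element")
    return b + sum(m_j * x_j for m_j, x_j in zip(slopes, elements))


# Hypothetical intercept, slopes, and business element volumes.
y = predict(100.0, [0.5, 2.0], [10.0, 20.0])
print(y)
```

With a single business element, the function reduces to
Equation 1.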

Although you can easily solve multiple regression equations
of this type for any set of x terms (that is, business
elements), your application's users often present you with
numerous potential business elements.  Therefore, you must
select some subset of these business elements to build models
for each of your application's resource consumption values.
While you could make these selections arbitrarily, stepwise
multiple regression provides an ideal technique for
investigating potential relationships between business
elements and resource consumption.

Although numerous stepwise multiple regression algorithms are
available, this program uses Goodnight's MAXR algorithm,
which is implemented in the SAS REG procedure with the
stepwise option.  The name "stepwise multiple regression"
describes how the algorithm works:  it builds the model one
term at a time.  For example, consider the development of a
statistical model that attempts to relate the CPU consumption
of an application to three potential business elements called
A, B, and C.  The algorithm would attempt to solve this
problem in three steps:

 1.  The algorithm would evaluate three linear one-term
     models (that is, one based on A, one on B, and one on C)
     to determine which of the three business elements has
     the greatest correlation to the resource being modeled.
     This correlation is judged by the r-squared value of
     each model, which the algorithm attempts to maximize.
     For the purposes of this example, assume that the B
     business element has the highest correlation.

 2.  The algorithm would evaluate two linear two-term models
     based on B and each of the remaining two business
     elements (that is, B and A, and B and C).  Once again,
     the algorithm chooses the best model based on the
     maximum r-squared.  For the purposes of this example,
     assume that the pair of business elements that results
     in the maximum r-squared value is B and C.

 3.  The algorithm would evaluate one linear model based on
     A, B, and C.
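
The three steps above can be sketched in Python as a simple
greedy forward selection.  This sketch is an assumption made
for illustration:  it is not Goodnight's MAXR algorithm and
not the product's code.  The function names and the data for
A, B, C, and y are hypothetical.

```python
def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(columns, y):
    """Fit y = b + sum(m_j * x_j) by least squares; return r-squared."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Normal equations: (X'X) * coef = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def forward_steps(candidates, y):
    """Return [(chosen names, r-squared)] for each selection step."""
    chosen, steps = [], []
    remaining = dict(candidates)
    while remaining:
        # Try adding each remaining element; keep the best r-squared.
        best = max(remaining, key=lambda name: r_squared(
            [candidates[c] for c in chosen] + [remaining[name]], y))
        chosen.append(best)
        del remaining[best]
        score = r_squared([candidates[c] for c in chosen], y)
        steps.append((list(chosen), score))
    return steps


# Hypothetical data: y depends on B and C only.
data = {
    "A": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
    "B": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "C": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
}
y = [17.0, 12.0, 21.0, 16.0, 25.0, 20.0]
steps = forward_steps(data, y)
for names, score in steps:
    print(names, round(score, 3))
```

With this hypothetical data, the sketch selects B first and
then the pair B and C, which mirrors the example in the steps
above.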

The algorithm stops at the step in which it finds the best
r-squared value.  After this point, adding a term to the
model does not result in an increase of the r-squared value.
Thus, you can employ stepwise multiple regression to
investigate the relationship between application resource
consumption and each potential business element.  Although a
detailed discussion of Goodnight's MAXR algorithm is far
beyond the scope of this document, we can illustrate the
function of the algorithm by considering the following
example.

This procedure was used to predict the CPU and EXCP resource
requirements of an accounts receivable application (SAR79).
The application is characterized by the monthly volumes of
invoices, updates, and open items that the installation
processed.  In this example, the model to be investigated
uses the following variables:

     y  - BILCPUTM            x  - UPDATES
                               2

     x  - INVOICES            x  - OPN_ITEM
      1                        3

A stepwise multiple regression is used to evaluate a model of
CPU time consumption.  First, the software evaluates three
models of CPU time consumption based on invoices, updates,
and open items.  The results of these models are shown in the
following table:


 Model    Business            Intercept
 Number   Elements    Coef.   CPU  Mins   R**2     F      p
 ======  ==========  =======  =========  ======  =====  =====
    1    Invoices     -0.011    520.2     0.72    12.7   0.02

    2    Updates       0.030    180.2     0.36     2.9   0.15

    3    Open Items   -0.003    419.0     0.26     1.8   0.24

The coefficient and intercept terms shown in the table
correspond to the m and b values discussed in Equation 1.
The values of r-squared, F, and p are interpreted below:

o  r-squared is the coefficient of determination.  This value
   indicates what fraction of the variability in the
   historical data is explained by the proposed model (for
   example, 0.72 means 72 percent).  Typically, we are
   seeking r-squared values greater than or equal to 0.7.

o  F is a measure of statistical quality calculated from the
   ratio of sums of squares terms for the models.  In
   general, larger F values indicate more reliable models.
   Note that the F value has very little significance when
   fewer than 30 historical data points are used to develop
   the model.

o  p is the significance probability of the F value, that
   is, the probability of obtaining an F value at least this
   large by chance if the business elements were unrelated
   to the resource.  Smaller p values therefore indicate
   stronger evidence for the model.  A value less than or
   equal to 0.01 is recommended for computer measurement
   applications (SAR79).  Note that you can discount the p
   value if you use only a few historical observations to
   build the model.
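
For a least-squares model with an intercept, the F value can
be computed from r-squared, the number of observations, and
the number of terms.  The following Python sketch uses that
standard relationship.  The function name is hypothetical,
and the observation count of seven is an assumption made for
illustration, because the text does not state how many months
of data were used; the result therefore need not match the
table exactly.

```python
def f_statistic(r_squared, n_obs, n_terms):
    """F = (R2 / k) / ((1 - R2) / (n - k - 1)), model with intercept."""
    df_model = n_terms
    df_error = n_obs - n_terms - 1
    if df_error <= 0:
        raise ValueError("too few observations for this many terms")
    return (r_squared / df_model) / ((1.0 - r_squared) / df_error)


# ASSUMPTION: seven observations; the count is not stated in the text.
f_one_term = f_statistic(0.72, 7, 1)
print(round(f_one_term, 1))
```

Because the error degrees of freedom shrink as terms are
added, a model with more terms can have a higher r-squared
and still have a lower F value.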

Since the maximum value of r-squared resulted from a model
based on invoices, you would select it as the best one-term
model that could be developed from the proposed business
elements.

The second step of the process is to evaluate the potential
two-term models that extend the one-term model selected
above.  The possible models in this case are based on
invoices and updates, and on invoices and open items.  The
results for these models are shown in the following table:


 Model    Business            Intercept
 Number   Elements    Coef.   CPU  Mins   R**2     F      p
 ======  ==========  =======  =========  ======  =====  =====
    1    Invoices     -0.010    448.5     0.77     6.8   0.05
         Updates       0.013

    2    Invoices     -0.010    611.7     0.84    10.6   0.03
         Open Items   -0.002

Since the maximum value of r-squared resulted from a model
based on invoices and open items, you would select it as the
best two-term model that could be developed from the proposed
business elements.

The final step of the process is to evaluate a three-term
model based on all of the business elements supplied by the
user.  This model is shown in the following table:


 Model    Business            Intercept
 Number   Elements    Coef.   CPU  Mins   R**2     F      p
 ======  ==========  =======  =========  ======  =====  =====
    1    Invoices     -0.009    544.1     0.88     7.6   0.07
         Updates       0.012
         Open Items   -0.002

Since the maximum value of r-squared resulted from a model
based on invoices, updates, and open items, you would select
it as the best model that you could develop from the proposed
business elements.