

10.2 Usage Guidelines


In addition to analyzing the values of r-squared, F, and p
statistics of the regression model (discussed in Section
10.4), there are three other techniques for evaluating
business element models:

o  Determining when business factors can be excluded from the
   model

o  Calculating confidence limits for the business element
   based forecasts

o  Inspecting the model

EXCLUDING BUSINESS ELEMENTS

In the development of multiple regression models, each of the
independent variables used to create the model represents a
source of potential error.  Because of this, one of the
objectives of most multiple regression studies is to attempt
to identify the smallest set of independent variables that
produces a model that accurately represents the historical
data.

For example, consider the development of a model that relates
the disk EXCPs consumed by a life insurance application to
the number of policies in effect, new policies issued, claims
processed, and the number of cash value loans in effect.
When we used stepwise multiple regression to evaluate these
potential business elements, we produced four models. (In the
example in Section 10.4, Analytic Technique Tutorial, we
show every potential set of combinations to illustrate how
the algorithm decides the order in which the business
elements are added to the model.  In the following table, we
show only the model that results from each step.) The four
steps of the regression modeling process are as follows:

    # of Variables/
        Step          Business Elements        R**2
      ==========    =======================  ==========
          1            New Policies             0.72

          2            New Policies             0.81
                       Claims

          3            New Policies             0.87
                       Claims
                       Policies in Effect

          4            New Policies             0.89
                       Claims
                       Policies in Effect
                       Cash Value Loans

Since each business element introduces another potential
source of error into the model, you must weigh the
contribution each new term makes to the accuracy of the model
relative to the potential error that it introduces.  Previous
investigators have recommended that, for a new business
element to be included in the model, the addition of the term
should result in an improvement of r-squared of 0.05 (ART78,
SAR79).  Some stepwise regression algorithms allow you to
specify a minimum r-squared improvement parameter.  In these
algorithms, the regression stops when adding another term to
the model does not result in the specified minimum
improvement.

Since such a parameter is not available in the current
version of the SAS REG stepwise option, the stepwise option
attempts to use every business element you propose as long as
a non-zero increase in the r-squared value results.  However,
the program post-processes the output from the stepwise
option so that you can specify a minimum r-squared
improvement for each business element added to the model.
The default for this parameter is 0.05.

In the example shown in the table, each of the first three
steps of the regression results in an improvement of
r-squared of at least 0.05.  However, the addition of the
cash value loan term to the model improves the r-squared
value by only 0.02.  Because of this, Business Element
Forecasting would develop a forecast based only on new
policies, claims, and policies in effect, unless the
parameter was overridden.
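
The selection rule described above can be sketched in a few
lines of Python.  This is only an illustration of the rule
applied to the r-squared values in the table; the names
select_elements and MIN_R2_IMPROVEMENT are hypothetical, and
the sketch assumes that selection stops at the first step
that falls short of the minimum improvement.  It is not the
product's actual post-processing code.

```python
# Illustrative sketch only: names and data layout are hypothetical.
MIN_R2_IMPROVEMENT = 0.05  # default value named in the text

# (business element added at this step, r-squared of the resulting model)
steps = [
    ("New Policies", 0.72),
    ("Claims", 0.81),
    ("Policies in Effect", 0.87),
    ("Cash Value Loans", 0.89),
]


def select_elements(steps, min_improvement=MIN_R2_IMPROVEMENT):
    """Keep elements until one adds less than min_improvement to r-squared.

    Assumption: selection stops at the first step that falls short.
    """
    kept = []
    previous_r2 = 0.0
    for element, r2 in steps:
        # A small tolerance guards against floating-point rounding
        # when an improvement sits exactly on the threshold.
        if r2 - previous_r2 + 1e-9 < min_improvement:
            break
        kept.append(element)
        previous_r2 = r2
    return kept


print(select_elements(steps))
# ['New Policies', 'Claims', 'Policies in Effect']
```

Passing a smaller value, such as select_elements(steps, 0.01),
corresponds to overriding the parameter and retains the cash
value loan term as well.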

CALCULATING CONFIDENCE LIMITS

You can calculate confidence limits for multiple regression
models to gauge the reliability of the forecasted values.
The width of the confidence limits is influenced by three
factors:

o  The amount of measurement data collected

o  The variability seen in the measurement data

o  The distance between forecasted values and the measurement
   data

Since a smaller confidence interval implies a better, more
reliable forecast, it is desirable to control these factors
as much as possible.  The amount of data processed is a
function of the cost of data collection, so you should
collect as much as you can afford.  You cannot usually
influence the amount of variability in the measurement data.
The third factor, the distance between the forecast and the
measurement data, provides the best leverage in a business
element based forecast.

In other sections of this guide you worked with forecasts
based on historical data.  The forecast observations estimate
future resource demand.  As more observations are forecast,
the forecasts move progressively farther away from the
measured data.  With time-based forecasting this is
inevitable, since we can never measure the future.

In business element forecasting, this is not necessarily the
case.  The width of the confidence limits depends on the
estimated values of the business elements.  As long as the
estimates of the future business elements fall within the
ranges of the volumes used to build the forecasting model,
you can expect the predictions made by the model to have
relatively small confidence limits.  However, the farther the
estimates of the future business elements are from these
ranges, the wider you can expect the resulting confidence
interval to be.
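
To make the effect of distance concrete, the following Python
sketch uses the simpler one-variable case, which is an
assumption made for illustration (the guide's models use
several business elements).  In a one-variable least-squares
regression, the half-width of the confidence interval for the
predicted mean is proportional to
sqrt(1/n + (x0 - mean)**2 / Sxx), where n is the number of
observations, x0 is the estimated future volume, and Sxx is
the sum of squared deviations of the historical volumes from
their mean.  The function name width_factor and the volumes
are hypothetical.

```python
import math


def width_factor(x_history, x_future):
    """Return sqrt(1/n + (x0 - mean)^2 / Sxx) for a one-variable regression.

    The confidence interval half-width is this factor multiplied by the
    residual standard error and a t critical value, neither of which is
    computed here.
    """
    n = len(x_history)
    mean = sum(x_history) / n
    sxx = sum((x - mean) ** 2 for x in x_history)
    return math.sqrt(1.0 / n + (x_future - mean) ** 2 / sxx)


# Made-up historical business element volumes.
volumes = [100, 120, 140, 160, 180]

# An estimate at the center of the data, one at the edge of the
# observed range, and one far outside it.
for estimate in (140, 180, 300):
    print(estimate, round(width_factor(volumes, estimate), 3))
```

The factor is smallest when the estimate equals the mean of
the historical volumes and grows as the estimate moves away
from it.  More observations shrink the 1/n term, and less
variability in the measurement data shrinks the standard
error that multiplies the factor.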

Thus, if an application is currently being tested with
relatively small volumes of input data, forecasts based on
the relationship of recorded business elements to actual
resource consumption are probably not applicable for
predicting resource consumption when the application is in
production, processing significantly larger volumes (that is,
higher business element values) of input data.
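
A simple guard against this kind of extrapolation is to
compare each estimated business element value with the range
of values used to build the model.  The Python sketch below
shows one way to do so; the function name
elements_outside_history, the element names, and all of the
numbers are hypothetical.

```python
def elements_outside_history(history, estimates):
    """Return {element: (estimate, low, high)} for out-of-range estimates.

    history   maps each business element to its observed values.
    estimates maps each business element to its estimated future value.
    """
    flagged = {}
    for element, estimate in estimates.items():
        low, high = min(history[element]), max(history[element])
        if not low <= estimate <= high:
            flagged[element] = (estimate, low, high)
    return flagged


# Made-up volumes recorded while the application was being tested.
test_history = {
    "policies_in_effect": [900, 1000, 1100],
    "claims": [40, 55, 60],
}

# Made-up production estimates: policies far above the tested range.
production_estimates = {"policies_in_effect": 25000, "claims": 50}

print(elements_outside_history(test_history, production_estimates))
# {'policies_in_effect': (25000, 900, 1100)}
```

Any element that is flagged indicates that the forecast
relies on the model outside the range of the data that was
used to build it.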

The use of confidence limits assumes that the business
element estimates being used are error-free.  If you are not
able to obtain accurate estimates of future business volumes,
it may not be worthwhile to attempt to develop a model.

Bowie reported this type of problem in her GUIDE presentation
in 1980 (BOW80).  Bowie developed a multiple regression model
of the resources consumed by a CICS-based automobile
reservation system.  However, the company's business
forecasting group did not attempt to develop firm estimates
of future rentals but, instead, continually adjusted the size
of the rental car fleet to meet demand.  Because of this,
Bowie reported that her business element based forecasts of
future resource consumption for the application were no
better than simple regression models developed with smoothed
historical resource consumption data.


INSPECTING THE MODEL

Perhaps the most important step in evaluating business
element-based forecasting is the inspection of the model.
Regardless of how well the proposed model fits the historical
data, the most important question that can be asked is: "Does
this model seem reasonable?"

Viewed from a statistical level, many business element models
may look like an excellent representation of the application,
but these models may also fail to satisfy your intuitive
understanding of the system.  In the case study in Section
10.6, we present an example of this type of model.  Our
example relates the CPU hours consumed by the accounts
receivable application to invoices, updates, and open items.

The statistical tests, using the r-squared and the F
statistic, show that the model is statistically valid.
However, inspection of the results reveals that two of the
three model coefficients are negative and that the intercept
value is greater than any of the historical observations.
Therefore, the model predicts that if no input data is
supplied to the application during a month, 544 CPU minutes
would be consumed.  Moreover, as the input volumes increase,
you can expect the resources consumed by the application to
decrease.  This model does not make sense as a forecasting
tool.
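
The two symptoms described above (negative coefficients and
an intercept larger than any historical observation) are easy
to check mechanically.  The Python sketch below shows one
possible check.  The function name inspect_model is
hypothetical, and apart from the intercept of 544 quoted
above, the coefficients and observations are made-up values
chosen only to reproduce the symptoms; they are not the case
study's figures.

```python
def inspect_model(intercept, coefficients, observed_usage):
    """Return a list of warnings about implausible model terms."""
    warnings = []
    for element, coefficient in coefficients.items():
        # A negative coefficient says usage falls as volume rises.
        if coefficient < 0:
            warnings.append(f"negative coefficient for {element}")
    if intercept < 0:
        warnings.append("negative intercept")
    elif intercept > max(observed_usage):
        # The model would predict more usage with no input than was
        # ever measured with input.
        warnings.append("intercept exceeds every historical observation")
    return warnings


# Hypothetical values, except for the intercept quoted in the text.
coefficients = {"invoices": 0.02, "updates": -0.01, "open_items": -0.03}
observed_usage = [310, 420, 390]

for warning in inspect_model(544.0, coefficients, observed_usage):
    print(warning)
```

A check like this supplements, rather than replaces, a review
of the model by someone who understands the application.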

Although you might initially expect the intercept value to be
zero, you should expect a positive value, since a number of
functions (such as database backups and control card
processing) consume resources every time the application
runs, regardless of the input volumes.

There are several reasons why quirks may sometimes occur when
you apply the business element methodology.  A few common
examples are listed below:

o  Duplicate or erroneous SMF data might be used to determine
   the monthly resource consumption values.  You can expect
   the CA MICS data validation process to eliminate such
   errors from models built using the CA MICS database.

o  During low volume weeks there might be extensive reruns of
   the application.  Since CA MICS carries ABEND counts, you
   can identify these types of problems.

o  The application may be in a state of change.  If a more
   efficient program is implemented during the period in
   which the data is being collected, you might see a drop
   in resource consumption even though volumes have
   increased.

o  The business elements that are used to develop the model
   may not be related to computer resource use, even though
   they are directly related to the volume of work processed
   by the application.  This problem occurs when there are
   other business elements (not considered in the model)
   which have a greater effect on the application.

There are any number of other factors that may influence the
modeling process.  By inspecting the model, you can identify
such problems before errant forecasts are produced.