9.2.2 Choosing Model Parameters


The first step in the development of a multivariate
regression model is to determine the appropriate model
parameters (that is, the dependent and independent variables
to be evaluated).

Not all possible combinations of independent and dependent
resource variables within the capacity planning database
provide usable or even sensible models in the capacity
planning process.  Exploring every possibility in search of
the right models would be enormously time-consuming and
wasteful.  Although stepwise multiple regression helps to
eliminate certain models, too many combinations remain to
evaluate them all.  Throughout the modeling and forecasting
process, your own knowledge and experience are the best
guides for finding the correct applications for Multivariate
Regression Forecasting and for discarding inappropriate
ones.

Certain applications will never provide suitable models for
Multivariate Regression Forecasting because, for purely
theoretical reasons, they do not fit such a model.  For
example, a model that uses the response time as the
dependent variable, and the transaction rate and transaction
service time (and possibly other variables) as the
independent variables, does not usually provide a suitable
multivariate regression model.  This is because analytic
queuing theory demonstrates that the most appropriate models
for these variables are non-linear.
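
The non-linearity can be illustrated with a minimal sketch.
It assumes the simplest single-server queue (M/M/1), which
this section does not name; the function name and the sample
values are illustrative only.  For that queue, the mean
response time is R = S / (1 - rate * S), where S is the
service time, so equal increases in the transaction rate
produce ever larger increases in response time.

```python
def mm1_response_time(arrival_rate, service_time):
    """Mean response time of an M/M/1 queue: R = S / (1 - rate * S)."""
    utilization = arrival_rate * service_time
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    return service_time / (1.0 - utilization)


service_time = 0.1                  # seconds per transaction (illustrative)
rates = [1.0, 3.0, 5.0, 7.0, 9.0]   # transactions per second (illustrative)
times = [mm1_response_time(rate, service_time) for rate in rates]

# Increase in response time for each equal step of 2 transactions/second.
steps = [later - earlier for earlier, later in zip(times, times[1:])]

for rate, response in zip(rates, times):
    print(f"rate={rate:4.1f}/s  response={response:.3f}s")
print("increase per step:", [round(step, 3) for step in steps])
```

A straight line fitted to these points would underestimate
response time at both low and high transaction rates and
overestimate it in between.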

On the other hand, certain processor overhead functions can
provide reasonable multivariate regression models.  Even
though the hypothetically correct model may not necessarily
be linear, it may be close enough to allow very good
approximation by a linear model.

The following are some examples of situations where
multivariate regression can provide reasonable models:

    o  Uncaptured processor utilization.  This is the amount
       of processor busy time used for unrecorded overhead
       functions.  This example is discussed in more detail
       in Section 9.6.1.

    o  IMS Control Region overhead analysis.  You can
       correlate the amount of time used by an IMS control
       region with the amount of processor time used by its
       dependent regions.  This example is discussed in more
       detail in Section 9.6.2.

    o  System address spaces.  Certain system address spaces
       and tasks provide services for all users of the
       system.  Examples of these services include the master
       scheduler, JES, VTAM, TCAS, monitors, global enqueue
       propagators, and others.  You can frequently correlate
       use of these system address spaces with processor use
       of the other, non-system address spaces, such as batch
       and TSO address spaces.  Such a model could be helpful,
       because future resource consumption estimates for the
       batch and/or TSO address spaces could then be used to
       estimate future resource consumption by this group of
       system address spaces.

    o  Recorded processor utilization.  You can generally
       relate processor use to the average number of batch,
       TSO, and started tasks that are running on the
       processor.  This application of Multivariate
       Regression Forecasting enables you to predict future
       processor use on the basis of the anticipated number
       of users.
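
The last example in the list above can be sketched in code.
This is a minimal illustration that uses plain ordinary
least squares, not the product's stepwise procedure.  The
function names are hypothetical, and the observations are
synthetic, generated from a known linear relationship so
that the fit can be checked.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("independent variables are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear_model(rows, y):
    """Least-squares fit of y = b0 + b1*x1 + ... via the normal equations."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 = intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, row):
    """Apply the fitted coefficients to one set of independent values."""
    return coefficients[0] + sum(c * x for c, x in zip(coefficients[1:], row))


# Synthetic data: average (batch jobs, TSO users, started tasks) per interval.
observations = [
    (4, 20, 6), (6, 35, 6), (5, 50, 7), (8, 40, 7),
    (7, 60, 8), (9, 55, 8), (10, 70, 9), (6, 45, 9),
]
# Synthetic processor busy %, built from 5 + 2*batch + 0.5*tso + 1*stc.
cpu_busy = [5 + 2 * b + 0.5 * t + 1 * s for b, t, s in observations]

coefficients = fit_linear_model(observations, cpu_busy)
forecast = predict(coefficients, (12, 80, 10))  # anticipated future workload
print([round(c, 3) for c in coefficients], round(forecast, 1))
```

Because the synthetic data contain no noise, the fitted
coefficients should reproduce the generating values.  Real
measurements would not, and the resulting model would still
need to be evaluated before it is used for forecasting.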

When you choose the applications for Multivariate Regression
Forecasting analysis, ensure that the desired resource
variables are present in capacity planning database files.
All independent variables in the model must exist within a
single capacity planning database file.  The dependent
variable may exist in the same or a different capacity
planning database file.

In addition, you must ensure that the capacity planning
database file containing the dependent variable and the file
containing the independent variables have the same time-span
(either DAYS, WEEKS, or MONTHS), and that they both cover a
concurrent period of historical data.
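
The requirements above can be expressed as a short
pre-check.  The following is a sketch only: the PlanningFile
record, the file names, and the variable names are
hypothetical and do not correspond to actual database
structures, and "concurrent period" is assumed to mean that
the two date ranges overlap.

```python
from dataclasses import dataclass
from datetime import date

VALID_TIMESPANS = ("DAYS", "WEEKS", "MONTHS")


@dataclass(frozen=True)
class PlanningFile:
    """Hypothetical summary of one capacity planning database file."""
    name: str
    timespan: str
    first_date: date
    last_date: date
    variables: frozenset


def check_model_inputs(dependent, independents, dep_file, indep_file):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    if dependent not in dep_file.variables:
        problems.append(f"{dependent} is not in {dep_file.name}")
    # All independent variables must come from the one independent file.
    for name in independents:
        if name not in indep_file.variables:
            problems.append(f"{name} is not in {indep_file.name}")
    for f in (dep_file, indep_file):
        if f.timespan not in VALID_TIMESPANS:
            problems.append(f"{f.name} has unknown time-span {f.timespan}")
    if dep_file.timespan != indep_file.timespan:
        problems.append("time-spans differ")
    # Assumption: "concurrent" means the two date ranges overlap.
    overlap_start = max(dep_file.first_date, indep_file.first_date)
    overlap_end = min(dep_file.last_date, indep_file.last_date)
    if overlap_start > overlap_end:
        problems.append("no concurrent period of historical data")
    return problems


cpu_file = PlanningFile("FILE_A", "WEEKS",
                        date(2023, 1, 2), date(2023, 12, 31),
                        frozenset({"CPU_BUSY"}))
load_file = PlanningFile("FILE_B", "MONTHS",
                         date(2023, 6, 1), date(2024, 5, 31),
                         frozenset({"BATCH_JOBS", "TSO_USERS"}))
problems = check_model_inputs(
    "CPU_BUSY", ["BATCH_JOBS", "TSO_USERS", "STC_COUNT"],
    cpu_file, load_file)
for problem in problems:
    print(problem)
```

In this example the check reports a missing independent
variable and mismatched time-spans.  The date ranges overlap,
so no concurrency problem is reported.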