

9.2.3 Evaluating the Model
The second step in developing a Multivariate Regression
Forecasting model is evaluating how well the model fits the
data. For this purpose, you can use the same basic
statistics in multivariate regression analysis that you use
in univariate linear regression. The most commonly used
statistics are the r-squared, F, and p statistics. These
statistics are interpreted similarly for Multivariate
Regression Forecasting and univariate linear regression,
although the formulas for calculating them in the
multivariate case are more involved.
As we discussed in Chapter 7, r-squared, F, and p are
interpreted as follows:
o r-squared is the coefficient of determination (the
square of the coefficient of multiple correlation).
This value indicates what percentage of the
variability in the historical data is explained by
the proposed model. You would typically be seeking
r-squared values greater than or equal to 0.7.
o F is a statistic calculated from the ratio of the
regression and residual sums of squares. In general,
larger F values indicate more reliable models.
However, the F value has very little significance
when fewer than 30 historical data points are used to
develop the model.
o p is the probability that an F value this large could
arise by chance alone. A value less than or equal to
0.01 has been recommended for computer measurement
applications (SAR79). You can discount the p value
if you use a small number of historical observations
to build the model.
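If you want to verify these statistics by hand, the
following minimal Python sketch computes them directly from
the sums of squares (numpy and scipy are assumed to be
available; the helper name fit_statistics is illustrative and
not part of the product):

    import numpy as np
    from scipy import stats

    def fit_statistics(X, y):
        # X is an (n, k) array of independent variables; y has length n.
        n, k = X.shape
        Xd = np.column_stack([np.ones(n), X])          # add an intercept column
        beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)  # least-squares coefficients
        resid = y - Xd @ beta
        ss_res = np.sum(resid ** 2)                    # residual sum of squares
        ss_tot = np.sum((y - y.mean()) ** 2)           # total sum of squares
        r_squared = 1.0 - ss_res / ss_tot
        # F compares explained to unexplained variance, each scaled by
        # its degrees of freedom.
        F = ((ss_tot - ss_res) / k) / (ss_res / (n - k - 1))
        p = stats.f.sf(F, k, n - k - 1)                # upper-tail probability of F
        return r_squared, F, p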
Countless tests and statistics aid the sophisticated
statistician in the evaluation of models. However, there are
three basic tests that you can employ. Used in concert,
these tests should enable you to evaluate how well the models
fit the data relatively quickly and with relatively little
prerequisite mathematical knowledge.
TEST 1 - Examine the r-squared, F, and p statistics.
We recommend an r-squared value greater than 0.7. The
p statistic should be small (we recommend a value of
0.01 or less). Like most rules of thumb, these rules
cannot be considered inviolate.
The F value should also be large, although its size is
relative and depends upon, among other things, the
number of independent variables. The F value is used
to calculate the p statistic, so it is probably better
to rely upon the p statistic until you gain familiarity
with models and their statistics.
The r-squared, F, and p values are printed on the Model
Analysis Report.
TEST 2 - Visually examine the data.
Visual examination of the actual and predicted data is
very important. It can sometimes show that a model
whose r-squared, F, and p statistics look bad may be
significantly improved by the exclusion of a small
number of outliers from the data.
The Multivariate Residual Analysis Report can help you
visually examine the actual, predicted, and residual
values. Ideally the residuals should appear totally
random. If you notice a pattern in the residuals, you
may want to reconsider your choice of independent
variables.
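If you want to supplement the report with your own graph, the
following minimal Python sketch plots residuals against
predicted values (matplotlib is assumed; the helper name
plot_residuals is illustrative). Ideally the points form a
pattern-free horizontal band around zero:

    import matplotlib.pyplot as plt

    def plot_residuals(predicted, actual):
        # A pattern-free band around zero suggests a good fit;
        # curvature or fanning suggests a poor choice of variables.
        residuals = actual - predicted
        plt.scatter(predicted, residuals)
        plt.axhline(0.0, linestyle="--")   # reference line at zero residual
        plt.xlabel("Predicted value")
        plt.ylabel("Residual (actual - predicted)")
        plt.title("Residual analysis")
        plt.show()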
TEST 3 - Apply the common sense test.
The common sense test asks, "Does this model make any
sense?" To illustrate this point, consider the
following example. You decide to analyze a model whose
dependent variable is CPU utilization, measured in CPU
hours per zone per week. The three independent
variables are the average number of started tasks
(STC), the average number of batch jobs (BAT), and the
average number of TSO users (TSO) during the week.
Suppose that a stepwise regression shows that the best
r-squared model is
CPU_HRS = (1.5 * TSO) + (0.89 * BAT) - (0.23 * STC) + 2.45
For this model, the r-squared value is 0.82, the
F value is 323.26, and the p statistic is 0.0012.
Based on these values, the model passes TEST 1 with
flying colors. Furthermore, since visual analysis of
the actual and predicted data indicates that there
appear to be no serious outliers, you can consider TEST
2 a success.
However, this model fails TEST 3 because the
coefficient for the STC variable is negative. If this
model were truly valid, you would want to run as many
started tasks as possible on this processor, since
each additional started task would reduce CPU
utilization by 0.23 CPU hours per week. Such a notion
violates common sense, so we must discard this model.
Situations like the above (in which TESTs 1 and 2
pass but TEST 3 fails miserably) most often occur
when insufficient historical data is included in the
evaluation of the model. This situation can also
occur if there is duplication among the variables you
use in the analysis. For example, predicting total CPU
utilization based on batch job, TSO session, and IMS
transaction CPU times can yield negative coefficients
if the IMS message region batch jobs are not excluded
from the historical data. In summary, always apply
TEST 3.
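You can partially automate TEST 3 by comparing the sign of
each fitted coefficient with your expectation. The following
Python sketch is illustrative only (the helper check_signs
and the expected-sign table are not part of the product);
applied to the example model, it flags the negative STC
coefficient:

    import numpy as np

    def check_signs(names, coefficients, expected_signs):
        # expected_signs maps a variable name to +1 or -1; a workload
        # variable should normally add CPU time, so +1 is the usual entry.
        failures = []
        for name, coef in zip(names, coefficients):
            if np.sign(coef) != expected_signs.get(name, +1):
                failures.append((name, coef))
        return failures

    # The example model from the text: STC carries a negative coefficient.
    names = ["TSO", "BAT", "STC"]
    coefs = [1.5, 0.89, -0.23]
    print(check_signs(names, coefs, {"TSO": +1, "BAT": +1, "STC": +1}))
    # -> [('STC', -0.23)]  : the model fails the common sense test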
ATYPICAL BEHAVIOR
When TEST 3 fails, look for atypical behavior on the system
during the measurement interval. For example, this model
shows a negative relationship between started tasks and
overall CPU time. Perhaps the system programmers were
testing some new online systems, and they were allowed to
test only during non-peak periods. Because of this, they
started several online systems whenever the load on the
system dropped off. In this case, you could isolate the test
online systems as a separate independent variable and repeat
the analysis.
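One way to isolate the test online systems is to append a
0/1 indicator variable marking the measurement intervals in
which testing occurred and then refit the model. A minimal
Python sketch follows (numpy is assumed; the data layout is
illustrative):

    import numpy as np

    def add_test_indicator(X, test_weeks):
        # X holds one row per week (columns such as TSO, BAT, STC);
        # test_weeks flags the weeks in which the test online systems
        # were started. Refitting with this extra independent variable
        # lets the regression attribute the testing load separately
        # instead of distorting the STC coefficient.
        indicator = np.asarray(test_weeks, dtype=float).reshape(-1, 1)
        return np.hstack([X, indicator])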
Not all problems with data are the result of atypical
workloads. A more serious problem which causes TEST 3 to
fail is a violation of the basic assumptions of regression.
In the theoretical formulation of a regression model, we
assume that the independent variables are truly independent
of each other. This assumption is often violated by computer
system data. In our example we use the number of TSO users,
batch jobs, and started tasks. If TSO is used for development
and test, you would expect increased TSO use to lead to
increased batch usage as the programmers submit more jobs.
This violation of the regression assumptions, in which the
independent variables are correlated with each other, is
known as multicollinearity. The only way to recover from
multicollinear data is to change the data so that the
correlation is removed. If the TSO and batch activity are
correlated, you must remove the correlation by creating a new
variable which represents batch jobs not due to TSO. This
variable is independent of TSO, so you may repeat the
analysis with your basic independence assumption satisfied.
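One common way to construct such a variable is to regress
BAT on TSO and keep only the residual, that is, the batch
activity not explained by TSO. The following Python sketch
shows the general technique (numpy is assumed; this is not
necessarily the product's method):

    import numpy as np

    def decorrelate(bat, tso):
        # Fit BAT = a + b*TSO by least squares and return the residual,
        # i.e. the batch activity not attributable to TSO-submitted work.
        Xd = np.column_stack([np.ones(len(tso)), tso])
        (a, b), *_ = np.linalg.lstsq(Xd, bat, rcond=None)
        return bat - (a + b * tso)   # residual batch activity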