7.4.2.2 Data Smoothing Using a Geometric Moving Average


One problem that analysts often encounter when attempting to
forecast computer requirements is the high degree of
variability shown by the observations in an historical data
series.  These variations are typically caused by random
elements (for example, test jobs) that often exist in a
system's workload.  The following table, which shows the
monthly observations of test jobs processed by a moderately
sized MVS system, illustrates this problem.


                       Observation      Test
             Month       Number         Jobs
            =======    ===========     ======
             JAN98          1           2900
             FEB98          2           3070
             MAR98          3           2950
             APR98          4           3080
             MAY98          5           3200
             JUN98          6           3150


Figure 7-5 shows a scatter plot of the data.  A linear
regression model developed for this historical data series
has the following parameters:

    n  =      6, the number of historical observations

    b  =   2881, the y intercept

    m  =   50.6, the slope of the line

     2
    r  =   0.68, the coefficient of determination

    F  =   8.47, the F value

    p  =   0.04, the probability of obtaining an F value this
                 large if the true slope were zero

    s  =   72.6, the standard error
     e

The predicted and residual values for the historical data
series are shown in the following table:

             Observation      Test       Est.      Residual
   Month       Number         Jobs       Jobs      (error)
  =======    ===========     ======     ======     ========
   JAN98          1           2900       2932         -32
   FEB98          2           3070       2982          88
   MAR98          3           2950       3033         -83
   APR98          4           3080       3084          -4
   MAY98          5           3200       3134          66
   JUN98          6           3150       3185         -35
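
The regression parameters and residuals above can be
recomputed with a short script.  The following is a minimal
sketch, assuming an ordinary least-squares fit of test jobs
against observation number; the helper name fit_line and the
dictionary keys are hypothetical and are not part of any
product.

```python
# Sketch only: fit_line is a hypothetical helper, not a product routine.
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = b + m*x; returns summary statistics."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    syy = sum((y - y_bar) ** 2 for y in ys)
    m = sxy / sxx                      # slope
    b = y_bar - m * x_bar              # y intercept
    sse = sum((y - (b + m * x)) ** 2 for x, y in zip(xs, ys))
    ssr = syy - sse                    # variation explained by the line
    mse = sse / (n - 2)                # residual mean square
    return {"n": n, "b": b, "m": m,
            "r2": ssr / syy, "F": ssr / mse, "se": mse ** 0.5}

obs = [1, 2, 3, 4, 5, 6]
jobs = [2900, 3070, 2950, 3080, 3200, 3150]   # JAN98 through JUN98

fit = fit_line(obs, jobs)
residuals = [y - (fit["b"] + fit["m"] * x) for x, y in zip(obs, jobs)]

print({k: round(v, 2) for k, v in fit.items()})
print([round(r) for r in residuals])
```

Because the fitted line includes an intercept, the residuals
sum to zero, which is a quick check on any residual table.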

Although one could argue that the model produced is
marginally acceptable, only 68% of the variability in the
historical data is accounted for by the model.  You can treat
this type of apparent randomness in historical data through
the use of data smoothing techniques.  Although a wide
variety of techniques are available, perhaps the simplest is
the geometric moving average (BAR77).  A geometric moving
average (GMA) is attractive in that you only need to
sacrifice one historical data observation to calculate the
smoothed series.  (Other techniques require you to sacrifice
many more historical data observations.  For example, a
five-point moving average requires you to sacrifice the first
five observations.)  You can calculate a geometric moving
average using the following equation.


    s(j) = alpha * s(j-1) + beta * x(j),
           for all j>=2 (Eqn 10)

           where alpha + beta = 1.0,
                 x(j) is the jth historical observation,
                 s(j) is the jth smoothed value, and
                 s(1) = x(1)

Thus, you can use the first and second observations to
calculate a new value for the second observation, and so
forth.  Another feature of the geometric moving average is
that you can select the degree of smoothing.  If you select a
large value for alpha (that is, 0.5 <= alpha < 1.0), the
smoothed series is less sensitive to variations between
observations.  Conversely, if you select a small value for
alpha (that is, 0.0 < alpha < 0.5), the smoothed series is
more responsive to variations in the historical data series.
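
The effect of alpha can be checked numerically.  The
following is a minimal sketch, assuming that alpha weights the
previous smoothed value and that beta = 1 - alpha weights the
current observation, which is the reading that agrees with the
description of alpha above.  The function names gma and
roughness are hypothetical, and roughness (the total absolute
change between consecutive values) is only one possible
measure of sensitivity.

```python
# Sketch only: gma and roughness are hypothetical helper names.
def gma(series, alpha):
    """Geometric moving average; alpha weights the previous smoothed value."""
    beta = 1.0 - alpha
    smoothed = [series[0]]             # seed with the first observation
    for x in series[1:]:
        smoothed.append(alpha * smoothed[-1] + beta * x)
    return smoothed[1:]                # the first observation is sacrificed

def roughness(values):
    """Total absolute change between consecutive values."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))

jobs = [2900, 3070, 2950, 3080, 3200, 3150]   # monthly test jobs

for alpha in (0.2, 0.5, 0.8):
    print(alpha, round(roughness(gma(jobs, alpha)), 1))
```

For this data series, the roughness value falls as alpha
rises, so a larger alpha yields a smoother series.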

For example, you can apply a geometric moving average to the
historical data observations shown in the previous table.
For this example, alpha equals 0.5.  Hence, beta is equal to
0.5.  The observation for January would be lost and the
observation for February would be computed as

    FEB98 = 0.5 * 2900 + 0.5 * 3070 = 2985

The value for March would be computed based on the smoothed
observation for February and the actual value for March.
This procedure would be continued for the remainder of the
observations, resulting in the following table:


                       Observation    GMA Test
             Month       Number         Jobs
            =======    ===========     ======
             JAN98          1             .
             FEB98          2           2985
             MAR98          3           2968
             APR98          4           3024
             MAY98          5           3112
             JUN98          6           3131
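
The smoothed values in this table can be reproduced with a few
lines of code.  The following is a minimal sketch, assuming
that each smoothed value is carried forward into the next
calculation as described above; the variable names are
illustrative only.

```python
from itertools import accumulate

months = ["JAN98", "FEB98", "MAR98", "APR98", "MAY98", "JUN98"]
jobs = [2900, 3070, 2950, 3080, 3200, 3150]
alpha = beta = 0.5

# accumulate() passes the previous smoothed value to the next step.
# The first element is the raw January observation, which only seeds
# the series and is then dropped.
smoothed = list(accumulate(jobs, lambda prev, x: alpha * prev + beta * x))

for month, value in zip(months[1:], smoothed[1:]):
    print(month, round(value))
```

The values are rounded only for display.  The unrounded
smoothed value is the one carried into the next month's
calculation.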


Using the smoothed observations, a second linear model was
developed.  The parameters for this model are shown below:

    n  =      5, the number of historical observations

    b  =   2869, the y intercept

    m  =   43.6, the slope of the line

     2
    r  =   0.88, the coefficient of determination

    F  =  23.36, the F value

    p  =   0.02, the probability of obtaining an F value this
                 large if the true slope were zero

    s  =   29.6, the standard error
     e

The following table shows the predicted and residual values
developed using this model:


             Obs   GMA Test    Test      Est.     Residual
   Month      #      Jobs      Jobs      Jobs     (error)
  =======    ===    ======    ======    ======    ========
   JAN98      1        .       2900      2914         .
   FEB98      2      2985      3070      2957         28
   MAR98      3      2968      2950      3000        -32
   APR98      4      3024      3080      3043        -19
   MAY98      5      3112      3200      3087         25
   JUN98      6      3131      3150      3131          0


The forecast developed using the smoothed data series is much
better behaved than the first forecast that was based on the
untreated historical data series:  the coefficient of
determination rises from 0.68 to 0.88, and the standard error
falls from 72.6 to 29.6.  Data smoothing is a powerful
technique that you can use to minimize the effects of
apparently random variations in historical data.

    [Scatter plot:  JOB COUNTS (vertical axis, 2900 to 3200)
     versus OBSERVATION NUMBER (horizontal axis, 1 to 6)]


 Figure 7-5.  Monthly Job Counts