

7.4.2.3 Data Smoothing Using a LOG10 Function


Fitting trend lines to service-oriented measurements is
another problem that you may encounter.  While trend-based
estimates of future service levels may be inferior to the
results that you can obtain using analytic queuing models,
you can often fit regression models to service-oriented
measurements to obtain a sense of direction for the values.
Consider the following table of total pages per second
collected for a 3090-200J processor:


                       Observation      Page
             Month       Number         /Sec
            =======    ===========     ======
             JAN98          1             75
             FEB98          2             77
             MAR98          3             80
             APR98          4             86
             MAY98          5             97
             JUN98          6            113


Figure 7-6 shows a scatter plot of the data.  A linear
regression model that was developed for this historical data
series has the following parameters:

    n  =      6, the number of historical observations

    b  =   83.1, the y intercept

    m  =   1.41, the slope of the line

     2
    r  =   0.17, the coefficient of determination

    F  =   29.3, the F value

    p  =   0.01, the probability that you should reject the
                 hypothesis

    s  =   14.8, the standard error
     e

The predicted and residual values for the historical data
series are shown in the following table:

             Observation      Page       Est.      Residual
   Month       Number         /Sec       Pages     (error)
  =======    ===========     ======     ======     ========
   JAN98          1             75       84.5        -9.5
   FEB98          2             77       85.9        -8.9
   MAR98          3             80       87.3        -7.3
   APR98          4             86       88.7        -2.7
   MAY98          5             97       90.1         6.9
   JUN98          6            113       91.5        21.5
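
The estimate column above can be reproduced, to within
rounding, from the printed intercept and slope.  The following
Python sketch is illustrative only and is not part of the
product.  It assumes the rounded parameter values shown in the
text, so the last digit may differ from the table, and it
assumes the residual is taken as the observed value minus the
estimate.  The names estimate and rows are hypothetical.

```python
# Paging observations (pages per second) for JAN98 through JUN98.
pages = [75, 77, 80, 86, 97, 113]

# Model parameters as printed in the text (rounded).
B = 83.1   # y intercept
M = 1.41   # slope of the line


def estimate(j, intercept=B, slope=M):
    """Return the linear estimate for observation number j (1-based)."""
    return intercept + slope * j


# Build one row per observation: (number, observed, estimate, residual).
rows = []
for j, observed in enumerate(pages, start=1):
    est = estimate(j)
    residual = observed - est   # assumption: observed minus estimated
    rows.append((j, observed, est, residual))

for j, observed, est, residual in rows:
    print(f"{j:3d} {observed:6d} {est:8.1f} {residual:8.1f}")
```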

The model poorly represents the historical data collected for
paging, as the small value of r-squared indicates.  To
understand why this model fails, you must consider the
underlying mechanism that produced the observations.  Paging
is an example of a service-oriented measurement that results
from an underlying queuing relationship.  That is, rather
than expecting paging to increase linearly in response to a
linear increase in load, you can expect it to increase
exponentially.  A standard technique (KEL74) for modeling
exponential data is to use a logarithmic function to
transform the data into a linear form.  Univariate Model
Forecasting employs a log base 10 function, as shown in the
following equation:

    x(j) = LOG10(x(j)+1.0), for all j                (Eqn 11)

Note that 1.0 is added to each historical observation before
the transformation because the LOG10 function is undefined at
zero.  To interpret the forecasted values, you must reverse
the transformation performed in Equation 11.  The following
equation shows this reverse transformation:

              x(j)
    x(j) = 10     - 1.0, for all j                   (Eqn 12)

The 1.0 that is subtracted in Equation 12 corresponds to the
1.0 that was added in Equation 11.  A LOG10 function was
applied to the historical paging data presented in the
previous table to "linearize" the observations. The result of
this transformation is shown in the following table:


                       Observation      LOG10
             Month       Number         Pages
            =======    ===========     ======
             JAN98          1           1.88
             FEB98          2           1.89
             MAR98          3           1.90
             APR98          4           1.93
             MAY98          5           1.99
             JUN98          6           2.05
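
Equations 11 and 12 can be sketched in Python as a pair of
functions that undo each other.  This is a minimal
illustration, not the product's implementation.  The function
names are hypothetical, and the printed table values may
differ from the computed ones in the last digit because of
rounding.

```python
import math


def log10_transform(x):
    """Equation 11: add 1.0, then take the base 10 logarithm."""
    return math.log10(x + 1.0)


def inverse_log10_transform(y):
    """Equation 12: raise 10 to the power y, then subtract 1.0."""
    return 10.0 ** y - 1.0


# Paging observations (pages per second) for JAN98 through JUN98.
pages = [75, 77, 80, 86, 97, 113]

transformed = [log10_transform(x) for x in pages]
restored = [inverse_log10_transform(y) for y in transformed]

for x, y, r in zip(pages, transformed, restored):
    print(f"{x:5d} {y:8.4f} {r:8.2f}")
```

Because 1.0 is added before the logarithm is taken, an
observation of zero transforms to zero instead of raising an
error.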

Using the smoothed (log-transformed) observations, a second
linear model was developed.  The parameters of this model are
shown below:

    n  =      6, the number of historical observations

    b  =   1.84, the y intercept

    m  =   0.03, the slope of the line

     2
    r  =   0.96, the coefficient of determination

    F  =  31.82, the F value

    p  =   .005, the probability that you should reject the
                 hypothesis

    s  =  -0.06, the standard error.  (Note that you must
     e           transform this value using the inverse of the
                 LOG10 function to obtain the actual standard
                 error, 10**-.06 = 0.87.)

The future observations predicted by the model must be
transformed back from the logarithmic scale in the same
manner as the standard error value (described above).  The
predicted and residual values developed from the model are
shown in the following table:


                                     Est
            Obs    Page     LOG10   LOG10   Trans   Residual
   Month     #     /Sec     Page    Pages    Est    (error)
  =======   ===   ======   ======   =====   =====   ========
   JAN98     1      75       1.88    1.87    74.1      0.9
   FEB98     2      77       1.89    1.90    79.4     -2.4
   MAR98     3      80       1.90    1.93    86.1     -6.1
   APR98     4      86       1.93    1.96    91.2     -5.2
   MAY98     5      97       1.99    1.99    97.7     -0.7
   JUN98     6     113       2.05    2.02   104.7      8.3
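
Forecasting a future observation follows the same two steps:
evaluate the line on the LOG10 scale, then reverse the
transformation.  The Python sketch below assumes the rounded
intercept and slope printed above and applies Equation 12 as
written, so its results are approximate.  The function name
and the choice of observation numbers 7 through 9 are
hypothetical.

```python
def forecast_pages(j, intercept=1.84, slope=0.03):
    """Forecast pages per second for observation number j.

    Step 1: linear estimate on the LOG10 scale.
    Step 2: Equation 12 to return to the original scale.
    """
    log_estimate = intercept + slope * j
    return 10.0 ** log_estimate - 1.0


# Hypothetical future observation numbers (the months after JUN98).
for j in (7, 8, 9):
    print(f"{j:3d} {forecast_pages(j):8.1f}")
```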


Forecasts developed using the transformed paging observations
are superior to the forecast based on the original values.
You should consider logarithmic transformation any time you
are modeling service-related values like device utilizations,
turnaround times, or response times.
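
One way to judge whether the transformation helps is to
compare r-squared for a straight line fitted to the original
values with r-squared for a line fitted to the transformed
values.  The Python sketch below does this for a made-up
series that grows exponentially.  The series, the function
name, and the variable names are all hypothetical and are not
taken from the paging example.

```python
import math


def r_squared(xs, ys):
    """Coefficient of determination for a least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


# Hypothetical series: twelve observations growing 50% per period.
obs = list(range(1, 13))
series = [20.0 * 1.5 ** j for j in obs]

# Apply Equation 11 to each value.
logged = [math.log10(v + 1.0) for v in series]

r2_raw = r_squared(obs, series)
r2_log = r_squared(obs, logged)

print(f"r-squared, original values:    {r2_raw:.3f}")
print(f"r-squared, transformed values: {r2_log:.3f}")
```

For a series of this shape, the transformed values lie almost
exactly on a straight line, so the second r-squared is the
larger of the two.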

[Scatter plot titled PAGING DATA: PAGES/SEC (vertical axis, 75 to 110) plotted against OBSERVATION NUMBER (horizontal axis, 1 to 6).]


 Figure 7-6.  Monthly Page/Sec Values