

3.3.4.2 Scaling
Statistical clustering algorithms introduce a scaling
problem. In the clustering example discussed in the previous
section, the X and Y variables were chosen to span a similar
numerical range. Actual workload data, however, spans much
wider and more disparate ranges of values.
Consider the problem of developing job classes. The job
classes in this example are based on CPU minutes and print
lines. The maximum value observed for CPU minutes would be
far less than 1,000 minutes, while the likelihood of printing
more than 1,000 lines is very high. Because the geometric
distance equation squares the difference in each feature,
print lines would dominate the distance, and the number of
CPU minutes used by a job would appear to be insignificant.
Therefore, the variables must be scaled prior to the
clustering process so that relative differences between the
features have a similar influence on the geometric distance
calculation.
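
The effect can be illustrated with a short Python sketch. The
two jobs and their values are hypothetical and chosen only to
match the ranges described above. The helper name euclidean is
likewise an assumption, not part of the product.

```python
import math

def euclidean(a, b):
    """Geometric (Euclidean) distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# Hypothetical jobs, each described as (CPU minutes, print lines).
job_a = (2.0, 5000.0)
job_b = (60.0, 6000.0)

distance = euclidean(job_a, job_b)

# Squared contribution of each feature to the distance.
cpu_sq = (job_a[0] - job_b[0]) ** 2      # 58^2   = 3,364
lines_sq = (job_a[1] - job_b[1]) ** 2    # 1000^2 = 1,000,000
cpu_share = cpu_sq / (cpu_sq + lines_sq)

print(f"distance = {distance:.1f}")
print(f"CPU share of squared distance = {cpu_share:.2%}")
```

CPU minutes differ by a factor of 30 between these two jobs,
yet they account for well under 1 percent of the squared
distance, because the print-line difference is numerically so
much larger.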
Neugents technology for clustering operations uses a simple
unit scaling technique described next.
UNIT SCALING
Unit scaling is a simple technique for solving the problems
introduced by the differing ranges of values associated with
each of the resource vector's features. Most simply stated,
unit scaling maps the range of values associated with each
feature into the range 0 to 1 (AGR76). Equation 3 details
this mapping.
\[
s_i = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} \qquad \text{(Eqn 3)}
\]

where: $s_i$ is the scaled value of $x_i$
       $x_i$ is the i-th observation of the x feature
       $x_{\min}$ is the minimum of all $x_i$
       $x_{\max}$ is the maximum of all $x_i$
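
A minimal Python sketch of Equation 3 follows. The function
name unit_scale and the sample values are hypothetical.
Equation 3 is undefined when every observation of a feature is
identical; returning zeros in that case is an assumption made
here, not something the text specifies.

```python
def unit_scale(values):
    """Map a list of observations into the range 0 to 1 (Equation 3)."""
    x_min = min(values)
    x_max = max(values)
    span = x_max - x_min
    if span == 0:
        # Assumption: a constant feature carries no information,
        # so every observation is mapped to 0.
        return [0.0 for _ in values]
    return [(x - x_min) / span for x in values]

# Hypothetical observations for three jobs.
cpu_minutes = [2.0, 60.0, 31.0]
print_lines = [5000.0, 6000.0, 5250.0]

scaled_cpu = unit_scale(cpu_minutes)
scaled_lines = unit_scale(print_lines)

print(scaled_cpu)
print(scaled_lines)
```

Each feature is scaled independently, so after scaling both
CPU minutes and print lines lie between 0 and 1 and contribute
on a comparable footing to the geometric distance.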
A modified version of unit scaling is currently used within
Neugents technology, whereby data values are scaled
internally to a range of -1 to +1.
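
The exact form of the modified mapping is not given here. One
natural way to reach the range -1 to +1 is to stretch and
shift the result of Equation 3, as in the sketch below. This
is an illustrative assumption, not a description of the
Neugents implementation, and the function name
symmetric_scale is hypothetical.

```python
def symmetric_scale(values):
    """Map observations into the range -1 to +1 by stretching unit scaling."""
    x_min = min(values)
    x_max = max(values)
    span = x_max - x_min
    if span == 0:
        # Assumption: a constant feature is mapped to the midpoint, 0.
        return [0.0 for _ in values]
    # 2 * s_i - 1, where s_i is the unit-scaled value from Equation 3.
    return [2.0 * (x - x_min) / span - 1.0 for x in values]

# Hypothetical CPU-minute observations for three jobs.
cpu_minutes = [2.0, 60.0, 31.0]
scaled_cpu = symmetric_scale(cpu_minutes)
print(scaled_cpu)
```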
The potentially poor representation of outliers is one of the
primary concerns in the selection of features for the
clustering process. Criteria for selecting features for
clustering are discussed in Section 3.3.2.4.
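
One way to see why outliers are a concern under unit scaling
is sketched below. Because the mapping is anchored on the
minimum and maximum observations, a single extreme value
compresses all other observations into a narrow band. The
data and the function name unit_scale are hypothetical.

```python
def unit_scale(values):
    """Map a list of observations into the range 0 to 1 (Equation 3)."""
    x_min = min(values)
    x_max = max(values)
    span = x_max - x_min
    if span == 0:
        # Assumption: a constant feature is mapped to 0.
        return [0.0 for _ in values]
    return [(x - x_min) / span for x in values]

# Hypothetical print-line counts: four typical jobs and one outlier.
print_lines = [1000.0, 2000.0, 3000.0, 4000.0, 1000000.0]
scaled = unit_scale(print_lines)

# Width of the band occupied by the four typical jobs after scaling.
typical = scaled[:-1]
spread = max(typical) - min(typical)

print(scaled)
print(f"spread of typical jobs = {spread:.4f}")
```

In this example the four typical jobs occupy less than 1
percent of the scaled range, so differences among them
contribute very little to the geometric distance.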
Copyright © 2014 CA.
All rights reserved.
 