

3.3.3.2 Cluster Performance Summary Report
A sample Cluster Performance Summary report is shown below in
Figure 3-34.
Data Clustering Analysis 1
Cluster Performance Summary
For: Monday, June 23, yyyy
Clustering Execution Options
Cluster Input Data: SAMPLE
Clustering Method: RADIUS
Maximum Cluster Radius: 3.0
Training Sample Size: 2000
Sample Trim Limit: 97.5 Pct.
Sampling Percentage : 2.5 Pct.
Include Outliers in Clusters: YES
Include Sparse Clusters: YES
Sparse Cluster Limit: 0.5 Pct.
Report Cluster Population: YES
Report Using Account Codes: YES
Report Sparse Clusters : YES
Report Outliers Separately: YES
Create Cluster Index Graph: YES
Create Population Graph: YES
Input CA MICS Files: BATJOB
Input Dataset Name: 'COGDA01.AUDIT.BATJOB'
_____________________________________________________________________________________
Cluster Feature Contents
Feature                                         --------Outlier Limits-------
Name     ------------- Description --------------        Lower        Upper  Cases
________ ________________________________________ ____________ ____________ ______
JOBTCBTM Job TCB CPU Time                           0:00:00.00   0:01:05.70     50
JOBEDASD DASD EXCPS                                       0.00    38,933.00     50
                                                                            ======
                                                                               100
Note: A given case may be classified as an outlier more than once if multiple
analysis elements contain abnormal values. Therefore, the totals in this report
section will not necessarily agree with those of the Population section
_____________________________________________________________________________________
Data Clustering Analysis 2
Cluster Performance Summary
For: Monday, June 23, yyyy
Cluster Population Summary
Cluster Radius Normal Outlying   Total       % of Clustering
                 Obs.     Obs.    Obs. Population      Index
_______ ______ ______ ________ _______ __________ __________
   1      0.00      0       14      14       0.70       0.00
   2*     0.82      0        2       2       0.10       0.82
   3*     0.82      0        6       6       0.30       0.51
   4*     1.17      0        2       2       0.10       1.17
   5*     1.20      0        2       2       0.10       1.20
   6*     1.39      4        0       4       0.20       1.01
   7*     1.43      0        3       3       0.15       1.02
   8*     1.50      0        2       2       0.10       1.50
   9*     1.55      0        5       5       0.25       1.22
  10      1.61  1,541        0   1,541      77.05       0.34
  11      1.71     15        0      15       0.75       0.77
  12*     1.86      0        9       9       0.45       1.26
  13*     2.01      0        2       2       0.10       2.01
  14      2.04      4        8      12       0.60       1.19
  15      2.07    129        0     129       6.45       0.95
  16      2.22     12        0      12       0.60       0.80
  17      2.25      0       11      11       0.55       1.69
  18*     2.29      0        2       2       0.10       2.29
  19*     2.45      0        2       2       0.10       2.45
  20*     2.54      0        3       3       0.15       1.99
  21*     2.70      0        2       2       0.10       2.70
  22*     2.71      2        6       8       0.40       1.68
  23      2.72     39        0      39       1.95       1.70
  24      2.83    173        0     173       8.65       0.98
               ======   ====== =======     ======
                1,919       81   2,000     100.00
'*' Indicates that the cluster is sparsely populated.
Sparse cluster are defined as having a population that is less than 0.5 %
of the total population. In this study, the sparse cluster population limit is
10 cases.
_____________________________________________________________________________________
Figure 3-34. Cluster Performance Summary report
The Cluster Performance Summary report is presented in three
sections as described next:
The first section lists the execution options chosen by
the analyst for the study being performed and documents
which reports and graphs will be produced.
For an explanation of the execution options and their
respective usage, see section 15.x of the product guide.
The second section of the report presents the feature
contents of the clusters, including the limits used to
determine outliers and the number of times (cases)
observations were declared as outliers because of a given
feature value.
FEATURE      The name of the data element chosen for
NAME:        clustering. This is normally a CA MICS data
             element but can be a computed (user-defined)
             element if required.
DESCRIPTION: The SAS label of the selected data element,
             taken either from the CA MICS GENLIB definition
             or from a label supplied by the user.
OUTLIER      Statistical bounds to determine if a given data
LIMITS:      value is considered "normal" or represents an
             abnormal condition. These values are computed
             based on the Sample Trim Limit value specified
             on the Clustering Execution Options screen.
LOWER:       Cases (observations) whose value for this
             feature is less than this value are defined as
             outliers. Because most performance
             measurement data cannot be negative, the
             current implementation uses a value of zero for
             this boundary.
UPPER:       Cases (observations) whose value for this
             feature is greater than this value are defined
             as outliers.
CASES:       The number of cases (observations) that were
             declared as outliers because of this feature
             value. Note that a given case may be flagged
             more than once, and therefore the number of
             cases presented in this section may exceed the
             total number of Outlying observations in the
             next report section.
             A footnote to this effect is printed at the
             bottom of this report section.
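The outlier limits and case counts described above can be
sketched as follows. This is a minimal illustration in
Python, not the product's implementation: the nearest-rank
percentile rule, the function names upper_outlier_limit and
outlier_cases_per_feature, and the sample observations are
all assumptions made for the example. The lower limit is
fixed at zero, as stated above, and the two upper limits are
taken from Figure 3-34 (0:01:05.70 expressed as 65.70
seconds).

```python
import math


def upper_outlier_limit(values, trim_limit_pct=97.5):
    """Nearest-rank percentile (an assumed rule, not the product's)."""
    ordered = sorted(values)
    rank = math.ceil(trim_limit_pct * len(ordered) / 100.0)
    return ordered[max(rank, 1) - 1]


def outlier_cases_per_feature(observations, limits):
    """Count, per feature, the observations outside (lower, upper)."""
    counts = {name: 0 for name in limits}
    for obs in observations:
        for name, (lower, upper) in limits.items():
            if obs[name] < lower or obs[name] > upper:
                counts[name] += 1
    return counts


# Limits from Figure 3-34; CPU time is expressed in seconds.
limits = {"JOBTCBTM": (0.0, 65.70), "JOBEDASD": (0.0, 38933.0)}

# Hypothetical observations: the second one exceeds both upper limits.
observations = [
    {"JOBTCBTM": 10.0, "JOBEDASD": 500.0},
    {"JOBTCBTM": 70.0, "JOBEDASD": 40000.0},
    {"JOBTCBTM": 1.0, "JOBEDASD": 39000.0},
]

cases = outlier_cases_per_feature(observations, limits)

# Number of distinct observations with at least one outlying value.
outlying_obs = sum(
    any(o[n] < lo or o[n] > hi for n, (lo, hi) in limits.items())
    for o in observations
)
print(cases, sum(cases.values()), outlying_obs)
```

In this example, three feature-level cases come from only two
outlying observations, which is why the Cases total can
exceed the Outlying Obs. total in the next report section.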
The third report section presents a summary of the size,
population and general performance of each cluster.
CLUSTER      The cluster number. Cluster numbers are
NUMBER:      assigned sequentially to the patterns that are
             identified by the algorithm. Note that the
             order in which the clusters are identified is
             not an indicator of merit.
RADIUS:      The geometric distance from the cluster center
             (centroid) to the outer boundary of the cluster,
             expressed in standard deviations. You set the
             upper limit of this value by supplying the
             Maximum Cluster Radius value on the Clustering
             Execution Options screen.
NORMAL       The count of observations within this cluster
OBS:         where all feature values were found to be
             "normal" in a statistical sense. In this
             implementation, normal feature values are those
             that are less than the value determined by the
             Sample Trim Limit. For example, if the Sample
             Trim Limit is 97.5%, then all feature values of
             "normal" observations would reside below the
             97.5th percentile of the sample.
OUTLYING     The count of observations within this cluster
OBS:         where one or more feature values were found to
             be "outliers" in a statistical sense. In this
             implementation, outlying feature values are
             those that are greater than the value
             determined by the Sample Trim Limit. For
             example, if the Sample Trim Limit is 97.5%,
             then at least one feature value of each
             "outlying" observation would reside above the
             97.5th percentile of the sample.
TOTAL OBS:   The sum of NORMAL and OUTLYING observations.
% OF         The percentage of the total sample population
POPULATION:  represented by this cluster.
CLUSTERING   Formerly called the Performance Index, this
INDEX:       metric was renamed in this implementation to
             avoid confusion with similar terms in the z/OS
             Workload Manager. It is the root mean square
             (RMS) of the distances of all cases
             (observations) within the cluster and serves as
             a simple measure of clustering effectiveness.
             The lower this value, the tighter the fit of the
             cluster data. This is not to say that outliers
             are not present; only that they are not
             distorting the cluster shape by their presence.
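As a rough illustration of the Clustering Index, the sketch
below computes the root mean square of the distances from
each case to the cluster centroid. It assumes Euclidean
distance, feature values already standardized to
standard-deviation units, and distances measured from the
centroid; the text above does not spell out these details,
and the function name clustering_index is hypothetical.

```python
import math


def clustering_index(points):
    """RMS Euclidean distance from each point to the centroid.

    Assumes every point is a sequence of standardized feature values
    of the same length (an assumption of this sketch).
    """
    n = len(points)
    dims = len(points[0])
    centroid = [sum(p[d] for p in points) / n for d in range(dims)]
    squared = [
        sum((p[d] - centroid[d]) ** 2 for d in range(dims))
        for p in points
    ]
    return math.sqrt(sum(squared) / n)


# Two hypothetical cases, each 0.82 standard deviations from the centroid.
pair = [(0.0, 0.0), (1.64, 0.0)]
print(round(clustering_index(pair), 2))

# Identical cases collapse onto the centroid.
print(clustering_index([(1.0, 2.0)] * 3))
```

Under these assumptions, two cases that sit equidistant from
their centroid yield an index equal to that distance, and
identical cases yield an index of zero. This is consistent
with the two-observation clusters in Figure 3-34, where the
index equals the radius.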
Note the footnote at the bottom of this report section. It
refers to "sparse" clusters and presents the definition and
limits used for determining which clusters are considered
"sparse". Sparse clusters are generally populated by
outliers and are often dropped from further analysis after
they have been reviewed.
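The population columns and the sparse-cluster flag can be
reproduced with a few lines of arithmetic. The sketch below
is illustrative only; the function name population_summary
and its input layout are assumptions, while the counts are
taken from Figure 3-34 (a sample of 2,000 cases and a Sparse
Cluster Limit of 0.5 percent, that is, 10 cases).

```python
def population_summary(counts, sample_size, sparse_limit_pct=0.5):
    """counts maps cluster number -> (normal_obs, outlying_obs)."""
    limit_cases = sample_size * sparse_limit_pct / 100.0
    summary = {}
    for cluster, (normal, outlying) in sorted(counts.items()):
        total = normal + outlying
        summary[cluster] = {
            "total": total,
            "pct_of_population": round(100.0 * total / sample_size, 2),
            # Sparse means strictly fewer cases than the limit.
            "sparse": total < limit_cases,
        }
    return summary


# Counts for four of the clusters shown in Figure 3-34.
counts = {10: (1541, 0), 12: (0, 9), 14: (4, 8), 22: (2, 6)}
summary = population_summary(counts, sample_size=2000)

for cluster, row in summary.items():
    flag = "*" if row["sparse"] else " "
    print(f"{cluster:>3}{flag} {row['total']:>6,} "
          f"{row['pct_of_population']:>6.2f}")
```

With these inputs, clusters 12 and 22 are flagged as sparse
(9 and 8 cases, below the 10-case limit), while cluster 14,
with 12 cases, is not.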
Copyright © 2014 CA.
All rights reserved.
 