

3.3.3.2 Cluster Performance Summary Report


A sample Cluster Performance Summary report is shown below in
Figure 3-34.

                              Data Clustering Analysis                    1
                            Cluster Performance Summary
                             For: Monday, June 23, yyyy

                           Clustering Execution Options

                    Cluster Input Data: SAMPLE
                     Clustering Method: RADIUS
                Maximum Cluster Radius:  3.0

                  Training Sample Size:   2000
                     Sample Trim Limit: 97.5 Pct.
                  Sampling Percentage :  2.5 Pct.

          Include Outliers in Clusters: YES
               Include Sparse Clusters: YES
                  Sparse Cluster Limit:  0.5 Pct.

             Report Cluster Population: YES
            Report Using Account Codes: YES
               Report Sparse Clusters : YES
            Report Outliers Separately: YES
            Create Cluster Index Graph: YES
               Create Population Graph: YES

                   Input CA MICS Files: BATJOB
                    Input Dataset Name: 'COGDA01.AUDIT.BATJOB'
_____________________________________________________________________________________

                             Cluster Feature Contents
 Feature                                                --------Outlier Limits-------
   Name    ------------- Description --------------     Lower        Upper      Cases
 ________  ________________________________________  ____________ ____________ ______
JOBTCBTM   Job TCB CPU Time                            0:00:00.00   0:01:05.70     50
JOBEDASD   DASD EXCPS                                        0.00    38,933.00     50
                                                                               ======
                                                                                  100
 Note: A given case may be classified as an outlier more than once if multiple
 analysis elements contain abnormal values. Therefore, the totals in this report
 section will not necessarily agree with those of the Population section
_____________________________________________________________________________________

                              Data Clustering Analysis                    2
                            Cluster Performance Summary
                             For: Monday, June 23, yyyy

                            Cluster Population Summary

Cluster     Radius     Normal     Outlying        Total         % of       Clustering
                        Obs.        Obs.           Obs.      Population       Index
_______     ______     ______     ________      _______      __________    __________
      1       0.00          0         14             14            0.70         0.00
      2*      0.82          0          2              2            0.10         0.82
      3*      0.82          0          6              6            0.30         0.51
      4*      1.17          0          2              2            0.10         1.17
      5*      1.20          0          2              2            0.10         1.20
      6*      1.39          4          0              4            0.20         1.01
      7*      1.43          0          3              3            0.15         1.02
      8*      1.50          0          2              2            0.10         1.50
      9*      1.55          0          5              5            0.25         1.22
     10       1.61      1,541          0          1,541           77.05         0.34
     11       1.71         15          0             15            0.75         0.77
     12*      1.86          0          9              9            0.45         1.26
     13*      2.01          0          2              2            0.10         2.01
     14       2.04          4          8             12            0.60         1.19
     15       2.07        129          0            129            6.45         0.95
     16       2.22         12          0             12            0.60         0.80
     17       2.25          0         11             11            0.55         1.69
     18*      2.29          0          2              2            0.10         2.29
     19*      2.45          0          2              2            0.10         2.45
     20*      2.54          0          3              3            0.15         1.99
     21*      2.70          0          2              2            0.10         2.70
     22*      2.71          2          6              8            0.40         1.68
     23       2.72         39          0             39            1.95         1.70
     24       2.83        173          0            173            8.65         0.98
                       ======     ======        =======          ======
                        1,919         81          2,000          100.00

'*' Indicates that the cluster is sparsely populated.

 Sparse cluster are defined as having a population that is less than 0.5 %
 of the total population. In this study, the sparse cluster population limit is
 10 cases.
_____________________________________________________________________________________

 Figure 3-34.  Cluster Performance Summary report

The Cluster Performance Summary report is presented in three
sections as described next:

The first section lists the execution options chosen by the
analyst for the study being performed and documents which
reports and graphs will be produced.

For an explanation of the execution options and their
respective usage, see section 15.x of the product guide.

The second section of the report presents the feature
contents of the clusters, including the limits used to
determine outliers and the number of times (cases)
observations were declared as outliers because of a given
feature value.

FEATURE       The name of the data element chosen for
NAME:         clustering.  This is normally a CA MICS data
              element but can be a computed (user defined)
              element if required.

DESCRIPTION:  The SAS label of the selected data element,
              taken either from the CA MICS GENLIB definition
              or as supplied by the user.

OUTLIER       Statistical bounds to determine if a given data
LIMITS:       value is considered "normal" or represents an
              abnormal condition.  These values are computed
              based on the Sample Trim Limit value specified
              on the Clustering Execution Options screen.

LOWER:        Cases (observations) whose value for this
              feature is less than this value are defined as
              outliers.  Because most performance
              measurement data is non-negative, the current
              implementation uses a value of zero for this
              boundary.

UPPER:        Cases (observations) whose value for this
              feature is greater than this value are defined
              as outliers.

CASES:        The number of cases (observations) that were
              declared as outliers because of this feature
              value.  Note that a given case may be flagged
              multiple times, so the number of cases
              presented in this section may exceed the total
              number of Outlying observations in the next
              report section.

              A footnote to this effect is printed at the
              bottom of this report section.
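
The relationship between the outlier limits and the case
counts can be illustrated with a short Python sketch.  It is
not the product's implementation: the nearest-rank percentile
method and the function names upper_limit and
count_outlier_cases are assumptions made for this example,
and the data is hypothetical.  In Figure 3-34, a Sample Trim
Limit of 97.5% on a training sample of 2,000 leaves 2.5%, or
50 cases, above each upper limit, which is consistent with
the Cases column.

```python
import math

def upper_limit(values, trim_pct=97.5):
    """Nearest-rank percentile of the sample (assumed method)."""
    ordered = sorted(values)
    rank = math.ceil(trim_pct * len(ordered) / 100.0)
    return ordered[max(rank, 1) - 1]

def count_outlier_cases(sample, trim_pct=97.5, lower=0.0):
    """sample maps a feature name to its values, one per observation.

    Returns (limits, cases per feature, distinct outlying observations).
    """
    limits = {f: (lower, upper_limit(v, trim_pct)) for f, v in sample.items()}

    def is_outlier(feature, value):
        low, high = limits[feature]
        return value < low or value > high

    # One "case" is counted each time a feature value falls outside its limits.
    cases = {f: sum(is_outlier(f, x) for x in v) for f, v in sample.items()}

    # An observation is outlying if at least one of its features is an outlier.
    n_obs = len(next(iter(sample.values())))
    outlying_obs = sum(
        any(is_outlier(f, sample[f][i]) for f in sample) for i in range(n_obs)
    )
    return limits, cases, outlying_obs

# Hypothetical data: 40 observations; the last one is extreme in both features.
sample = {
    "JOBTCBTM": [float(i) for i in range(1, 41)],
    "JOBEDASD": [10.0 * i for i in range(1, 41)],
}
limits, cases, outlying_obs = count_outlier_cases(sample)
print(limits)
print(cases, sum(cases.values()), outlying_obs)
```

In this example one observation is counted as a case under
both features, so the case total (2) exceeds the number of
distinct outlying observations (1).  This is the situation
the report footnote describes.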


The third report section presents a summary of the size,
population and general performance of each cluster.

CLUSTER       The cluster number.  Cluster numbers are
NUMBER:       assigned sequentially to the patterns that are
              identified by the algorithm.  Note that the
              order in which the clusters are identified is
              not an indicator of merit.

RADIUS:       The geometric distance from the cluster center
              (centroid) to the outer boundary of the cluster,
              expressed in terms of Standard Deviations.  You
              define the outer limit of this value by
              supplying the Maximum Cluster Radius value on
              the Clustering Execution Options screen.

NORMAL        The count of observations within this cluster
OBS:          where all feature values were found to be
              "normal" in a statistical sense.  In this
              implementation, normal feature values are those
              that are less than the value determined by the
              Sample Trim Limit.  For example, if the Sample
              Trim Limit is 97.5%, then all feature values of
              "normal" observations would reside below the
              97.5th percentile of the sample.

OUTLYING      The count of observations within this cluster
OBS:          where one or more feature values were found to
              be "outliers" in a statistical sense.  In this
              implementation, outlying feature values are
              those that are greater than the value
              determined by the Sample Trim Limit.  For
              example, if the Sample Trim Limit is 97.5%,
              then at least one feature value of each
              "outlying" observation would reside above the
              97.5th percentile of the sample.

TOTAL OBS:    The sum of NORMAL and OUTLYING observations.

% OF          The percentage of the total sample population
POPULATION:   represented by this cluster.

CLUSTERING    Formerly called the Performance Index, this
INDEX:        metric was renamed in this implementation to
              avoid confusion with similar terms in the z/OS
              Workload Manager.  It is the Root Mean Square
              of the distances of all cases (observations)
              within the cluster and serves as a simple
              measure of clustering effectiveness.  The lower
              this value, the tighter the fit of the cluster
              data.  This is not to say that outliers are not
              present; only that they are not distorting the
              cluster shape by their presence.
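
The RADIUS and CLUSTERING INDEX definitions above can be
illustrated with a short Python sketch.  It assumes that the
feature values are already standardized (so distances are in
standard deviations), that distance is Euclidean, and that it
is measured from the cluster centroid.  The text above states
only that the index is the Root Mean Square of the distances
of the cases in the cluster, so these details and the
function names are assumptions, not the product's code.

```python
import math

def centroid(points):
    """Mean of each coordinate across the cluster's cases."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def distances(points):
    """Euclidean distance of each case from the centroid (assumed metric)."""
    c = centroid(points)
    return [math.dist(p, c) for p in points]

def cluster_radius(points):
    """Distance from the centroid to the farthest case."""
    return max(distances(points))

def clustering_index(points):
    """Root Mean Square of the case-to-centroid distances."""
    d = distances(points)
    return math.sqrt(sum(x * x for x in d) / len(d))

# Hypothetical standardized data.
two_cases = [(0.0, 0.0), (3.0, 4.0)]   # centroid (1.5, 2.0)
spread = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (4.0, 0.0)]  # centroid (1.0, 0.0)

print(cluster_radius(two_cases), clustering_index(two_cases))
print(cluster_radius(spread), clustering_index(spread))
```

Under these assumptions a two-case cluster has the same
radius and index, because both cases lie at the same distance
from their centroid.  This is consistent with the
two-observation clusters in Figure 3-34; cluster 2, for
example, shows 0.82 in both columns.  In the second data set
a single distant case sets the radius at 3.0 while the index
stays near 1.73.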

Note the footnote at the bottom of this report section.  It
refers to "sparse" clusters and presents the definition and
limits used for determining which clusters are considered
"sparse".  Sparse clusters are generally populated by
outliers and are often dropped from further analysis after
they have been reviewed.
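
The arithmetic behind the sparse-cluster limit and the % of
Population column can be checked with a short Python sketch.
The counts are copied from Figure 3-34; the function name
summarize and the dictionary layout are hypothetical, and the
strict "less than" comparison follows the wording of the
report footnote.

```python
def summarize(cluster_counts, sparse_pct=0.5):
    """cluster_counts maps cluster number -> (normal obs, outlying obs)."""
    total = sum(n + o for n, o in cluster_counts.values())
    limit = total * sparse_pct / 100.0   # sparse cluster population limit
    rows = {}
    for cluster, (normal, outlying) in cluster_counts.items():
        size = normal + outlying
        rows[cluster] = {
            "total": size,
            "pct": round(100.0 * size / total, 2),
            "sparse": size < limit,
        }
    return total, limit, rows

# (Normal Obs., Outlying Obs.) per cluster, from Figure 3-34.
counts = {
    1: (0, 14), 2: (0, 2), 3: (0, 6), 4: (0, 2), 5: (0, 2), 6: (4, 0),
    7: (0, 3), 8: (0, 2), 9: (0, 5), 10: (1541, 0), 11: (15, 0),
    12: (0, 9), 13: (0, 2), 14: (4, 8), 15: (129, 0), 16: (12, 0),
    17: (0, 11), 18: (0, 2), 19: (0, 2), 20: (0, 3), 21: (0, 2),
    22: (2, 6), 23: (39, 0), 24: (173, 0),
}

total, limit, rows = summarize(counts)
sparse = sorted(c for c, r in rows.items() if r["sparse"])
print(total, limit)
print(sparse)
```

With a total of 2,000 observations and a Sparse Cluster Limit
of 0.5%, the limit is 10 cases, and the clusters flagged by
the sketch are the ones marked with '*' in the figure.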