

3.3.3.3 Cluster Population Summary Report


A sample Cluster Population Summary report is shown below in
Figure 3-35.

                                      Data Clustering Analysis                                     1
                                     Cluster Population Summary

                                     For: Tuesday, June 24, yyyy

______________________________________ Cluster:  1 ______________________________________

Radius: 0.00      Clustering Index: 0.00      Maximum Cluster Radius: 3.0 STDs

Normal Obs.:          0     0.00%   Observations << 97.5 Pct.
Outlying Obs.:       14     0.70%   Observations >> 97.5 Pct.
Total Obs.:          14     0.70%

Cluster Feature Resources

Feature   Description                      Minimum   Average   Standard    Maximum      Total   % of
                                             Value     Value  Deviation      Value      Value  Total

JOBTCBTM  Job TCB CPU Time                      25       656        687      2,400      9,186  34.96
JOBEDASD  DASD EXCPS                         5,103   236,581    298,042  1,154,523  3,312,131  27.64

Cluster Report Resources

Feature   Description                        Total   % of
                                             Value  Total

JOBNLR    Total Logical Writer Records           0   0.00

______________________________________ Cluster: 10 ______________________________________

Radius: 1.61      Clustering Index: 0.34      Maximum Cluster Radius: 3.0 STDs

Normal Obs.:      1,541    77.05%   Observations << 97.5 Pct.
Outlying Obs.:        0     0.00%   Observations >> 97.5 Pct.
Total Obs.:       1,541    77.05%

Cluster Feature Resources

Feature   Description                      Minimum   Average   Standard    Maximum      Total   % of
                                             Value     Value  Deviation      Value      Value  Total

JOBTCBTM  Job TCB CPU Time                       0         1          2         14      1,873   7.13
JOBEDASD  DASD EXCPS                             1       914      1,155      6,523  1,408,821  11.76

Cluster Report Resources

Feature   Description                        Total   % of
                                             Value  Total

JOBNLR    Total Logical Writer Records      19,968  99.56

Sparse Clusters Have Been Excluded
__________________________________________________________________________________________


 Figure 3-35. Cluster Population Summary Report
                                               Data Clustering Analysis
                                              Cluster Population Summary

                                              For: Monday, June 23, yyyy

The following clusters were determined to be 'sparse' in
their population and have been excluded from processing.

  Cluster:       Population

     2             2
     3             6
     4             2
     5             2
     6             4
     7             3
     8             2
     9             5
    12             9
    13             2
    18             2
    19             2
    20             3
    21             2
    22             8


CLUSTER       Cluster numbers are assigned sequentially
NUMBER:       to the patterns that are identified by the
              algorithm.  Note that the order in which the
              clusters are identified is not an indicator of
              merit.

RADIUS:       The geometric distance from the cluster center
              (centroid) to the outer boundary of the cluster,
              expressed in terms of Standard Deviations.  The
              outer limit of this value is defined by the
              user as the Maximum Cluster Radius value on the
              Clustering Execution Parameters screen.

CLUSTERING    Formerly called the Performance Index, this
INDEX:        metric was renamed in this implementation to
              avoid confusion with similar terms in the z/OS
              Workload Manager.  It is the Root Mean Square
              of the distances of all cases (observations)
              within the cluster and serves as a simple
              measure of clustering effectiveness.  The lower
              this value, the tighter the fit of the cluster
              data.  This is not to say that outliers are not
              present; only that they are not distorting the
              cluster shape by their presence.
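
The Root Mean Square calculation described for the Clustering
Index can be sketched as follows.  This is a minimal
illustration, not the product's code: it assumes Euclidean
distance from each observation to the cluster centroid and
feature values already expressed in standard deviations.  The
function name and the sample points are hypothetical.

```python
import math

def clustering_index(observations, centroid):
    """Root mean square of the observation-to-centroid distances.

    Assumption: Euclidean distance on standardized feature values.
    """
    if not observations:
        return 0.0
    squared_distances = [
        sum((value - center) ** 2 for value, center in zip(obs, centroid))
        for obs in observations
    ]
    return math.sqrt(sum(squared_distances) / len(squared_distances))

# Hypothetical two-feature clusters sharing the same centroid.
centroid = (0.0, 0.0)
tight = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
loose = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]

tight_index = clustering_index(tight, centroid)
loose_index = clustering_index(loose, centroid)
print(f"tight: {tight_index:.2f}  loose: {loose_index:.2f}")
```

The tighter cluster yields the lower index, which matches the
interpretation given above: a lower value means a closer fit
of the cluster data around the centroid.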

MAXIMUM       The maximum size of any cluster in the study
CLUSTER       under consideration, expressed in terms of
RADIUS:       standard deviations.  This value is specified
              by the analyst on the Execution Options for
              Neugents Technology panel (PERG900U) and is
              presented here for documenting the cluster
              definitions.

NORMAL        The count of observations within this cluster
OBS:          where all feature values were found to be
              "normal" in a statistical sense.  In this
              implementation, normal feature values are those
              that are less than the value determined by the
              Sample Trim Limit.  For example, if the Sample
              Trim Limit is 97.5%, then all feature values of
              "normal" observations would reside below the
              97.5th percentile of the sample.

OUTLYING      The count of observations within this cluster
OBS:          where one or more feature values were found to
              be "outliers" in a statistical sense.  In this
              implementation, outlying feature values are
              those that are greater than the value
              determined by the Sample Trim Limit.  For
              example, if the Sample Trim Limit is 97.5%,
              then at least one feature value of each
              "outlying" observation would reside above the
              97.5th percentile of the sample.

TOTAL OBS:    The sum of NORMAL and OUTLYING observations.
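
The NORMAL, OUTLYING, and TOTAL counts can be illustrated with
a short sketch.  It rests on several assumptions that are not
stated in this guide: the percentile is computed with the
nearest-rank method, the limit for each feature is taken over
the entire sample, and a value exactly equal to the limit is
counted as normal.  The function names and the sample data
are hypothetical.

```python
import math

def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile (assumed method, for illustration only)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(round(pct / 100.0 * len(ordered), 9)))
    return ordered[rank - 1]

def trim_limits(sample, trim_limit_pct=97.5):
    """Per-feature limit computed over the entire sample."""
    features = list(sample[0].keys())
    return {
        name: nearest_rank_percentile([obs[name] for obs in sample], trim_limit_pct)
        for name in features
    }

def count_observations(cluster, limits):
    """Return (normal, outlying, total) counts for one cluster."""
    normal = outlying = 0
    for obs in cluster:
        # An observation is outlying if ANY feature exceeds its limit.
        if any(obs[name] > limit for name, limit in limits.items()):
            outlying += 1
        else:
            normal += 1
    return normal, outlying, normal + outlying

# Hypothetical sample of 40 jobs with two features.
sample = [{"JOBTCBTM": i, "JOBEDASD": 100} for i in range(1, 41)]
sample[0]["JOBEDASD"] = 5000            # one extreme DASD EXCP count

limits = trim_limits(sample)            # 97.5th percentile per feature
cluster = sample[:10] + sample[35:]     # hypothetical cluster of 15 jobs
normal, outlying, total = count_observations(cluster, limits)
print(normal, outlying, total)
```

In this sketch, one job is outlying because of its DASD EXCP
count and another because of its CPU time, so a single
outlying feature value is enough to move an observation out
of the NORMAL count.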


CLUSTER FEATURE RESOURCES:

For each feature (clustering element) that you define on the
Clustering Execution Options screen, a separate line is
generated under the Cluster Feature Resources entry on the
report.  Each line contains the following elements:

FEATURE:      The name of the data element chosen for
              clustering.  This is normally a CA MICS data
              element but can be a computed (user defined)
              element if required.

DESCRIPTION:  The SAS label of the selected data element,
              from either the CA MICS GENLIB definition or
              supplied by the user.

MINIMUM       The minimum of all values for this feature
VALUE:        within this cluster.

AVERAGE       The average of all values for this feature
VALUE:        within this cluster.  This value approximates
              the centroid value for this feature and is also
              referred to as the feature mean.

STANDARD      The standard deviation of all values for this
DEVIATION:    feature within this cluster.

MAXIMUM       The maximum of all values for this feature
VALUE:        within this cluster.  This value approximates
              the outer boundary value for this feature.

TOTAL         The sum of all values for this feature within
VALUE:        this cluster.

% OF          The percentage of the sum of this feature's
TOTAL:        values for this cluster compared to the sum for
              the entire sample population.  For example, in
              the report above, the JOBTCBTM represented by
              cluster 1 is nearly 35% of the JOBTCBTM for the
              entire sample.
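
The per-feature columns described above can be sketched as
follows.  This is an illustration under stated assumptions,
not the product's calculation: the sample standard deviation
(n - 1 divisor) is assumed, and the sample-wide JOBTCBTM total
of 26,276 is back-calculated from the rounded percentages in
the sample report rather than taken from the report itself.
The function name and the small value list are hypothetical.

```python
import statistics

def feature_summary(cluster_values, sample_total):
    """Summarize one feature for one cluster.

    Assumption: sample standard deviation (n - 1 divisor).
    """
    total = sum(cluster_values)
    return {
        "minimum": min(cluster_values),
        "average": total / len(cluster_values),
        "std_dev": statistics.stdev(cluster_values) if len(cluster_values) > 1 else 0.0,
        "maximum": max(cluster_values),
        "total": total,
        "pct_of_total": 100.0 * total / sample_total if sample_total else 0.0,
    }

# Hypothetical feature values for a four-observation cluster.
summary = feature_summary([2, 4, 4, 6], sample_total=64)
print(summary)

# "% of Total" check against the sample report, using the assumed
# (back-calculated) sample-wide JOBTCBTM total.
ASSUMED_SAMPLE_JOBTCBTM = 26276
pct_cluster_1 = round(100.0 * 9186 / ASSUMED_SAMPLE_JOBTCBTM, 2)
pct_cluster_10 = round(100.0 * 1873 / ASSUMED_SAMPLE_JOBTCBTM, 2)
print(pct_cluster_1, pct_cluster_10)
```

With that assumed total, the Total Value entries for clusters
1 and 10 reproduce the 34.96 and 7.13 shown in the % of Total
column of the sample report.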

CLUSTER REPORT RESOURCES:

For each feature (reporting element) that you define on the
Clustering Execution Options screen, a separate line is
generated under the Cluster Report Resources entry on the
report.  Each line contains the following elements:

FEATURE:      The name of the data element chosen for
              reporting.  This is normally a CA MICS data
              element but can be a computed (user defined)
              element if required.

DESCRIPTION:  The SAS label of the selected data element,
              from either the CA MICS GENLIB definition or
              supplied by the user.

TOTAL         The sum of all values for this feature within
VALUE:        this cluster.  This value represents the total
              of this reporting element for this cluster.

% OF          The percentage of the sum of this feature's
TOTAL:        values for this cluster compared to the sum for
              the entire sample population.  For example, in
              the report above, the JOBNLR represented by
              cluster 1 is 0% of the JOBNLR for the entire
              sample.

If the INCLUDE SPARSE CLUSTERS option on the Clustering
Execution Options screen has been set to "Yes", a separate
page of the report lists all excluded clusters and their
populations.

The following fields are presented in this section:

CLUSTER       Cluster numbers are assigned sequentially to
NUMBER:       the patterns that are identified by the
              algorithm.  Note that the order in which the
              clusters are identified is not an indicator of
              merit.

POPULATION:   The population of each excluded cluster.