15.5.9 Clustering Execution Options Screen


Figure 15-15A illustrates the Clustering Execution Options
screen.


/----------  Execution Options for Neugents Technology  ------------ ROW 1 OF 5 \
|                                                                               |
|Inquiry Step:  Workload Characterization Using Neugents Technology             |
|                                                                               |
|        Clustering Input Data  ===> SAMPLE     (SAMPLE/POPULATION)             |
|            Clustering Method  ===> RADIUS     (RADIUS/NUMBER OF CLUSTERS)     |
|       Maximum Cluster Radius  ===> 3.0        (1.0-3.5) If RADIUS Specified   |
|    Target Number of Clusters  ===>            (03-64)   If CLUSTER Specified  |
|                                                                               |
|         Training Sample Size  ===> 2000       (nnnnnn/ALL)                    |
|            Sample Trim Limit  ===> 02.5       (0.00-99.9)  Default: 2.5 Pct.  |
|                                                                               |
| Include Outliers in Clusters  ===> YES        (YES/NO)  Trim 02.5 Pct.        |
|      Include Sparse Clusters  ===> YES        (YES/NO)  LT 0.5 Pct. of Obs.   |
|                                                                               |
|    Report Cluster Population  ===> NO         (YES/NO)                        |
|  Report Using Account Fields  ===> YES        (YES/NO)                        |
|   Report Outliers Separately  ===> NO         (YES/NO)                        |
|       Report Sparse Clusters  ===> NO         (YES/NO)                        |
|   Create Cluster Index Graph  ===> NO         (YES/NO)                        |
|      Create Population Graph  ===> NO         (YES/NO)                        |
|                                                                               |
|                                                                               |
\------------------------------------------------------------------------------/

 Figure 15-15A.  Clustering Execution Options Screen


This screen is used to enter the execution options for the
clustering process and contains the following fields:

CA MICS       The CA MICS file that contains the data
file:         elements that you wish to include in your
              workload characterization analysis. If you are
              uncertain of the correct file name, type a
              question mark (?) in this field to obtain a
              complete selection list for your installation.

Clustering    Either SAMPLE or POPULATION, default is SAMPLE.
Input Data:   This parameter informs the application to
              select either a SAMPLE of the input data, or to
              use the entire POPULATION for processing.

Clustering    Either RADIUS or CLUSTER, default value is
Method:       RADIUS.  The clustering operation will be
              controlled by this option.  If RADIUS is
              specified, then clustering will be limited to a
              certain cluster size or RADIUS.  If CLUSTER is
              chosen, then clustering will produce only a
              certain number of clusters, but the clusters
              will vary more in size and can be quite large
              in scope.

Maximum       A value from 1.0 to 3.5, with a default of 3.0.
Cluster       The maximum size of the largest cluster to be
Radius:       formed, specified in standard deviations.
              Clustering will stop once any cluster exceeds
              this value.  Use of this option will cause more
              clusters to be formed but will generally result
              in a better 'fit' of the data.
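
As an illustration only, the following Python sketch shows one
way a radius expressed in standard deviations could be checked.
It assumes that cluster features are standardized to z-scores
and that distance is Euclidean; the computation actually used by
the product is not described here, and the names standardize and
within_radius are hypothetical.

```python
import math
import statistics


def standardize(rows):
    """Convert each feature to z-scores (assumed scaling, not the product's)."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.pstdev(col) for col in columns]
    return [
        # A constant feature has zero spread; map it to 0.0 to avoid dividing by zero.
        tuple((v - m) / s if s else 0.0 for v, m, s in zip(row, means, sds))
        for row in rows
    ]


def within_radius(point, center, max_radius=3.0):
    """True when the point lies no farther than max_radius from the center."""
    return math.dist(point, center) <= max_radius
```

With standardized features, a radius of 3.0 means that an
observation up to three standard deviations from the cluster
center would still fall inside the cluster.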

Target        If CLUSTER is specified for the Clustering
Number of     Method, then a number from 3 to 64 must be
Clusters:     entered here, with the default value being 8.
              This is the number of clusters that will be
              produced in a particular execution.  This
              option is useful when using clustering as a
              data analysis tool where you might want to
              simply partition the data into heterogeneous
              groups, but are not concerned about the size or
              exact content of the clusters.
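
The relationship between the Clustering Method, Maximum Cluster
Radius, and Target Number of Clusters fields can be summarized
as a small validation routine.  This is a sketch in Python, not
the panel's own edit logic; the function name and the message
texts are hypothetical, while the ranges and defaults are those
stated above.

```python
def validate_method_options(method="RADIUS", max_radius=3.0, target_clusters=None):
    """Return a list of error messages; an empty list means the values are valid."""
    errors = []
    method = method.upper()
    if method == "RADIUS":
        # The radius only matters when RADIUS is the clustering method.
        if not 1.0 <= max_radius <= 3.5:
            errors.append("Maximum Cluster Radius must be between 1.0 and 3.5")
    elif method == "CLUSTER":
        if target_clusters is None:
            target_clusters = 8  # documented default
        if not 3 <= target_clusters <= 64:
            errors.append("Target Number of Clusters must be between 3 and 64")
    else:
        errors.append("Clustering Method must be RADIUS or CLUSTER")
    return errors
```

For example, validate_method_options("CLUSTER",
target_clusters=2) reports one error, because 2 is below the
minimum of 3.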

Training      A numeric value up to 999,999, or the word
Sample        'ALL' to use all of the data as the sample.
Size:         The default value is 2,000 observations.  The
              actual sample size is computed to be 2% of the
              sample or 2,000 cases, whichever is greater.
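
The "whichever is greater" rule can be worked through with a
short Python sketch.  It assumes that the 2% is taken of the
number of available input observations and that the result
cannot exceed that number; both points are interpretations, and
the function name is hypothetical.

```python
def default_sample_size(n_obs, floor_cases=2000, percent=2):
    """Greater of percent% of n_obs and floor_cases, capped at n_obs."""
    if n_obs < 0:
        raise ValueError("n_obs must be non-negative")
    # Integer ceiling of n_obs * percent / 100, avoiding floating-point rounding.
    by_percent = (n_obs * percent + 99) // 100
    size = max(by_percent, floor_cases)
    # Assumption: a sample cannot be larger than the data it is drawn from.
    return min(size, n_obs)
```

Under these assumptions, 50,000 input observations give a sample
of 2,000 (2% would be only 1,000), while 500,000 input
observations give a sample of 10,000.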

Sample Trim   A percentage of the sample to be removed, or
Limit:        'trimmed', to improve the normality of the data
              and allow for better descriptive statistics to
              be obtained.  The default value is 2.5% and
              will cause those observations that contain the
              top 2.5% of values for each cluster feature to
              be removed.  This should effectively remove
              observations that contain very large values
              which tend to distort descriptive statistics
              and should improve the shape and fit of the
              data within the clusters.
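
The trimming described above can be sketched in Python as
follows.  This is an illustration under stated assumptions: the
trim count is rounded down, the cutoff for each feature is the
largest value that survives trimming, and an observation is
flagged when any one of its feature values exceeds its cutoff.
The function names are hypothetical.

```python
import math


def trim_thresholds(rows, trim_pct=2.5):
    """Per feature, return the largest value that survives trimming."""
    n = len(rows)
    n_trim = math.floor(n * trim_pct / 100.0)  # size of the top tail
    thresholds = []
    for j in range(len(rows[0])):
        ordered = sorted(row[j] for row in rows)
        thresholds.append(ordered[n - n_trim - 1])
    return thresholds


def split_outliers(rows, trim_pct=2.5):
    """Separate rows into (kept, outliers) using the per-feature thresholds."""
    limits = trim_thresholds(rows, trim_pct)
    kept, outliers = [], []
    for row in rows:
        if any(value > limit for value, limit in zip(row, limits)):
            outliers.append(row)
        else:
            kept.append(row)
    return kept, outliers
```

Because the cutoff is applied to each feature separately, the
share of observations flagged can be larger than the trim
percentage when different observations are extreme on different
features.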

Include       YES/NO, with a default value of YES.  This
Outliers in   option controls whether outliers are included
Clusters:     in the clusters.  Observations are considered
              outliers when they contain one or more feature
              values that exceed the trim limit.  Including
              these observations improves the quality of the
              study by ensuring that all sampled data is
              accounted for.  Outliers may be excluded if
              they have been reviewed and determined not to
              be relevant.

Include       YES/NO, with a default value of NO.  This
Sparse        option controls whether sparse clusters are
Clusters:     included for subsequent reporting or other
              post-processing operations.  Sparse clusters
              are defined as those clusters that contain less
              than 0.5% of the sample population.  Note that
              if you select the default value of NO, sparse
              clusters are still created and their data
              remains available; it is merely excluded from
              various reports and graphs.
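
The 0.5% rule for sparse clusters can be illustrated with a
short Python sketch.  It assumes that each observation carries a
cluster identifier and that the percentage is taken of all
assigned observations; the function name is hypothetical.

```python
from collections import Counter


def sparse_clusters(assignments, sparse_pct=0.5):
    """Return the sorted ids of clusters holding less than sparse_pct% of observations."""
    counts = Counter(assignments)
    total = len(assignments)
    return sorted(
        cluster for cluster, count in counts.items()
        if count * 100.0 / total < sparse_pct
    )
```

With 1,000 observations, a cluster of 3 observations (0.3%) is
sparse, while a cluster of exactly 5 observations (0.5%) is not.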

Report        YES/NO, with the default being NO.  Controls
Cluster       whether to generate the cluster population
Population:   report.  The report contains detailed usage
              statistics for each included cluster, with both
              cluster features and optional reporting
              elements being reported.

Report Using  YES/NO, with the default value being YES.
Account       Controls whether user-defined CA MICS account
Fields:       fields are included in detail reports.
              Inclusion of these fields improves the content
              and usability of detailed reports, but can
              result in a print line overflow and distortion
              of the report format.

Report        YES/NO, with the default value being NO.
Outliers      Controls whether to generate a detail report of
Separately:   all observations that contain one or more
              outlier cluster feature values.  This report
              can be used to review and investigate outliers
              and their possible effect on the clustering
              operation.

Report        YES/NO, with the default value being NO.
Sparse        Controls whether to generate a detail report of
Clusters:     all sparse clusters and their observations.
              This report can be used to better understand
              the sparse clusters and to determine whether
              their content should be considered for
              additional processing.

Create        YES/NO, with the default value being NO.
Cluster       Controls whether to generate a printer graph of
Index         the cluster index values.  This graph provides
Graph:        a high-level pictorial overview of cluster
              performance for the study.

Create        YES/NO, with the default value being NO.
Population    Controls whether to generate a printer
Graph:        graph of the cluster population.  This graph
              provides a high-level pictorial overview of
              cluster population for the study.