

15.5.9 Clustering Execution Options Screen
Figure 15-15A illustrates the Clustering Execution Options
screen.
/---------- Execution Options for Neugents Technology ------------ ROW 1 OF 5 \
| |
|Inquiry Step: Workload Characterization Using Neugents Technology |
| |
| Clustering Input Data ===> SAMPLE (SAMPLE/POPULATION) |
| Clustering Method ===> RADIUS (RADIUS/NUMBER OF CLUSTERS) |
| Maximum Cluster Radius ===> 3.0 (1.0-3.5) If RADIUS Specified |
| Target Number of Clusters ===> (03-64) If CLUSTER Specified |
| |
| Training Sample Size ===> 2000 (nnnnnn/ALL) |
| Sample Trim Limit ===> 02.5 (0.00-99.9) Default: 2.5 Pct. |
| |
| Include Outliers in Clusters ===> YES (YES/NO) Trim 02.5 Pct. |
| Include Sparse Clusters ===> YES (YES/NO) LT 0.5 Pct. of Obs. |
| |
| Report Cluster Population ===> NO (YES/NO) |
| Report Using Account Fields ===> YES (YES/NO) |
| Report Outliers Separately ===> NO (YES/NO) |
| Report Sparse Clusters ===> NO (YES/NO) |
| Create Cluster Index Graph ===> NO (YES/NO) |
| Create Population Graph ===> NO (YES/NO) |
| |
| |
\------------------------------------------------------------------------------/
Figure 15-15A. Clustering Execution Options Screen
This screen is used to enter the execution options for the
clustering process and contains the following fields:
CA MICS The CA MICS file that contains the data
file: elements that you wish to include in your
workload characterization analysis. If you are
uncertain of the correct file name, type a
question mark (?) in this field to obtain a
complete selection list for your installation.
Clustering Either SAMPLE or POPULATION, default is SAMPLE.
Input Data: This parameter informs the application to
select either a SAMPLE of the input data, or to
use the entire POPULATION for processing.
Clustering Either RADIUS or CLUSTER, default value is
Method: RADIUS. The clustering operation will be
controlled by this option. If RADIUS is
specified, then clustering will be limited to a
certain cluster size or RADIUS. If CLUSTER is
chosen, then clustering will produce only a
certain number of clusters, but the clusters
will vary more in size and can be quite large
in scope.
Maximum A range from 1.0 to 3.5, with a default value
Cluster of 3.0. This is the maximum size of the largest
Radius: cluster to be formed, specified in terms of
standard deviations. Clustering will stop once
any cluster exceeds this value. Use of this
option will cause more clusters to be formed
but will generally result in a better 'fit' of
the data.
Target If CLUSTER is specified for the Clustering
Number of Method, then a number from 3 to 64 must be
Clusters: entered here, with the default value being 8.
This is the number of clusters that will be
produced in a particular execution. This
option is useful when using clustering as a
data analysis tool where you might want to
simply partition the data into heterogeneous
groups, but are not concerned about the size or
exact content of the clusters.
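The relationship between the Clustering Method, Maximum Cluster Radius, and Target Number of Clusters fields can be sketched as a small validation routine. This is only an illustration of the documented ranges and defaults, not the product's own logic; the function name validate_method_options and the dictionary it returns are hypothetical.

```python
def validate_method_options(method, max_radius=3.0, target_clusters=8):
    """Check the method-related fields against the ranges shown on the screen.

    Hypothetical helper: the name, arguments, and return value are
    assumptions made for illustration only.
    """
    method = method.upper()
    if method == "RADIUS":
        # Maximum Cluster Radius applies only when RADIUS is specified.
        if not 1.0 <= max_radius <= 3.5:
            raise ValueError("Maximum Cluster Radius must be between 1.0 and 3.5")
        return {"method": "RADIUS", "max_radius": max_radius}
    if method == "CLUSTER":
        # Target Number of Clusters applies only when CLUSTER is specified.
        if not 3 <= target_clusters <= 64:
            raise ValueError("Target Number of Clusters must be between 3 and 64")
        return {"method": "CLUSTER", "target_clusters": target_clusters}
    raise ValueError("Clustering Method must be RADIUS or CLUSTER")
```

Only the field that matches the chosen method is checked, which mirrors the "If RADIUS Specified" and "If CLUSTER Specified" notes on the screen.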
Training A numeric value up to 999,999, or the word
Sample 'ALL' to use all of the data for the sample.
Size: The default value is 2,000 observations. The
actual sample size is computed to be 2% of the
sample or 2,000 cases, whichever is greater.
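The "whichever is greater" rule for the default sample size can be illustrated with a short sketch. It assumes that the 2% figure is taken from the number of input observations and that the sample can never be larger than the input; both points are assumptions, and the function name default_sample_size is hypothetical.

```python
def default_sample_size(total_observations, floor=2000, percent=2):
    """Return the larger of `percent` of the input or `floor` cases.

    Assumptions for illustration: the percentage is applied to the input
    observation count, and the result is capped at that count.
    """
    # Integer ceiling of total_observations * percent / 100.
    by_percent = -(-total_observations * percent // 100)
    size = max(floor, by_percent)
    return min(size, total_observations)
```

Under these assumptions, an input of 50,000 observations yields a sample of 2,000 cases, while an input of 500,000 observations yields 10,000 cases.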
Sample Trim A percentage of the sample to be removed, or
Limit: 'trimmed' to improve the normality of the data
and allow for better descriptive statistics to
be obtained. The default value is 2.5% and
will cause those observations that contain the
top 2.5% of values for each cluster feature to
be removed. This should effectively remove
observations that contain very large values
which tend to distort descriptive statistics
and should improve the shape and fit of the
data within the clusters.
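The trimming described above can be sketched as follows. This is a minimal illustration, assuming that each observation is a tuple of numeric cluster feature values and that an observation is removed when any one of its features falls in the top tail; the function name trim_observations is hypothetical, and the product may compute its cutoffs differently.

```python
def trim_observations(rows, trim_pct=2.5):
    """Drop rows that hold a top-`trim_pct` value for any feature.

    rows: list of equal-length tuples of numeric feature values.
    Hypothetical sketch; not the product's implementation.
    """
    if not rows:
        return []
    n = len(rows)
    n_trim = int(n * trim_pct / 100)  # size of the top tail per feature
    if n_trim == 0:
        return list(rows)
    cutoffs = []
    for j in range(len(rows[0])):
        ordered = sorted(row[j] for row in rows)
        # Largest value that is still kept for feature j.
        cutoffs.append(ordered[n - n_trim - 1])
    return [row for row in rows
            if all(value <= cutoff for value, cutoff in zip(row, cutoffs))]
```

When several observations share the cutoff value, fewer than the nominal percentage are removed for that feature. Because each feature is trimmed independently, the total number of observations removed can exceed the trim percentage.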
Include YES/NO, with a default value of YES. This
Outliers in option controls whether outliers are included
Clusters: in the clusters. Observations are considered
outliers when they contain one or more feature
values that exceed the trim limit. Including
these observations improves the quality of the
study by ensuring that all sampled data will be
accounted for. Outliers may be excluded if they
have been reviewed and a determination is made
that they are not relevant.
Include YES/NO, with a default value of NO. This
Sparse option controls whether sparse clusters are
Clusters: included for subsequent reporting or other
post-processing operations. Sparse clusters
are defined as those clusters that contain less
than 0.5% of the sample population. Note that if
you select the default value of NO, sparse
clusters are still created and their data remains
available, but it is excluded from various
reports and graphs.
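The sparse cluster definition above can be illustrated with a short sketch. It assumes that each sampled observation carries a cluster label and that the 0.5% threshold is applied to the total number of sampled observations; the function name split_sparse_clusters is hypothetical.

```python
from collections import Counter


def split_sparse_clusters(assignments, sparse_pct=0.5):
    """Separate cluster labels into regular and sparse groups.

    assignments: one cluster label per sampled observation.
    A cluster is treated as sparse when it holds less than `sparse_pct`
    percent of the observations. Hypothetical sketch for illustration.
    """
    counts = Counter(assignments)
    threshold = len(assignments) * sparse_pct / 100
    regular = sorted(c for c, n in counts.items() if n >= threshold)
    sparse = sorted(c for c, n in counts.items() if n < threshold)
    return regular, sparse
```

The sparse labels are returned rather than discarded, which matches the note that sparse clusters are still created and remain available when they are excluded from reports and graphs.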
Report YES/NO, with the default being NO. This option
Cluster controls whether the cluster population report
Population: is generated. The report contains detailed usage
statistics for each included cluster, with both
cluster features and optional reporting
elements being reported.
Report Using YES/NO, with the default value being YES.
Account Controls whether user defined CA MICS account
Fields: fields are included in detail reports.
Inclusion of these fields improves the content
and usability of detailed reports, but can
result in a print line overflow and distortion
of the report format.
Report YES/NO, with the default value being NO.
Outliers Controls whether to generate a detail report of
Separately: all observations that contain one or more
outlier cluster feature values. This report
can be used to review and investigate outliers
and their possible effect on the clustering
operation.
Report YES/NO, with the default value being NO.
Sparse Controls whether to generate a detail report of
Clusters: all sparse clusters and their observations.
This report can be used to better understand
the sparse clusters and to determine whether
their content should be considered for
additional processing.
Create YES/NO, with the default value being NO.
Cluster Controls whether to generate a printer graph of
Index the cluster index values. This graph provides
Graph: a high-level pictorial overview of cluster
performance for the study.
Create YES/NO, with the default value being NO.
Population Controls whether to generate a printer
Graph: graph of the cluster population. This graph
provides a high-level pictorial overview of
cluster population for the study.
Copyright © 2014 CA.
All rights reserved.
 