

3.3.5.2.3 Clustering Execution Options Screen
Figure 3-47 illustrates the Clustering Execution Options
screen.
/----------- Execution Options for Neugents Technology ---------- ROW 1 OF 5 \
| |
|Inquiry Step: Data Clustering |
| |
| Clustering Input Data ===> SAMPLE (SAMPLE/POPULATION) |
| Clustering Method ===> RADIUS (RADIUS/CLUSTERS) |
| Maximum Cluster Radius ===> 3.0 (1.0-3.5) If RADIUS Specified |
| Target Number of Clusters ===> (03-64) If CLUSTER Specified |
| |
| Training Sample Size ===> 2000 (nnnnnn/ALL) |
| Sample Trim Limit ===> 02.5 (0.00-99.9) Default: 2.5 Pct. |
| |
| Include Outliers in Clusters ===> YES (YES/NO) Trim 02.5 Pct. |
| Include Sparse Clusters ===> YES (YES/NO) LT 0.5 Pct. of Obs. |
| |
| Report Cluster Population ===> NO (YES/NO) |
| Report Using Account Fields ===> YES (YES/NO) |
| Report Outliers Separately ===> NO (YES/NO) |
| Report Sparse Clusters ===> NO (YES/NO) |
| Create Cluster Index Graph ===> NO (YES/NO) |
| Create Population Graph ===> NO (YES/NO) |
| |
| |
\-------------------------------------------------------------------------------/
Figure 3-47. Clustering Execution Options Screen
This screen is used to enter the execution options for the
clustering process and contains the following fields:
CA MICS file: The CA MICS file that contains the data
elements that you wish to include in your data
clustering analysis. If you are uncertain of
the correct file name, type a question mark (?)
in this field to obtain a complete selection
list for your installation.
Clustering Either SAMPLE or POPULATION, default is SAMPLE.
Input Data: This parameter informs the application to
select either a SAMPLE of the input data, or to
use the entire POPULATION for processing.
Clustering Either RADIUS or CLUSTER, default value is
Method: RADIUS. The clustering operation will be
controlled by this option. If RADIUS is
specified, then clustering will be limited to a
certain cluster size or RADIUS. If CLUSTER is
chosen, then clustering will produce only a
certain number of clusters, but the clusters
will vary more in size and can be quite large
in scope.
Maximum A range from 1.0 to 3.5, with a default value
Cluster of 3.0. The maximum size of the largest
Radius: cluster to be formed, specified in terms of
standard deviations. Clustering will stop once
any cluster exceeds this value. Use of this
option will cause more clusters to be formed
but will generally result in a better 'fit' of
the data.
Target If CLUSTER is specified for the Clustering
Number of Method, then a number from 3 to 64 must be
Clusters: entered here, with the default value being 8.
This is the number of clusters that will be
produced in a particular execution. This
option is useful when using clustering as a
data analysis tool where you might want to
simply partition the data into heterogeneous
groups, but are not concerned about the size or
exact content of the clusters.
Training A numeric value up to 999,999 or the word
Sample 'ALL', if all data is to be chosen for the sample.
Size: The default value is 2,000 observations. The
actual sample size is computed to be 2% of the
sample or 2,000 cases, whichever is greater.
Sample Trim A percentage of the sample to be removed, or
Limit: 'trimmed' to improve the normality of the data
and allow for better descriptive statistics to
be obtained. The default value is 2.5% and
will cause those observations that contain the
top 2.5% of values for each cluster feature
to be removed. This should effectively remove
observations that contain very large values
that tend to distort descriptive statistics,
and should improve the shape and fit of the
data within the clusters.
Include YES/NO, with a default value of YES. This
Outliers in option indicates whether observations that are
Clusters: deemed to be outliers (that is, they contain
one or more feature values that exceed the trim
limit) are included in the clusters. Including
these observations improves the quality of the
study by ensuring that all sampled data is
accounted for. Outliers may be excluded if they
have been reviewed and determined not to be
relevant.
Include YES/NO, with a default value of NO. This
Sparse option controls whether 'sparse' clusters are
Clusters: included for subsequent reporting or other
post-processing operations. Sparse clusters
are defined as those clusters that contain less
than 0.5% of the sample population. Note that
if you select the default value of NO, sparse
clusters are still created and the data remains
available; it is merely excluded from various
reports and graphs.
Report YES/NO, with the default value being NO.
Cluster Controls whether to generate the cluster
Population: population report. This report contains
detailed usage statistics for each included
cluster, with both cluster features and optional
reporting elements being reported.
Report Using YES/NO, with the default value being YES.
Account Controls whether user defined CA MICS account
Fields: fields are included in detail reports.
Inclusion of these fields improves the content
and usability of detailed reports but can
result in a print line overflow and distortion
of the report format.
Report YES/NO, with the default value being NO.
Outliers Controls whether to generate a detail report of
Separately: all observations that contain one or more
outlier cluster feature values. This report
can be used to review and investigate outliers
and their possible effect on the clustering
operation.
Report YES/NO, with the default value being NO.
Sparse Controls whether to generate a detail report of
Clusters: all 'sparse' clusters and their observations.
This report can be used to better understand
the sparse clusters and to determine whether
their content should be considered for
additional processing.
Create YES/NO, with the default value being NO.
Cluster Controls whether to generate a printer graph of
Index the cluster index values. This graph provides
Graph: a high-level pictorial overview of cluster
performance for the study.
Create YES/NO, with the default value being NO.
Population Controls whether to generate a printer graph of
Graph: the cluster population. This graph provides a
high-level pictorial overview of cluster
population for the study.
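The numeric rules described for Training Sample Size, Sample Trim
Limit, and Include Sparse Clusters can be illustrated with a short
sketch. The Python below is not part of CA MICS and does not show how
the product implements these options. The function names are
hypothetical, and two details are assumptions: that "2% of the sample"
refers to 2% of the input observations, and that the sample can never
be larger than the input itself.

```python
from collections import Counter


def training_sample_size(input_count, minimum=2000):
    """Greater of 2% of the input or `minimum` cases (assumed reading of the rule).

    The result is capped at input_count, which is an assumption: a sample
    cannot contain more observations than the input provides.
    """
    two_percent = input_count * 2 // 100
    return min(input_count, max(two_percent, minimum))


def trim_cutoffs(rows, trim_pct=2.5):
    """Return one cutoff per feature; values above it are in the top trim_pct percent.

    `rows` is a list of equal-length tuples, one tuple per observation.
    """
    n = len(rows)
    keep = n - int(n * trim_pct / 100)  # values retained per feature
    return [sorted(column)[keep - 1] for column in zip(*rows)]


def is_outlier(row, cutoffs):
    """An observation is an outlier if any feature value exceeds its cutoff."""
    return any(value > cutoff for value, cutoff in zip(row, cutoffs))


def sparse_clusters(assignments, threshold_pct=0.5):
    """Return the cluster labels holding less than threshold_pct of all observations."""
    counts = Counter(assignments)
    total = len(assignments)
    return {label for label, k in counts.items() if k * 100 < threshold_pct * total}


if __name__ == "__main__":
    print(training_sample_size(500_000))
    rows = [(float(v),) for v in range(1, 41)]
    cutoffs = trim_cutoffs(rows)
    print(cutoffs, is_outlier((40.0,), cutoffs))
    print(sparse_clusters(["A"] * 600 + ["B"] * 397 + ["C"] * 3))
```

With the default options, an observation flagged by is_outlier would
still be clustered unless Include Outliers in Clusters is set to NO,
and a cluster returned by sparse_clusters would still be created even
when it is left out of reports and graphs.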
Copyright © 2014 CA.
All rights reserved.
 