3.3.5.2.3 Clustering Execution Options Screen



Figure 3-47 illustrates the Clustering Execution Options
screen.

/----------- Execution Options for Neugents Technology ---------- ROW 1 OF 5    \
|                                                                               |
|Inquiry Step:  Data Clustering                                                 |
|                                                                               |
|        Clustering Input Data  ===> SAMPLE     (SAMPLE/POPULATION)             |
|            Clustering Method  ===> RADIUS     (RADIUS/CLUSTERS)               |
|       Maximum Cluster Radius  ===> 3.0        (1.0-3.5) If RADIUS Specified   |
|    Target Number of Clusters  ===>            (03-64)   If CLUSTER Specified  |
|                                                                               |
|         Training Sample Size  ===> 2000       (nnnnnn/ALL)                    |
|            Sample Trim Limit  ===> 02.5       (0.00-99.9)  Default: 2.5 Pct.  |
|                                                                               |
| Include Outliers in Clusters  ===> YES        (YES/NO)  Trim 02.5 Pct.        |
|      Include Sparse Clusters  ===> YES        (YES/NO)  LT 0.5 Pct. of Obs.   |
|                                                                               |
|    Report Cluster Population  ===> NO         (YES/NO)                        |
|  Report Using Account Fields  ===> YES        (YES/NO)                        |
|   Report Outliers Separately  ===> NO         (YES/NO)                        |
|       Report Sparse Clusters  ===> NO         (YES/NO)                        |
|   Create Cluster Index Graph  ===> NO         (YES/NO)                        |
|      Create Population Graph  ===> NO         (YES/NO)                        |
|                                                                               |
|                                                                               |
\-------------------------------------------------------------------------------/



 Figure 3-47.  Clustering Execution Options Screen

This screen is used to enter the execution options for the
clustering process and contains the following fields:

CA MICS file: The CA MICS file that contains the data
              elements that you wish to include in your data
              clustering analysis. If you are uncertain of
              the correct file name, type a question mark (?)
              in this field to obtain a complete selection
              list for your installation.

Clustering    Either SAMPLE or POPULATION, default is SAMPLE.
Input Data:   This parameter informs the application to
              select either a SAMPLE of the input data, or to
              use the entire POPULATION for processing.



Clustering    Either RADIUS or CLUSTER, default value is
Method:       RADIUS.  The clustering operation will be
              controlled by this option.  If RADIUS is
              specified, then clustering will be limited to a
              certain cluster size or RADIUS.  If CLUSTER is
              chosen, then clustering will produce only a
              certain number of clusters, but the clusters
              will vary more in size and can be quite large
              in scope.

Maximum       A range from 1.0 to 3.5, with a default value
Cluster       of 3.0.  The maximum size of the largest
Radius:       cluster to be formed, specified in terms of
              standard deviations.  Clustering will stop once
              any cluster exceeds this value.  Use of this
              option will cause more clusters to be formed
              but will generally result in a better 'fit' of
              the data.

Target        If CLUSTER is specified for the Clustering
Number of     Method, then a number from 3 to 64 must be
Clusters:     entered here, with a default value being 8.
              This is the number of clusters that will be
              produced in a particular execution.  This
              option is useful when using clustering as a
              data analysis tool where you might want to
              simply partition the data into heterogeneous
              groups, but are not concerned about the size or
              exact content of the clusters.
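
The value ranges described above for the Clustering Method,
Maximum Cluster Radius, and Target Number of Clusters fields can
be sketched as a simple check. This is a minimal illustration in
Python, not the product's own validation logic; the function name
and the error messages are hypothetical, and the keyword CLUSTER
follows the field description rather than the screen prompt.

```python
def check_method_options(method="RADIUS", max_radius=3.0, target_clusters=8):
    """Hypothetical check of the documented ranges.

    Returns a list of error strings; an empty list means the values pass.
    Defaults mirror the documented defaults (RADIUS, 3.0, 8).
    """
    errors = []
    if method == "RADIUS":
        # Radius is given in standard deviations, documented range 1.0-3.5.
        if not 1.0 <= max_radius <= 3.5:
            errors.append("Maximum Cluster Radius must be between 1.0 and 3.5")
    elif method == "CLUSTER":
        # Target count is only required when CLUSTER is the chosen method.
        if not 3 <= target_clusters <= 64:
            errors.append("Target Number of Clusters must be between 3 and 64")
    else:
        errors.append("Clustering Method must be RADIUS or CLUSTER")
    return errors
```

Only the field that applies to the chosen method is checked, which
matches the "If RADIUS Specified" and "If CLUSTER Specified"
notes on the screen.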

Training      A numeric value up to 999,999 or the word
Sample        'ALL', if all data is to be chosen for the sample.
Size:         The default value is 2,000 observations.  The
              actual sample size is computed to be 2% of the
              sample or 2,000 cases, whichever is greater.
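
The "whichever is greater" rule above can be illustrated with a
short Python sketch. It assumes that the 2% figure applies to the
number of available input observations, which the text does not
state explicitly; the function and parameter names are
hypothetical.

```python
def effective_sample_size(available_obs, pct=2, floor=2000):
    """Hypothetical helper: the larger of pct percent of the
    available observations and the floor value.

    Integer arithmetic is used so the result is always a whole
    number of observations.
    """
    return max(available_obs * pct // 100, floor)
```

Under this assumption, 500,000 available observations would give
a sample of 10,000, while 50,000 observations would fall back to
the 2,000 minimum.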

Sample Trim   A percentage of the sample to be removed, or
Limit:        'trimmed' to improve the normality of the data
              and allow for better descriptive statistics to
              be obtained.  The default value is 2.5% and
              will cause those observations that contain the
              top 2.5% of values for each cluster feature
              to be removed.  This should effectively remove
              observations that contain very large values
              that tend to distort descriptive statistics,
              and should improve the shape and fit of the
              data within the clusters.
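
The trimming described above can be sketched as follows. This is
a minimal Python illustration under stated assumptions, not the
product's algorithm: observations are assumed to be tuples of
numeric cluster feature values, and the number of observations
flagged per feature is assumed to be the trim percentage of the
sample, rounded down. The function name is hypothetical.

```python
import math

def trim_sample(rows, trim_pct=2.5):
    """Drop rows that hold one of the top trim_pct percent of
    values for any feature (column)."""
    if not rows:
        return []
    n = len(rows)
    k = math.floor(n * trim_pct / 100)  # rows flagged per feature
    if k == 0:
        return list(rows)
    flagged = set()
    for col in range(len(rows[0])):
        # Row indexes ordered by this feature, largest value first.
        order = sorted(range(n), key=lambda i: rows[i][col], reverse=True)
        flagged.update(order[:k])
    return [row for i, row in enumerate(rows) if i not in flagged]
```

Because a row is removed when any one of its features falls in
the top band, the total share of rows removed can exceed the trim
percentage when the sample has several features.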

Include       YES/NO, with a default value of YES.  This
Outliers in   option indicates whether observations that are
Clusters:     deemed to be outliers (that is, they contain one
              or more feature values that exceed the trim
              limit) are included in the clusters.  Including
              these observations improves the quality of the
              study by ensuring that all sampled data will be
              accounted for.  Outliers may be excluded if
              they have been reviewed and a determination is
              made that they are not relevant.

Include       YES/NO, with a default value of NO.  This
Sparse        option controls whether 'sparse' clusters are
Clusters:     included for subsequent reporting or other
              post-processing operations.  Sparse clusters
              are defined as those clusters that contain less
              than 0.5% of the sample population.  Note, if
              you select the default value of NO, sparse
              clusters are still created and the data remains
              available, but the data is merely excluded from
              various reports and graphs.
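
The sparse cluster definition above (less than 0.5% of the sample
population) can be expressed as a short Python sketch. The input
is assumed to be one cluster label per observation; the function
name and the data layout are hypothetical.

```python
from collections import Counter

def sparse_clusters(assignments, threshold_pct=0.5):
    """Return the set of cluster labels whose share of the
    observations is below threshold_pct percent."""
    counts = Counter(assignments)
    total = len(assignments)
    # Compare n/total < threshold_pct/100 without dividing.
    return {label for label, n in counts.items()
            if n * 100 < threshold_pct * total}
```

For a sample of 1,000 observations, any cluster holding fewer
than 5 observations would be reported as sparse by this sketch.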

Report        YES/NO, with the default being NO.  Controls
Cluster       whether to generate the cluster population
Population:   report.  This report contains detailed
              usage statistics for each included cluster,
              with both cluster features and optional
              reporting elements being reported.

Report Using  YES/NO, with the default value being YES.
Account       Controls whether user defined CA MICS account
Fields:       fields are included in detail reports.
              Inclusion of these fields improves the content
              and usability of detailed reports but can
              result in a print line overflow and distortion
              of the report format.

Report        YES/NO, with the default value being NO.
Outliers      Controls whether to generate a detail report of
Separately:   all observations that contain one or more
              outlier cluster feature values.  This report
              can be used to review and investigate outliers
              and their possible effect on the clustering
              operation.

Report        YES/NO, with the default value being NO.
Sparse        Controls whether to generate a detail report of
Clusters:     all 'sparse' clusters and their observations.
              This report can be used to better understand
              the sparse clusters and to determine whether
              their content should be considered for
              additional processing.

Create        YES/NO, with the default value being NO.
Cluster       Controls whether to generate a printer graph of
Index         the cluster index values.  This graph provides
Graph:        a high-level pictorial overview of cluster
              performance for the study.

Create        YES/NO, with the default value being NO.
Population    Controls whether to generate a printer graph of
Graph:        the cluster population.  This graph provides a
              high-level pictorial overview of cluster
              population for the study.