3.3.6.1.3.5 Analysis of the Clustering Output


 
The final step of the job class identification process
involves some decisions based on the cluster descriptions and
the analyst's experience.  Although the clusters developed in
the example provide a reasonable representation of the
system's workload, additional work is needed to define an
appropriate job class structure.
 
After reviewing the Cluster Performance and Population
Summary reports, a simple analysis involving selecting,
sorting and printing was executed against the output
statistical data from the clustering operation.  The output
of the analysis is presented next.

Listing of Selected Data Clustering Cases                                                                                        1
Presented by Clustering Feature Values

         Minimum     Mean  Maximum     Minimum        Mean     Maximum                           Total         Total
Cluster  TAPEUSE  TAPEUSE  TAPEUSE    JOBTCBTM    JOBTCBTM    JOBTCBTM  Cluster  Cluster        Normal       Outlier         Total
Type       Value    Value    Value       Value       Value       Value   Number   Radius  Observations  Observations  Observations

Normal         0  0.00000        0  0:00:00.01  0:00:03.13  0:00:41.26        9  .190520         1,840             0         1,840
               0  0.00000        0  0:00:46.15  0:01:23.66  0:02:09.66       11  .436100            31             0            31
               0  0.00000        0  0:03:04.29  0:04:40.17  0:05:32.62       18  .993240             7             0             7
               1  1.00000        1  0:00:00.03  0:00:19.22  0:01:23.85       14  .017900            46             0            46
               1  1.00000        1  0:01:45.20  0:02:40.05  0:03:00.43       13  .712520            12             0            12
               1  1.00000        1  0:03:31.30  0:03:38.57  0:03:50.19        5  .362838             5             0             5
               1  1.00000        1  0:04:15.47  0:04:49.99  0:05:28.36       10  .197910             6             0             6
-------                                                                                   ------------  ------------  ------------
Normal                                                                                           1,947             0         1,947
__________________________________________________________________________________________________________________________________

Listing of Selected Data Clustering Cases                                                                                        2
Presented by Clustering Feature Values

         Minimum     Mean  Maximum     Minimum        Mean     Maximum                           Total         Total
Cluster  TAPEUSE  TAPEUSE  TAPEUSE    JOBTCBTM    JOBTCBTM    JOBTCBTM  Cluster  Cluster        Normal       Outlier         Total
Type       Value    Value    Value       Value       Value       Value   Number   Radius  Observations  Observations  Observations

Other          0  0.00000        0  0:05:40.33  0:06:25.95  0:07:12.03       12  .438660             1             6             7
               0  0.00000        0  0:11:27.99  0:11:49.98  0:12:11.98        7  .686676             0             2             2
               0  0.00000        0  0:17:50.60  0:17:56.97  0:18:03.35        4  .199029             0             2             2
               0  0.00000        0  0:20:52.34  0:20:52.49  0:20:52.65        2  .004837             0             2             2
               0  0.00000        0  0:25:24.12  0:25:29.51  0:25:34.91        3  .168434             0             2             2
               1  1.00000        1  0:07:55.69  0:08:16.60  0:08:37.51        6  .652809             0             2             2
               1  1.00000        1  0:13:22.97  0:14:10.76  0:15:29.06       15  .444520             0             3             3
               1  1.00000        1  0:16:30.64  0:17:04.72  0:17:34.86        8  .063880             0             6             6
               1  1.00000        1  0:19:06.31  0:20:10.82  0:21:42.96       16  .876600             0             7             7
               1  1.00000        1  0:22:59.33  0:24:32.67  0:25:27.51       17  .914180             0             6             6
               0  0.21429        2  0:02:44.98  1:43:52.37  5:01:29.65        1  .000000             1            13            14
-------                                                                                   ------------  ------------  ------------
Other                                                                                                2            51            53

 Figure 3-62.  Cluster Feature Summary Statistics

The code used for this analysis divided the clusters into two
groups: one containing most of the normal data and the other
representing the outliers.  This division was made mainly to
demonstrate that outliers will normally be treated as
exceptions to the job classification process.  Usually only
the regular (non-sparse) clusters are processed, and the
sparse clusters and outliers are treated as exceptions.
In our case study, three of the sparse clusters (clusters 5,
10, and 18) were included in the "Normal" group because
they contained only normal observations.  One regular
cluster, cluster 1, was placed in the "Other" group because
13 of its 14 observations were outliers.  These decisions
require installation knowledge and careful planning before
implementation.
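
The grouping in Figure 3-62 is consistent with a simple rule:
a cluster is listed as "Normal" only when it contains no
outliers.  The following is a minimal sketch of that rule, not
the code actually used for the analysis.  The data set names
(CLUSTER_SUMMARY, CLUSTER_GROUPS) and variable names (CLUSTER,
NORMOBS, OUTOBS, CLUSTYPE) are hypothetical; the observation
counts are copied from the figure.

```sas
/* Hypothetical data set; one row per cluster, with the       */
/* normal and outlier observation counts from Figure 3-62.    */
DATA CLUSTER_SUMMARY;
  INPUT CLUSTER NORMOBS OUTOBS;
  DATALINES;
9 1840 0
11 31 0
18 7 0
14 46 0
13 12 0
5 5 0
10 6 0
12 1 6
7 0 2
4 0 2
2 0 2
3 0 2
6 0 2
15 0 3
8 0 6
16 0 7
17 0 6
1 1 13
;
RUN;

DATA CLUSTER_GROUPS;
  SET CLUSTER_SUMMARY;
  LENGTH CLUSTYPE $6;
  /* Assumed rule: "Normal" only if the cluster has no outliers */
  IF OUTOBS = 0 THEN CLUSTYPE = "Normal";
  ELSE CLUSTYPE = "Other";
RUN;

/* Sum the observation counts within each group */
PROC MEANS DATA=CLUSTER_GROUPS SUM MAXDEC=0;
  CLASS CLUSTYPE;
  VAR NORMOBS OUTOBS;
RUN;
```

Under this assumed rule the group sums should agree with the
totals in the figure: 1,947 normal observations and no outliers
in the "Normal" group, and 2 normal observations and 51 outliers
in the "Other" group.  An installation might prefer a different
rule, for example one based on the proportion of outliers in a
cluster.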
 
A SAS program was written to evaluate the proposed classes.
The program utilizes the clustering criteria from the above
report and attempts to assign each job (case) to the proposed
classes (tested in ascending order of size) until a match is
found.  The results from the execution of this program are
shown in Figure 3-63.
 
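/* Clear any titles and footnotes left over from earlier steps */
TITLE;
FOOTNOTE;
 
/* Display labels for the proposed classes.  The original      */
/* $CLASS. definition is not shown; these labels are assumed,  */
/* copied from the output in Figure 3-63.                      */
PROC FORMAT;
  VALUE $CLASS
    "A" = ":  0 Tapes; LE  40 secs. TCB Time"
    "B" = ":  0 Tapes; LE 300 secs. TCB Time"
    "C" = ":1-3 Tapes; LE 300 secs. TCB Time"
    "X" = ":<======  All Other Jobs  ======>";
RUN;
 
DATA JOBCLASS;
  SET TOTAL_SAMPLE_CLUSTERS;
  LENGTH CLASS $1;
  LABEL CLASS = "Proposed Job Class";
 
  SELECT (TAPEUSE);
    /* TAPEUSE = 0: split on TCB time (JOBTCBTM, in seconds) */
    WHEN (0) DO;
      IF JOBTCBTM LE 40 THEN CLASS = "A";
      ELSE IF JOBTCBTM LE 300 THEN CLASS = "B";
      ELSE CLASS = "X";
    END;
 
    /* TAPEUSE = 1: up to 300 seconds of TCB time */
    WHEN (1) DO;
      IF JOBTCBTM LE 300 THEN CLASS = "C";
      ELSE CLASS = "X";
    END;
 
    /* Any other TAPEUSE value goes to the exception class */
    OTHERWISE CLASS = "X";
  END;
RUN;
 
PROC FREQ DATA=JOBCLASS;
  TABLES CLASS;
  FORMAT CLASS $CLASS.;
  TITLE1 "Analysis of Proposed Job Classes";
RUN;
 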
The output from the job class structure evaluation is
presented next.



                          Analysis of Proposed Job Classes
 
                                 The FREQ Procedure
 
                                Proposed Job Class
 
                                                            Cumulative    Cumulative
Proposed Job Class                  Frequency     Percent     Frequency     Percent
 -----------------------------------------------------------------------------------
 :  0 Tapes; LE  40 secs. TCB Time      1837       91.85          1837        91.85
 :  0 Tapes; LE 300 secs. TCB Time        39        1.95          1876        93.80
 :1-3 Tapes; LE 300 secs. TCB Time        66        3.30          1942        97.10
 :<======  All Other Jobs  ======>        58        2.90          2000       100.00
 
 
 Figure 3-63.  Analysis of Proposed Job Classes
 
The code above can be changed to suit the installation's
service and resource requirements.  The proposed job class
structure represents the resource requirements of the jobs
being studied reasonably well.  The exception class ("X")
contains only 58 of the 2,000 jobs analyzed, or 2.90 percent
of the total.  Changing the classification criteria in the
code above might place more jobs in the exception class.  This
is not necessarily bad, but keep in mind that in most
installations such jobs often require manual scheduling or
similar support.  The criteria used here were selected
mainly for illustrative purposes and might not be realistic
given an installation's requirements.  You can use the
reports to choose a variety of criteria, execute the
simulation code, and evaluate the results.
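
One way to try several sets of criteria is to wrap the
classification step in a macro.  The sketch below is only an
illustration: the macro name TRYCLASS and its parameters SHORT
and LONG (TCB time limits in seconds) are hypothetical, and the
second pair of limits was chosen arbitrarily.  It assumes the
TOTAL_SAMPLE_CLUSTERS data set used earlier is available, and
with the default limits it applies the same rules as the
program above.

```sas
/* Hypothetical macro: SHORT and LONG are TCB time limits in seconds */
%MACRO TRYCLASS(SHORT=40, LONG=300);
  DATA JOBCLASS;
    SET TOTAL_SAMPLE_CLUSTERS;
    LENGTH CLASS $1;
    /* Same rules as the original program, with the limits */
    /* supplied as macro parameters                        */
    IF TAPEUSE = 0 AND JOBTCBTM LE &SHORT THEN CLASS = "A";
    ELSE IF TAPEUSE = 0 AND JOBTCBTM LE &LONG THEN CLASS = "B";
    ELSE IF TAPEUSE = 1 AND JOBTCBTM LE &LONG THEN CLASS = "C";
    ELSE CLASS = "X";
  RUN;

  PROC FREQ DATA=JOBCLASS;
    TABLES CLASS;
    TITLE1 "Proposed Job Classes: SHORT=&SHORT LONG=&LONG";
  RUN;
%MEND TRYCLASS;

/* The criteria from the case study, then an arbitrary alternative */
%TRYCLASS(SHORT=40, LONG=300);
%TRYCLASS(SHORT=60, LONG=600);
```

Comparing the frequency of class "X" across runs shows how many
jobs each set of criteria would leave for exception handling.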