

11.2.3 Application of Workload Characterization


One of the most common workload characterization studies is
the determination of job classes.  Typically, the analyst
guesses at a solution and then evaluates how well it fits the
system's workload.  This is known as the ad hoc approach.
For example, consider a hypothetical system where the
following job class structure has been proposed:


         JOB CLASS    MAX CPU MINS    MAX PRINT LINES
         =========    ============    ===============
             A              2               5,000
             B              5              20,000
             C          unlimited        unlimited


If 60% of the jobs are assigned to class A, 25% to class B,
and 15% to class C, the analyst might assume that the classes
provide a good representation of the workload.
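
The class assignment above can be sketched in a few lines of
Python.  This is only an illustration: the function names are
hypothetical, the limits are taken from the table, and the
sketch assumes that a job exactly at a limit still qualifies
for the class, which the text does not state.

```python
# Class limits from the proposed structure:
# (class name, maximum CPU minutes, maximum print lines).
CLASS_LIMITS = [
    ("A", 2.0, 5_000),
    ("B", 5.0, 20_000),
    ("C", float("inf"), float("inf")),
]


def assign_class(cpu_minutes, print_lines):
    """Return the first class whose two limits both cover the job.

    Assumption: limits are inclusive (a job using exactly 2 CPU
    minutes still fits class A).
    """
    for name, max_cpu, max_lines in CLASS_LIMITS:
        if cpu_minutes <= max_cpu and print_lines <= max_lines:
            return name
    raise ValueError("no class accepts this job")


def class_shares(jobs):
    """Return the fraction of jobs that fall into each class.

    jobs is a list of (cpu_minutes, print_lines) pairs.
    """
    if not jobs:
        raise ValueError("need at least one job")
    counts = {name: 0 for name, _, _ in CLASS_LIMITS}
    for cpu_minutes, print_lines in jobs:
        counts[assign_class(cpu_minutes, print_lines)] += 1
    return {name: count / len(jobs) for name, count in counts.items()}
```

Shares close to the 60/25/15 split show only that the classes
are populated.  They say nothing about how well the limits
describe the jobs inside each class.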

Applying the representative hypothesis test could reveal
problems.  For example, if 30% of the jobs use less than 5
CPU seconds and print fewer than 2,000 lines, then they would
be poorly represented by the class A limits.  Moreover, their
service characteristics could be much poorer when they are
randomly mixed with the larger jobs in class A than they
would be if they were assigned to a class of their own.
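
A rough way to quantify this situation is to measure how much
of class A consists of such small jobs.  The sketch below is
not the representative hypothesis test itself; it only counts
jobs.  The function name and the record layout (CPU seconds,
print lines) are assumptions made for the illustration, and
the default cutoffs are the figures used in the example above.

```python
def small_job_summary(jobs, cpu_seconds_cutoff=5.0, lines_cutoff=2_000,
                      class_cpu_seconds=120.0, class_lines=5_000):
    """Summarize how many jobs are far smaller than the class A limits.

    jobs is a list of (cpu_seconds, print_lines) pairs.  A job is
    counted as "small" when it is below both cutoffs.
    """
    if not jobs:
        raise ValueError("need at least one job")
    # Jobs that fit inside the class A limits (2 CPU minutes, 5,000 lines).
    in_class = [job for job in jobs
                if job[0] <= class_cpu_seconds and job[1] <= class_lines]
    # The subset of those jobs that is below both small-job cutoffs.
    small = [job for job in in_class
             if job[0] < cpu_seconds_cutoff and job[1] < lines_cutoff]
    return {
        "small_share_of_all": len(small) / len(jobs),
        "small_share_of_class": len(small) / len(in_class) if in_class else 0.0,
    }
```

With the percentages in the example (30% of all jobs are
small and 60% of all jobs are in class A), half of class A
would consist of jobs that need less than one twenty-fourth
of the two-minute CPU limit.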

Another problem that can result from arbitrarily established
job classes is the specification of a job class limit that
bisects a natural structure in the data.  For example, if 15%
of the jobs normally require between 110 and 130 CPU seconds,
imposing a two-minute limit could force users to submit all
of these jobs in class B to ensure that they are not canceled
at the two-minute limit.  This problem can also be identified
by the representative hypothesis test, since the jobs that
need approximately two CPU minutes would be poorly
represented by the class B limits.
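
A simple screen for this condition is to count the jobs whose
CPU time lies in a narrow band around a proposed limit.  The
sketch below is an illustration only; the function name and
the 10-second band are assumptions chosen to match the 110 to
130 second range in the example.

```python
def share_near_limit(cpu_seconds, limit_seconds=120.0, band_seconds=10.0):
    """Return the fraction of jobs within band_seconds of the limit.

    cpu_seconds is a list of per-job CPU times in seconds.  Both
    ends of the band are included.
    """
    if not cpu_seconds:
        raise ValueError("need at least one job")
    low = limit_seconds - band_seconds
    high = limit_seconds + band_seconds
    near = sum(1 for value in cpu_seconds if low <= value <= high)
    return near / len(cpu_seconds)
```

A large share near the limit, such as the 15% in the example,
suggests that the limit cuts through a natural grouping of
jobs and should be moved.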

The characteristics of an installation's job mix are
continually evolving.  As a result, it is probably
unreasonable to apply the stationary hypothesis test to a job
class structure.  However, you should examine the job class
structure on a quarterly or semiannual basis.

Both the equivalent job and ad hoc batch approaches to
workload characterization that we have discussed show the
problems that result from using mean values or arbitrarily
selected limits.  To meet the requirements of the first
hypothesis, we must exploit the natural structure of the
workload.  Statistical pattern recognition techniques
(clustering) offer a powerful tool for determining whether a
natural structure exists in an installation's workload data.
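
As a minimal illustration of clustering, the sketch below
implements plain k-means (Lloyd's algorithm) in Python.  The
text does not name a specific clustering method, so k-means
is an assumption chosen for brevity, as are the function name
and the deterministic choice of starting centers.  Each point
is a tuple of job measurements, for example (CPU seconds,
print lines); in practice the measurements would usually be
scaled first so that no single one dominates the distance.

```python
import math


def kmeans(points, k, iterations=50):
    """Group points (equal-length tuples) into k clusters.

    Returns (centers, labels), where labels[i] is the index of the
    cluster assigned to points[i].  Starting centers are taken at
    evenly spaced positions in the sorted data so that the result
    is repeatable.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    ordered = sorted(points)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)]
               for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                # An empty cluster keeps its previous center.
                new_centers.append(centers[c])
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers, labels
```

If the resulting clusters are compact and well separated, the
cluster boundaries are candidates for job class limits.  If
they are not, the data may have no natural structure for the
chosen measurements.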