2.2.4.2.2 MVS Working Set Determination

MVS/SE2 introduced a facility, storage isolation (fencing),
that guarantees a performance group a fixed amount of real
memory.  To use this facility effectively, it is desirable to
be able to characterize the working set of a program that is
to be fenced.

MVS maintains a measure of a program's working set, but to
make full use of fencing you also need to estimate the paging
to expect with various amounts of real storage available to
the program.

With this information, you can use fencing to minimize the
page faulting of a program, or you can allocate the amount of
real storage expected to yield a desired paging rate.  The
Real Storage Analysis portion of the CA MICS Performance
Manager Option provides reports and plots that support the
working set characterization method described in this
section.

You could characterize the working set requirements of a
program by executing it repeatedly, each time limiting it to
a different amount of real storage in fixed increments.
Starting at 50 pages and adding 50 pages with each execution,
you should expect (in theory) to see a curve something like
the one shown in Figure 2-29.

Note that this procedure gives an estimate of the theoretical
working set of the program, which is what is desired when
applying storage isolation.  When real storage is plentiful,
storage-isolated programs are allowed to accumulate more
pages, just as other address spaces are.  In times of
increased storage demand, however, there is no point in
letting storage-isolated tasks retain more storage than they
require, because in most cases they will reach the same
paging rates after some period of time.

             |
             | *
  Average    |
  Page    30 +
  Faults     |  *
  per        |
  CPU        |   *
  Second  25 +
             |    *
             |
             |     *
          20 +
             |      *
             |
             |       *
          15 +        *
             |
             |         *
             |
          10 +           *
             |            *
             |              *
             |                *
           5 +                  *
             |                       *
             |                           *
             |                                *
             ----+----+----+----+----+----+----.
                50   100  150  200  250  300

                      Working Set (Frames)


    Figure 2-29.  A Working Set Characterization


Arthur Petrella and Harold Farrey proposed a functional
definition of the working set in MVS that has worked well in
practice.  The working set determination method used here is
based on their ideas.

The working set is, at any point in time, the number of
distinct pages referenced during the last interval (t).
Because the real memory available to the program varies over
the course of execution, as well as from run to run, a
function describing the working set must represent the
dynamics of the available real storage as well as the
characteristics of the program.  Thus, a probabilistic
representation is required.  Petrella and Farrey proposed a
binomial probability function.
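The working set definition above can be made concrete with a
short sketch.  It assumes a page reference string (page
numbers in the order they are referenced) and measures the
interval t in references rather than in seconds.  The
reference string is made up for illustration, and this is the
textbook definition, not a description of how MVS itself
tracks working sets.

```python
def working_set_sizes(references, window):
    """Working set size after each reference: the number of distinct
    pages referenced during the last `window` references (the interval t)."""
    sizes = []
    for i in range(len(references)):
        start = max(0, i - window + 1)
        sizes.append(len(set(references[start:i + 1])))
    return sizes


# Hypothetical reference string and an interval of three references
refs = [1, 2, 1, 3, 4, 1, 1, 1]
print(working_set_sizes(refs, window=3))
```

The size rises while the program touches new pages and falls
when it settles on a few, so a single number describes it only
at one moment.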

A set of events may be represented by a binomial distribution
if three conditions are met:

  o  Each trial has only two possible outcomes (success or
     failure).

  o  The probability of success (p) is constant from trial to
     trial.

  o  There are n independent trials; that is, the outcome of
     one trial does not depend on the outcome of any of the
     others.

If you define a trial as the occurrence or non-occurrence of
a page fault, then the first and third conditions are met.
If, in addition, a measurement interval of length t is chosen
such that there can be at most one page fault in each
interval, then the second condition is also met.
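A minimal simulation of this model, assuming hypothetical
values for the number of intervals and for p: each interval
of length t is one Bernoulli trial that holds a single page
fault with probability p.  Counting faults over many
observation periods should give an average close to the
binomial mean n*p.

```python
import random


def simulate_fault_counts(n_intervals, p, runs, seed=1):
    """Simulate `runs` observation periods of n_intervals intervals each.

    Each interval of length t is one Bernoulli trial: it holds a single
    page fault with probability p, independently of every other interval.
    Returns the number of faults counted in each observation period.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(runs):
        counts.append(sum(1 for _ in range(n_intervals) if rng.random() < p))
    return counts


# Hypothetical parameters: 1,000 intervals per period, p = 0.02 per interval
counts = simulate_fault_counts(n_intervals=1000, p=0.02, runs=500)
mean = sum(counts) / len(counts)
print(f"simulated mean faults: {mean:.2f}; binomial mean n*p = {1000 * 0.02:.2f}")
```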


Now, given that the probability of success (a page fault) is
small, consider the probability of N page faults over a time
interval T, with T much larger than t.  The binomial
distribution can then be approximated by the Poisson
distribution shown below, where n = T/t is the number of
trials and k = np is the expected number of page faults in T.
For a more complete discussion of the Poisson process, refer
to page 117 of Probability, Statistics, and Queueing Theory by
Arnold O. Allen.

                $$P_N(T) = e^{-k}\,\frac{k^N}{N!}$$
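To show how close the approximation is, this sketch evaluates
the exact binomial probability and the Poisson formula above
side by side for hypothetical values of T/t and p.  The
numbers are illustrative only, not measurements.

```python
from math import comb, exp, factorial


def binomial_pmf(n_faults, n_intervals, p):
    """Exact probability of n_faults faults in n_intervals Bernoulli trials."""
    return comb(n_intervals, n_faults) * p**n_faults * (1 - p) ** (n_intervals - n_faults)


def poisson_pmf(n_faults, k):
    """P_N(T) = e^(-k) k^N / N!, with k the expected number of faults in T."""
    return exp(-k) * k**n_faults / factorial(n_faults)


# Hypothetical values: T/t = 10,000 intervals and p = 0.0005 per interval
n_intervals = 10_000
p = 0.0005
k = n_intervals * p  # expected faults in T

for n_faults in range(11):
    b = binomial_pmf(n_faults, n_intervals, p)
    q = poisson_pmf(n_faults, k)
    print(f"N={n_faults:2d}  binomial={b:.5f}  poisson={q:.5f}")
```

Le Cam's inequality bounds the total absolute difference
between the two distributions by 2np^2, so the approximation
improves as p gets smaller for a fixed expected count k.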


So we expect the curve in Figure 2-29 to follow an
exponential relation.  With this relation, we can estimate the
results of the fencing experiment from the standard SMF data
collected during regular production runs of the program.  Over
any single run of the fencing experiment, what we are really
measuring is the expected value of this function.  CA MICS
provides three measures that can be used to plot the relation
in Figure 2-29:

  o  PGMPGSEC - in the batch program file, the product of the
     number of pages used by the program and CPU time.

  o  PGMPGIN - the number of page-ins during the program's
     execution.

  o  PGMPRCLM - the number of page reclaims during the
     program's execution.

From these variables, CA MICS computes average working set
size in pages (PGMAVWSS), and the Real Storage Analysis
component calculates the paging rate as PGMPGIN + PGMPRCLM.
Thus, by retrieving and analyzing several executions of a
program, the relationship suggested by Figure 2-29 can be
reproduced.
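A minimal sketch of the per-run arithmetic, under stated
assumptions: the average working set is taken as PGMPGSEC
divided by CPU seconds (which follows if PGMPGSEC is pages
multiplied by CPU seconds), and the page fault rate plotted in
Figure 2-29 is taken as PGMPGIN + PGMPRCLM divided by CPU
seconds.  The ProgramRun class and its cpu_seconds field are
hypothetical, and the actual CA MICS formula for PGMAVWSS may
differ.

```python
from dataclasses import dataclass


@dataclass
class ProgramRun:
    """One program execution. pgmpgsec, pgmpgin, and pgmprclm mirror the
    CA MICS variables; cpu_seconds is a hypothetical stand-in for CPU time."""
    pgmpgsec: float     # pages used x CPU time, assumed in page-seconds
    pgmpgin: int        # page-ins
    pgmprclm: int       # page reclaims
    cpu_seconds: float  # hypothetical field: CPU time in seconds

    def avg_working_set(self) -> float:
        # Assumption: page-seconds / CPU seconds = average pages held
        return self.pgmpgsec / self.cpu_seconds

    def faults_per_cpu_second(self) -> float:
        # Assumption: faults = page-ins + reclaims, normalized by CPU time
        return (self.pgmpgin + self.pgmprclm) / self.cpu_seconds


run = ProgramRun(pgmpgsec=12_000.0, pgmpgin=300, pgmprclm=60, cpu_seconds=60.0)
print(run.avg_working_set(), run.faults_per_cpu_second())
```

Each execution contributes one point (working set, fault rate)
to the plot.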

Because we are dealing with a relatively small number of
samples, it is important to be able to test the hypothesis
"Does the data have an exponential relation?" and also to
project an optimal amount of memory to assign to the program.
An exponential function may be transformed into a linear one
by taking the natural logarithm:

                $$\ln e^{mx+b} = mx + b$$

We add one to the page fault rate per CPU second to allow for
the case where there is no paging, since the logarithm of zero
is undefined.  The relation shown below is then linear.

    ln(1+page faults/CPU second) vs. working set size
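A short sketch of this transformation.  It assumes a
hypothetical exponential model, 1 + rate = e^(mx+b), with
made-up values of m and b, and shows that ln(1 + rate) recovers
the straight line mx + b.  Python's math.log1p computes
ln(1 + r) directly, so a run with no paging maps to 0.

```python
import math


def linearize(faults_per_cpu_second):
    """Return ln(1 + rate); a run with zero paging maps to 0, not -infinity."""
    return math.log1p(faults_per_cpu_second)


# Hypothetical exponential model: 1 + rate = e^(m*x + b)
m, b = -0.01, 3.5
for x in range(50, 301, 50):            # working set sizes in frames
    rate = math.exp(m * x + b) - 1.0
    # The last column matches m*x + b, a straight line in x
    print(x, round(rate, 2), round(linearize(rate), 4))
```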

If these values are plotted for a set of executions of a
program, the linear relation shown above can be estimated by
regression techniques.  The X intercept of this plot, where
the fitted page fault rate falls to zero, approximates the
virtual memory used by the program.  An even more interesting
result is that the slope of the line indicates whether a
program that has already been fenced has been given more real
storage, through its storage isolation definitions, than its
working set requires.
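A minimal sketch of the regression step, using ordinary least
squares written out by hand rather than any CA MICS or SAS
routine.  The run data are invented to resemble Figure 2-29;
each pair is an average working set in frames and a page fault
rate per CPU second.  The X intercept is computed as
-intercept/slope, the working set size at which the fitted
ln(1 + rate) reaches zero.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical runs: (average working set in frames, page faults per CPU second)
runs = [(50, 31.0), (100, 19.0), (150, 11.0), (200, 6.5), (250, 3.5), (300, 1.8)]
xs = [ws for ws, _ in runs]
ys = [math.log1p(rate) for _, rate in runs]  # ln(1 + rate)

slope, intercept = fit_line(xs, ys)
x_intercept = -intercept / slope  # frames at which the fitted rate reaches zero
print(f"slope={slope:.5f}  intercept={intercept:.3f}  x_intercept={x_intercept:.0f} frames")
```

With only a handful of executions, the fitted X intercept is
a rough estimate and should be read together with the quality
checks that follow.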

The final step in the process is to assess the quality of the
linear model.  Does the data indeed have a linear relation?
How well does the regression model describe the variation in
the data?  The SAS procedure REG (PROC REG) can be used to
answer these questions.
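PROC REG is the tool the text names.  As a rough
illustration only, this Python sketch computes two of the
statistics PROC REG reports for a one-variable model: R-square
(the share of the variation in the data explained by the line)
and the t value for the slope.  The data are hypothetical, and
at least three runs are needed for the slope's standard error.

```python
import math


def regression_quality(xs, ys):
    """Return (r_squared, t_slope) for a least-squares line through (xs, ys).
    Requires at least three points and ys that are not all equal."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - sse / sst
    # Standard error of the slope uses n - 2 residual degrees of freedom
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    t_slope = slope / se_slope if se_slope > 0 else math.copysign(math.inf, slope)
    return r_squared, t_slope


# Hypothetical runs: working set (frames) and ln(1 + page faults per CPU second)
xs = [50, 100, 150, 200, 250, 300]
ys = [math.log1p(r) for r in (31.0, 19.0, 11.0, 6.5, 3.5, 1.8)]
r2, t = regression_quality(xs, ys)
print(f"R-square={r2:.4f}  t(slope)={t:.1f}")
```

An R-square near 1 and a large absolute t value for the slope
support the linear model; PROC REG also reports an analysis of
variance F test and residual diagnostics that this sketch
omits.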