

5.5.1 Summarization Concepts


Different types of data elements are treated differently
during the summarization process.

Retained elements are uniquely defined to the file in which
they are maintained, and contain the last value processed.
Here is an example:

JOBEPRTY - JES Execution Priority

Accumulated elements are uniquely defined to the file in
which they are maintained, and contain the sum of all values
processed for a given file's level of data summarization.
Here is an example:

JOBCPUTM - Job CPU Time

Minimum and maximum elements are uniquely defined to the file
in which they are maintained, and contain the minimum or
maximum value processed for a given file's level of data
summarization.  Here is an example of each:

PGMMXWSS - Program Maximum Working Set Size
DSDMNRPG - Min Free Pages in Free Chain

Derived elements are uniquely defined to the file in which
they are maintained, and contain the results of special
computations (for example, paging rate per second) that are
computed for a given file's level of data summarization.
Here is an example:

PAGPSDPG - Demand Paging Per Second

Common data elements are elements that maintain a standard
definition, even though they may appear in more than one file
in the database.  They may be any of the above data element
types and may be used for data sequencing, summarization, or
selection.  Here is an example:

SYSID - System Identification

How some of these elements are summarized depends on the
definition of the variable itself.  These definitions are not
known to SAS.  Consequently, using SAS summarization PROCs on
CA MICS files can produce erroneous results.

A CA MICS summarization facility takes the definitions into
account to provide proper summarization.  Proper
summarization of CA MICS files, as well as the files you
create from CA MICS files, depends on the CA MICS
summarization facility.
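To see why a summary that ignores variable definitions goes wrong, consider two observations from the sample file used later in this section (observations 7 and 8 in Figure 5-4). The Python sketch below is only an illustration, not SAS code and not the CA MICS facility; the function names `naive_sum` and `definition_aware` are hypothetical. It contrasts summing every variable with applying a rule per variable type.

```python
# Observations 7 and 8 of the sample DETAIL file (Figure 5-4).
# A1 is accumulated, M1 is a maximum, and C1 is derived as A1 / M1.
obs = [
    {"A1": 1000, "M1": 89, "C1": 1000 / 89},
    {"A1": 2000, "M1": 69, "C1": 2000 / 69},
]

def naive_sum(rows):
    """Sum every variable, ignoring what the variable means."""
    return {var: sum(r[var] for r in rows) for var in rows[0]}

def definition_aware(rows):
    """Apply a rule per variable type, then recompute the derived value."""
    a1 = sum(r["A1"] for r in rows)   # accumulated: sum
    m1 = max(r["M1"] for r in rows)   # maximum: largest value
    return {"A1": a1, "M1": m1, "C1": a1 / m1}  # derived: recomputed

naive = naive_sum(obs)
proper = definition_aware(obs)
print(naive)   # M1 becomes 158, which is not a maximum of anything
print(proper)  # M1 stays 89 and C1 is about 33.7
```

The naive C1 (about 40.2) is the sum of two ratios, whereas the recomputed C1 (about 33.7) matches observation 6 in Figure 5-6.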

A separate summarization facility is provided for each CA
MICS file.  When creating your own files, you must do any
summarization on the CA MICS files before you alter
observations by merging, by creating new variables in an
observation, or by eliminating observations through selection
(with subsetting IF statements).

How the Summarization Facility is Used
--------------------------------------

The following scenario illustrates the use of the CA MICS
summarization facility:

Suppose we were interested in the total CP processor dispatch
time for a complete sysplex per month.  Checking the file's
organization in the Files chapter in the Hardware and SCP
Analyzer Guide, we see that the CPU Activity File (HARCPU) is
sequenced by SYSID, YEAR, MONTH, and ZONE (in the MONTHS
timespan).  Assume that this central computing complex or
sysplex has 2 LPARs (2 SYSIDs) and we want to determine the
total CPU consumption on both systems combined.

We want to create a file that is in the sequence YEAR and
MONTH with only a single observation for the whole sysplex
for each YEAR and MONTH combination.  This means that we want
to summarize MONTHS.HARCPU01 on YEAR and MONTH.  To do this,
we must first sort the MONTHS.HARCPU01 file in the sequence
YEAR and MONTH.
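Conceptually, sorting on YEAR and MONTH brings together all observations, from every SYSID and ZONE, that belong to the same month so that they can be combined into one observation. The Python sketch below illustrates only this sort-then-group step for a single accumulated variable. `CPU_SECONDS` is a hypothetical variable name, not an actual HARCPU data element, and the values are made up; the CA MICS summarization facility additionally applies the rules for every other variable type.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical MONTHS-level observations in SYSID, YEAR, MONTH, ZONE order.
# CPU_SECONDS is a made-up accumulated variable with made-up values.
rows = [
    {"SYSID": "SYSA", "YEAR": 99, "MONTH": 1, "ZONE": "1", "CPU_SECONDS": 500.0},
    {"SYSID": "SYSA", "YEAR": 99, "MONTH": 2, "ZONE": "1", "CPU_SECONDS": 250.0},
    {"SYSID": "SYSB", "YEAR": 99, "MONTH": 1, "ZONE": "1", "CPU_SECONDS": 300.0},
    {"SYSID": "SYSB", "YEAR": 99, "MONTH": 2, "ZONE": "1", "CPU_SECONDS": 150.0},
]

key = itemgetter("YEAR", "MONTH")

# groupby only combines adjacent rows, so in the original order the
# months of the two systems are never brought together.
unsorted_groups = len(list(groupby(rows, key=key)))   # 4 groups, not 2

# Sort on the new sequence first, then combine each YEAR/MONTH group.
rows_sorted = sorted(rows, key=key)
sysplex_totals = [
    {"YEAR": year, "MONTH": month,
     "CPU_SECONDS": sum(r["CPU_SECONDS"] for r in grp)}
    for (year, month), grp in groupby(rows_sorted, key=key)
]
print(unsorted_groups)
print(sysplex_totals)
```

Without the sort, the four observations stay in four separate groups; after it, each month yields a single sysplex-wide observation.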

We then use the CA MICS summarization facility to summarize
the HARCPU file and recompute all unique variables.

To show how the summarization rules work, the rest of this
section uses a simplified sample file that contains the
following variables:

Variable Name   Content                 Variable Type
-------------   -------                 -------------
SYSID           Alphanumeric 4 chars*   Common ID
YEAR            Numeric 78 - 99         Common DATE
MONTH           Numeric 01 - 12*        Common DATE
DAY             Numeric 01 - 31*        Common DATE
ZONE            Alphanumeric 1 char     Common TIME
HOUR            Numeric 00 - 23*        Common TIME
ENDTS           SAS Time-Stamp          Common TIME
R1              Numeric                 Retained
R2              Character               Retained
A1              Numeric                 Accumulated
M1              Max value               MAX/MIN/AVG
M2              Min value               MAX/MIN/AVG
M3              Average                 MAX/MIN/AVG
P1              Value < 1               Percentage
C1              Numeric                 Computed
                (Computed from A1/M1)

* In this example, only a single character or digit is used.

The following questions are addressed in the context of the
sample file:

o What happens to the values of variables in the sequence
  list?

o What happens to values of common date and time variables
  not in the sort sequence?

o Under what circumstances do variables lose their meaning
  due to the summarization process?

The summary sequence for each timespan of this file is shown
below:

+---------+-------------------------------------------------+
|Timespan | Level of Data Granularity                       |
+---------+-------------------------------------------------+
|         |                                                 |
| DETAIL  |SYSID     ACCTNO1   ACCTNO2   ACCTNO3   JOBGROUP |
|         |JOB       YEAR      MONTH     DAY       ENDTS    |
|         |                                                 |
| DAYS    |SYSID     ACCTNO1   ACCTNO2   ACCTNO3   JOBGROUP |
|         |JOB       YEAR      MONTH     DAY       HOUR     |
|         |                                                 |
| WEEKS   |N/A                                              |
|         |                                                 |
| MONTHS  |SYSID     ACCTNO1   ACCTNO2   ACCTNO3   JOBGROUP |
|         |YEAR      MONTH     ZONE                         |
|         |                                                 |
| YEARS   |SYSID     ACCTNO1   ACCTNO2   ACCTNO3   JOBGROUP |
|         |YEAR      ZONE                                   |
|         |                                                 |
+---------+-------------------------------------------------+


Because this sample file is in the DETAIL timespan, its data
is not summarized, and all variables are meaningful.  The
observations in this file (for the DETAIL timespan) are as
follows:

+--+---+--+--+--+--+--+--+---+--+-----+----+---+---+---+----+
|  | S |  |M |  |  | E|  |   |  |     |    |   |   |   |    |
| O| Y |Y |O | D| H| N| Z|   |  |     |    |   |   |   |    |
| B| S |E |N | A| O| D| O|  R| R|  A  |  M | M | M | P | C  |
| S| I |A |T | Y| U| T| N|  1| 2|  1  |  1 | 2 | 3 | 1 | 1  |
| #| D |R |H |  | R| S| E|   |  |     |    |   |   |   |    |
|--+---+--+--+--+--+--+--+---+--+-----+----+---+---+---+----|
| 1| A |79|1 | 1| 1| 1| 1| 12| A| 1000| 100| 12| 16| .6|10.0|
| 2| A |79|1 | 1| 1| 2| 1| 12| B| 2000| 100| 20| 37| .7|20.0|
| 3| A |79|1 | 1| 6| 5| 2| 13| C|   40|  80| 10| 70| .6| 0.5|
| 4| A |79|2 | 6| 9| 4| 2| 14| D|   60|  60| 10| 40| .7| 1.0|
| 5| A |79|2 | 7| 3| 1| 1| 15| E|   60|  40| 10| 20| .6| 1.5|
| 6| B |79|1 | 3| 2| 6| 1| 16| F|   90|  30| 10| 20| .7| 3.0|
| 7| B |79|2 | 4| 5| 2| 2| 17| G| 1000|  89| 12| 43| .6|11.2|
| 8| B |79|2 | 4| 5|41| 2| 18| H| 2000|  69| 12| 43| .7|29.0|
| 9| B |79|2 | 4| 6|57| 2| 19| I|  100|  99| 12| 43| .6| 1.0|
|10| B |79|2 | 4| 6|58| 2| 10| J|  100|  40| 13| 21| .7| 2.5|
|11| B |79|2 | 4| 6|59| 2| 11| J|  200|  50| 15| 25| .7| 4.0|
+--+---+--+--+--+--+--+--+---+--+-----+----+---+---+---+----+

Figure 5-4.  DETAIL File Content

Summarizing in the DAYS Timespan
--------------------------------

Now consider how the observation variables change when going
from the DETAIL to the DAYS timespan.  The DAYS timespan is
summarized to the HOUR level and uses the same sequence as the
DETAIL timespan, except that HOUR replaces ENDTS as the
lowest-level sequence variable.

The summarization rules are as follows:

o  Variables above the lowest level of summarization are
   unchanged.  For example, SYSID, YEAR, MONTH, and DAY are
   unchanged in the summarized observations when summarizing
   on the HOUR within them.

o  Retained Variables - The value for the last observation in
   the summary is kept.

o  Common Variables - The value for the last observation in
   the summary is kept.

o  Common Date and Time Variables - If the variable being
   summarized is coarser (higher in granularity) than the
   variable being summarized on, as with ZONE versus HOUR or
   MONTH versus DAY, its value remains the same.  If it is
   finer, its value is set to missing.  The exceptions are
   ENDTS and STARTTS: after summarization they hold the
   highest and lowest timestamp, respectively, of the
   observations summarized.  For example, when observations
   are summarized to ZONE within MONTH, as in the MONTHS
   timespan, the variables DAY and HOUR are no longer
   meaningful and are set to missing (.).

o  Accumulated - The sum of the values in the observations.

o  MIN/MAX Variables - The min/max values from the
   observations being combined.

o  AVG - The sum of the values in the observations divided by
   the number of observations being combined.

o  Percent and/or Computed/Calculated - Recomputed from the
   variables in the observation resulting from the
   summarization operation.
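The rules above can be sketched in Python and applied to the DETAIL observations in Figure 5-4 to produce the DAYS observations. This is an illustration under stated assumptions, not the CA MICS summarization facility: the function `summarize`, the rule names, and the dictionary representation are hypothetical. P1 is omitted because the text does not give its formula, and C1 is recomputed as A1/M1, as stated in the variable table. The input is assumed to be sorted on the DAYS sequence variables already.

```python
from itertools import groupby
from operator import itemgetter

# How each rule combines the values of one variable across a group.
COMBINE = {
    "last": lambda vals: vals[-1],              # retained and common variables
    "sum": sum,                                 # accumulated variables
    "max": max,                                 # maximum variables, ENDTS
    "min": min,                                 # minimum variables, STARTTS
    "avg": lambda vals: sum(vals) / len(vals),  # average variables
    "missing": lambda vals: None,               # date/time finer than the key
}

def summarize(rows, key_vars, rules, recompute):
    """Combine adjacent rows that share the same key_vars values.

    rows must already be sorted on key_vars. rules maps each non-key
    variable to a COMBINE rule; recompute adds derived variables to
    each combined observation.
    """
    out = []
    for _, grp in groupby(rows, key=itemgetter(*key_vars)):
        grp = list(grp)
        obs = {k: grp[0][k] for k in key_vars}  # sequence values unchanged
        for var, rule in rules.items():
            obs[var] = COMBINE[rule]([r[var] for r in grp])
        recompute(obs)
        out.append(obs)
    return out

def add_c1(obs):
    """Derived variable C1 is recomputed from the combined A1 and M1."""
    obs["C1"] = obs["A1"] / obs["M1"]

COLS = ("SYSID", "YEAR", "MONTH", "DAY", "HOUR", "ENDTS", "ZONE",
        "R1", "R2", "A1", "M1", "M2", "M3")
# DETAIL observations 1-11 from Figure 5-4 (P1 and C1 omitted).
DETAIL = [dict(zip(COLS, v)) for v in [
    ("A", 79, 1, 1, 1, 1, "1", 12, "A", 1000, 100, 12, 16),
    ("A", 79, 1, 1, 1, 2, "1", 12, "B", 2000, 100, 20, 37),
    ("A", 79, 1, 1, 6, 5, "2", 13, "C", 40, 80, 10, 70),
    ("A", 79, 2, 6, 9, 4, "2", 14, "D", 60, 60, 10, 40),
    ("A", 79, 2, 7, 3, 1, "1", 15, "E", 60, 40, 10, 20),
    ("B", 79, 1, 3, 2, 6, "1", 16, "F", 90, 30, 10, 20),
    ("B", 79, 2, 4, 5, 2, "2", 17, "G", 1000, 89, 12, 43),
    ("B", 79, 2, 4, 5, 41, "2", 18, "H", 2000, 69, 12, 43),
    ("B", 79, 2, 4, 6, 57, "2", 19, "I", 100, 99, 12, 43),
    ("B", 79, 2, 4, 6, 58, "2", 10, "J", 100, 40, 13, 21),
    ("B", 79, 2, 4, 6, 59, "2", 11, "J", 200, 50, 15, 25),
]]

DAYS_KEY = ("SYSID", "YEAR", "MONTH", "DAY", "HOUR")
DAYS_RULES = {"ENDTS": "max", "ZONE": "last", "R1": "last", "R2": "last",
              "A1": "sum", "M1": "max", "M2": "min", "M3": "avg"}

days = summarize(DETAIL, DAYS_KEY, DAYS_RULES, add_c1)
for obs in days:
    print(obs)
```

Because the sketch does not round, M3 values are unrounded averages (26.5 for summarized observation 1, about 29.7 for summarized observation 7). For the MONTHS timespan, the same function could be called with SYSID, YEAR, MONTH, and ZONE as the key and with DAY and HOUR mapped to the "missing" rule.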

Figure 5-5 shows which observations from Figure 5-4 are
combined to make the observations in Figure 5-6.  In Figure
5-5, the Summarized Observation Numbers are observation
numbers in Figure 5-6, and the Input Observation Numbers are
observation numbers in Figure 5-4.


        +-------------+--------------------------+
        |  Summarized |                          |
        | Observation |                          |
        |   Number    | Input Observation Numbers|
        |-------------+--------------------------|
        |      1      |     1, 2                 |
        |      2      |     3                    |
        |      3      |     4                    |
        |      4      |     5                    |
        |      5      |     6                    |
        |      6      |     7, 8                 |
        |      7      |     9, 10, 11            |
        +-------------+--------------------------+

Figure 5-5.  Observations Summarized to DAYS Timespan
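The grouping in Figure 5-5 follows from the DAYS sequence variables alone. The short Python sketch below assumes, as in Figure 5-4, that the DETAIL observations are already sorted on SYSID, YEAR, MONTH, DAY, and HOUR; it numbers the inputs from 1 and groups adjacent observations whose keys are equal. The variable names are hypothetical.

```python
from itertools import groupby

# (SYSID, YEAR, MONTH, DAY, HOUR) for DETAIL observations 1-11 in Figure 5-4.
detail_keys = [
    ("A", 79, 1, 1, 1), ("A", 79, 1, 1, 1), ("A", 79, 1, 1, 6),
    ("A", 79, 2, 6, 9), ("A", 79, 2, 7, 3), ("B", 79, 1, 3, 2),
    ("B", 79, 2, 4, 5), ("B", 79, 2, 4, 5), ("B", 79, 2, 4, 6),
    ("B", 79, 2, 4, 6), ("B", 79, 2, 4, 6),
]

# Pair each input observation number with its key, group adjacent pairs
# with equal keys, and number the resulting summarized observations.
mapping = {}
groups = groupby(enumerate(detail_keys, start=1), key=lambda pair: pair[1])
for out_no, (_, members) in enumerate(groups, start=1):
    mapping[out_no] = [in_no for in_no, _ in members]

for out_no, inputs in mapping.items():
    print(out_no, inputs)
```

Each printed line corresponds to one row of Figure 5-5.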


+--+---+--+--+--+--+--+--+---+--+-----+----+---+---+---+----+
|  | S |  | M|  |  | E|  |   |  |     |    |   |   |   |    |
| O| Y |Y | O| D| H| N| Z|   |  |     |    |   |   |   |    |
| B| S |E | N| A| O| D| O|  R| R|  A  |  M | M | M | P | C  |
| S| I |A | T| Y| U| T| N|  1| 2|  1  |  1 | 2 | 3 | 1 | 1  |
| #| D |R | H|  | R| S| E|   |  |     |    |   |   |   |    |
|--+---+--+--+--+--+--+--+---+--+-----+----+---+---+---+----|
| 1| A |79| 1| 1| 1| 2| 1| 12| B| 3000| 100| 12| 27| .7|30.0|
| 2| A |79| 1| 1| 6| 5| 2| 13| C|   40|  80| 10| 70| .6| 0.5|
| 3| A |79| 2| 6| 9| 4| 2| 14| D|   60|  60| 10| 40| .7| 1.0|
| 4| A |79| 2| 7| 3| 1| 1| 15| E|   60|  40| 10| 20| .6| 1.5|
| 5| B |79| 1| 3| 2| 6| 1| 16| F|   90|  30| 10| 20| .7| 3.0|
| 6| B |79| 2| 4| 5|41| 2| 18| H| 3000|  89| 12| 43| .7|33.7|
| 7| B |79| 2| 4| 6|59| 2| 11| J|  400|  99| 12| 33| .7| 4.0|
+--+---+--+--+--+--+--+--+---+--+-----+----+---+---+---+----+

Figure 5-6.  DAYS File Content

Note that the DETAIL timespan has 11 observations but the DAYS
timespan has only 7.  Summarization has produced fewer
observations but the same number of variables:

o Observations 3, 4, 5, and 6 of the DETAIL timespan appear
  unchanged in the DAYS timespan (as observations 2, 3, 4,
  and 5).  They were not combined with any other observation
  because no other observation shared their values of the
  DAYS sequence variables.

o Observations 1 and 2 shared the same values of the DAYS
  sequence variables, so they were combined into a single new
  observation.  Likewise:

  - Observations 7 and 8 were combined.
  - Observations 9, 10, and 11 were combined.

o The values of the other variables in each combined
  observation were determined by the summarization rules.
  For example, when DETAIL observations 1 and 2 are combined,
  A1 is summed to 1000 + 2000 = 3000, M1 is the maximum of
  100 and 100, and C1 is recomputed as 3000 / 100 = 30.0.


Summarizing in the MONTHS Timespan
----------------------------------

When summarizing in the MONTHS timespan, the sequence
variables are different: DAY and HOUR have vanished and ZONE
has appeared.  It is also important to know which timespan,
DETAIL or DAYS, is used as input to produce the MONTHS
observations.  Here it is DAYS; in most cases, summarization
uses the previous timespan as its input.

Figure 5-7 shows which observations from Figure 5-6 are
combined to create the observations in Figure 5-8, and Figure
5-8 shows the resulting observations.  Note the following:

o The DAY and HOUR variables are no longer meaningful, because
  they are lower in granularity than ZONE within MONTH.  They
  are set to missing (.).

o The ENDTS variable takes on a new meaning: it now represents
  the highest timestamp found in the observations summarized.

        +-------------+--------------------------+
        |  Summarized |                          |
        | Observation |                          |
        |   Number    | Input Observation Numbers|
        |-------------+--------------------------|
        |      1      |       1                  |
        |      2      |       2                  |
        |      3      |       3                  |
        |      4      |       4                  |
        |      5      |       5                  |
        |      6      |       6, 7               |
        +-------------+--------------------------+

Figure 5-7.  Observations Summarized to MONTHS Timespan


+--+---+--+--+--+--+--+--+---+--+-----+----+---+---+---+----+
|  | S |  |M |  |  | E|  |   |  |     |    |   |   |   |    |
| O| Y |Y |O | D| H| N| Z|   |  |     |    |   |   |   |    |
| B| S |E |N | A| O| D| O|  R| R|  A  |  M | M | M | P | C  |
| S| I |A |T | Y| U| T| N|  1| 2|  1  |  1 | 2 | 3 | 1 | 1  |
| #| D |R |H |  | R| S| E|   |  |     |    |   |   |   |    |
|--+---+--+--+--+--+--+--+---+--+-----+----+---+---+---+----|
| 1| A |79| 1| .| .| 2| 1| 12| B| 3000| 100| 12| 27| .7|30.0|
| 2| A |79| 1| .| .| 5| 2| 13| C|   40|  80| 10| 70| .6| 0.5|
| 3| A |79| 2| .| .| 4| 2| 14| D|   60|  60| 10| 40| .7| 1.0|
| 4| A |79| 2| .| .| 1| 1| 15| E|   60|  40| 10| 20| .6| 1.5|
| 5| B |79| 1| .| .| 6| 1| 16| F|   90|  30| 10| 20| .7| 3.0|
| 6| B |79| 2| .| .|59| 2| 11| J| 3400|  99| 12| 38| .7|34.3|
+--+---+--+--+--+--+--+--+---+--+-----+----+---+---+---+----+

Figure 5-8.  MONTHS File Content
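A minimal Python sketch of this MONTHS step, starting from the DAYS observations in Figure 5-6, is shown below. It is an illustration under stated assumptions, not the CA MICS facility: the dictionary layout and the function name `to_months` are hypothetical, P1 is left out because its formula is not given, and the input is assumed to be grouped on SYSID, YEAR, MONTH, and ZONE already, as Figure 5-6 is (observations that share these values are adjacent). M3 is averaged over the DAYS values, as in Figure 5-8, so summarized observation 6 gets (43 + 33) / 2 = 38.

```python
from itertools import groupby
from operator import itemgetter

COLS = ("SYSID", "YEAR", "MONTH", "DAY", "HOUR", "ENDTS", "ZONE",
        "R1", "R2", "A1", "M1", "M2", "M3")
# DAYS observations 1-7 from Figure 5-6 (P1 and C1 omitted; C1 is recomputed).
DAYS = [dict(zip(COLS, v)) for v in [
    ("A", 79, 1, 1, 1, 2, "1", 12, "B", 3000, 100, 12, 27),
    ("A", 79, 1, 1, 6, 5, "2", 13, "C", 40, 80, 10, 70),
    ("A", 79, 2, 6, 9, 4, "2", 14, "D", 60, 60, 10, 40),
    ("A", 79, 2, 7, 3, 1, "1", 15, "E", 60, 40, 10, 20),
    ("B", 79, 1, 3, 2, 6, "1", 16, "F", 90, 30, 10, 20),
    ("B", 79, 2, 4, 5, 41, "2", 18, "H", 3000, 89, 12, 43),
    ("B", 79, 2, 4, 6, 59, "2", 11, "J", 400, 99, 12, 33),
]]

MONTHS_KEY = ("SYSID", "YEAR", "MONTH", "ZONE")

def to_months(rows):
    """Summarize DAYS rows (already grouped on MONTHS_KEY) to ZONE within MONTH."""
    out = []
    for _, grp in groupby(rows, key=itemgetter(*MONTHS_KEY)):
        grp = list(grp)
        last = grp[-1]
        obs = {k: last[k] for k in MONTHS_KEY}          # sequence values
        obs["DAY"] = obs["HOUR"] = None                 # finer than ZONE: missing
        obs["ENDTS"] = max(r["ENDTS"] for r in grp)     # highest timestamp
        obs["R1"], obs["R2"] = last["R1"], last["R2"]   # retained: last value
        obs["A1"] = sum(r["A1"] for r in grp)           # accumulated: sum
        obs["M1"] = max(r["M1"] for r in grp)           # maximum
        obs["M2"] = min(r["M2"] for r in grp)           # minimum
        obs["M3"] = sum(r["M3"] for r in grp) / len(grp)  # average
        obs["C1"] = obs["A1"] / obs["M1"]               # derived: recomputed
        out.append(obs)
    return out

months = to_months(DAYS)
for obs in months:
    print(obs)
```

Only DAYS observations 6 and 7 share a MONTHS key, so they are the only pair combined, matching Figure 5-7.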