

5.5.1 Summarization Concepts
Different types of data elements are treated differently
during the summarization process.
Retained elements are uniquely defined to the file in which
they are maintained, and contain the last value processed.
Here is an example:
JOBEPRTY - JES Execution Priority
Accumulated elements are uniquely defined to the file in
which they are maintained, and contain the sum of all values
processed for a given file's level of data summarization.
Here is an example:
JOBCPUTM - Job CPU Time
Minimum and maximum elements are uniquely defined to the file
in which they are maintained, and contain the minimum or
maximum value processed for a given file's level of data
summarization. Here is an example of each:
PGMMXWSS - Program Maximum Working Set Size
DSDMNRPG - Min Free Pages in Free Chain
Derived elements are uniquely defined to the file in which
they are maintained, and contain the results of special
computations (for example, paging rate per second) that are
computed for a given file's level of data summarization.
Here is an example:
PAGPSDPG - Demand Paging Per Second
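Because a derived element is the result of a computation, it has to be recomputed from the summarized inputs rather than combined directly. The Python sketch below illustrates this with a paging rate. The field names (pages, seconds) and the interval values are hypothetical and chosen only for illustration; this is not the actual CA MICS computation of PAGPSDPG.

```python
# Hypothetical interval records: demand pages and interval length in seconds.
# Illustrative values only; these are not CA MICS data.
intervals = [
    {"pages": 100, "seconds": 10},   # 10 pages per second
    {"pages": 50, "seconds": 50},    # 1 page per second
]


def paging_rate(pages, seconds):
    """Derived value: demand pages per second."""
    return pages / seconds


# Wrong: averaging the per-interval rates ignores how long each interval was.
naive_rate = sum(paging_rate(i["pages"], i["seconds"]) for i in intervals) / len(intervals)

# Right: accumulate the inputs first, then recompute the derived value.
total_pages = sum(i["pages"] for i in intervals)
total_seconds = sum(i["seconds"] for i in intervals)
derived_rate = paging_rate(total_pages, total_seconds)

print(f"average of rates: {naive_rate:.1f}")    # average of rates: 5.5
print(f"recomputed rate:  {derived_rate:.1f}")  # recomputed rate:  2.5
```

Averaging the two rates gives 5.5 pages per second, but only 150 pages were paged in over 60 seconds, so the recomputed rate is 2.5.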
Common data elements are elements that keep a standard
definition even though they may appear in more than one file
in the database. They may be any of the above data element
types and may be used for data sequencing, summarization, or
selection. Here is an example:
SYSID - System Identification
How some of these elements are summarized depends on the
definition of the variable itself, and SAS has no knowledge
of these definitions. Consequently, using SAS summarization
procedures (PROCs) on CA MICS files can produce incorrect
results.
The CA MICS summarization facility takes these definitions
into account to provide proper summarization. Proper
summarization of CA MICS files, as well as of the files you
create from them, depends on the CA MICS summarization
facility.
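The sketch below shows the kind of error this can cause. It combines DETAIL observations 7 and 8 of the sample file described later in this section (Figure 5-4), first by summing every numeric variable (roughly what a generic summarization that ignores the variable definitions would do) and then by applying each variable's own rule. It is an illustrative Python model, not SAS code and not the CA MICS facility; P1 is left out of the rule-based result because its formula is not given here.

```python
# DETAIL observations 7 and 8 from the sample file (Figure 5-4).
obs = [
    {"R1": 17, "A1": 1000, "M1": 89, "M2": 12, "M3": 43, "P1": 0.6, "C1": 11.2},
    {"R1": 18, "A1": 2000, "M1": 69, "M2": 12, "M3": 43, "P1": 0.7, "C1": 29.0},
]

# Generic approach: sum every numeric variable, ignoring what it means.
naive = {name: sum(o[name] for o in obs) for name in obs[0]}

# Definition-aware approach, following the rules described in this section.
aware = {
    "R1": obs[-1]["R1"],                         # retained: last value
    "A1": sum(o["A1"] for o in obs),             # accumulated: sum
    "M1": max(o["M1"] for o in obs),             # maximum
    "M2": min(o["M2"] for o in obs),             # minimum
    "M3": sum(o["M3"] for o in obs) / len(obs),  # average
}
# Computed variable: recomputed from the summarized A1 and M1.
aware["C1"] = round(aware["A1"] / aware["M1"], 1)

print(round(naive["C1"], 1), naive["M1"], round(naive["P1"], 1))  # 40.2 158 1.3
print(aware["C1"], aware["M1"])                                    # 33.7 89
```

Summing gives a "percentage" above 1 and a maximum (158) that no observation ever reached, while the rule-based result matches DAYS observation 6 in Figure 5-6.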
A separate summarization facility is provided for each CA
MICS file. When creating your own files, you must perform any
summarization on the CA MICS files before altering
observations, whether by merging, by creating new variables
in an observation, or by eliminating observations through
selection (subsetting IF statements).
How the Summarization Facility is Used
--------------------------------------
The following scenario illustrates the use of the CA MICS
summarization facility:
Suppose we were interested in the total CP processor dispatch
time for a complete sysplex per month. Checking the file's
organization in the Files chapter in the Hardware and SCP
Analyzer Guide, we see that the CPU Activity File (HARCPU) is
sequenced by SYSID, YEAR, MONTH, and ZONE (in the MONTHS
timespan). Assume that this central computing complex or
sysplex has 2 LPARs (2 SYSIDs) and we want to determine the
total CPU consumption on both systems combined.
We want to create a file that is in the sequence YEAR and
MONTH with only a single observation for the whole sysplex
for each YEAR and MONTH combination. This means that we want
to summarize MONTHS.HARCPU01 on YEAR and MONTH. To do this,
we must first sort the MONTHS.HARCPU01 file in the sequence
YEAR and MONTH.
We then use the CA MICS summarization facility to summarize
the HARCPU file and recompute all unique variables.
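Outside SAS, the same sort-then-summarize pattern can be sketched in Python. Like SAS BY-group processing, itertools.groupby only combines adjacent records with equal keys, which is why the re-sort into YEAR and MONTH sequence has to come first. The variable name CPUSECS and all values are hypothetical. Only an accumulated variable is shown, so plain addition is appropriate here; the CA MICS facility applies each variable's own rule.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical MONTHS-level records for two LPARs (SYSIDs), in the file's
# SYSID, YEAR, MONTH sequence. CPUSECS is a made-up accumulated variable
# standing in for CP dispatch time; the values are illustrative only.
records = [
    {"SYSID": "SYSA", "YEAR": 79, "MONTH": 1, "CPUSECS": 500.0},
    {"SYSID": "SYSA", "YEAR": 79, "MONTH": 2, "CPUSECS": 450.0},
    {"SYSID": "SYSB", "YEAR": 79, "MONTH": 1, "CPUSECS": 300.0},
    {"SYSID": "SYSB", "YEAR": 79, "MONTH": 2, "CPUSECS": 250.0},
]

key = itemgetter("YEAR", "MONTH")

# Without re-sorting, equal keys are not adjacent, so nothing is combined.
unsorted_groups = len(list(groupby(records, key=key)))

# Re-sort into YEAR, MONTH sequence, then combine each group across SYSIDs.
sysplex = []
for (year, month), group in groupby(sorted(records, key=key), key=key):
    total = sum(r["CPUSECS"] for r in group)  # accumulated: sum across SYSIDs
    sysplex.append({"YEAR": year, "MONTH": month, "CPUSECS": total})

print(unsorted_groups)  # 4
for row in sysplex:
    print(row)
```

The unsorted pass yields four groups, one per input record; after the sort there is a single observation per YEAR and MONTH for the whole sysplex.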
To show how each kind of variable is treated, the rest of
this section uses a simplified sample file that contains the
following variables:
Variable Name   Content                   Variable Type
-------------   -----------------------   -------------
SYSID           Alphanumeric, 4 chars*    Common ID
YEAR            Numeric, 78 - 99          Common DATE
MONTH           Numeric, 01 - 12*         Common DATE
DAY             Numeric, 01 - 31*         Common DATE
ZONE            Alphanumeric, 1 char      Common TIME
HOUR            Numeric, 00 - 23*         Common TIME
ENDTS           SAS timestamp             Common TIME
R1              Numeric                   Retained
R2              Character                 Retained
A1              Numeric                   Accumulated
M1              Maximum value             MAX/MIN/AVG
M2              Minimum value             MAX/MIN/AVG
M3              Average value             MAX/MIN/AVG
P1              Value < 1                 Percentage
C1              Numeric (A1/M1)           Computed
* In this example, only a single character or digit is used.
The following questions are addressed in the context of the
sample file:
o What happens to the values of variables in the sequence
list?
o What happens to values of common date and time variables
not in the sort sequence?
o Under what circumstances do variables lose their meaning
due to the summarization process?
The summary/sequence for this file is shown below:
+---------+-------------------------------------------------+
|Timespan | Level of Data Granularity |
+---------+-------------------------------------------------+
| | |
| DETAIL |SYSID ACCTNO1 ACCTNO2 ACCTNO3 JOBGROUP |
| |JOB YEAR MONTH DAY ENDTS |
| | |
| DAYS |SYSID ACCTNO1 ACCTNO2 ACCTNO3 JOBGROUP |
| |JOB YEAR MONTH DAY HOUR |
| | |
| WEEKS |N/A |
| | |
| MONTHS |SYSID ACCTNO1 ACCTNO2 ACCTNO3 JOBGROUP |
| |YEAR MONTH ZONE |
| | |
| YEARS |SYSID ACCTNO1 ACCTNO2 ACCTNO3 JOBGROUP |
| |YEAR ZONE |
| | |
+---------+-------------------------------------------------+
Because this sample file is in the DETAIL timespan, the data
is not summarized, and all variables are meaningful. The
observations in this file are as follows:
+---+-----+----+-----+---+----+-----+----+--+--+----+---+--+--+--+----+
|Obs|SYSID|YEAR|MONTH|DAY|HOUR|ENDTS|ZONE|R1|R2|  A1| M1|M2|M3|P1|  C1|
|---+-----+----+-----+---+----+-----+----+--+--+----+---+--+--+--+----|
|  1|  A  |  79|    1|  1|   1|    1|   1|12| A|1000|100|12|16|.6|10.0|
|  2|  A  |  79|    1|  1|   1|    2|   1|12| B|2000|100|20|37|.7|20.0|
|  3|  A  |  79|    1|  1|   6|    5|   2|13| C|  40| 80|10|70|.6| 0.5|
|  4|  A  |  79|    2|  6|   9|    4|   2|14| D|  60| 60|10|40|.7| 1.0|
|  5|  A  |  79|    2|  7|   3|    1|   1|15| E|  60| 40|10|20|.6| 1.5|
|  6|  B  |  79|    1|  3|   2|    6|   1|16| F|  90| 30|10|20|.7| 3.0|
|  7|  B  |  79|    2|  4|   5|    2|   2|17| G|1000| 89|12|43|.6|11.2|
|  8|  B  |  79|    2|  4|   5|   41|   2|18| H|2000| 69|12|43|.7|29.0|
|  9|  B  |  79|    2|  4|   6|   57|   2|19| I| 100| 99|12|43|.6| 1.0|
| 10|  B  |  79|    2|  4|   6|   58|   2|10| J| 100| 40|13|21|.7| 2.5|
| 11|  B  |  79|    2|  4|   6|   59|   2|11| J| 200| 50|15|25|.7| 4.0|
+---+-----+----+-----+---+----+-----+----+--+--+----+---+--+--+--+----+
Figure 5-4. DETAIL File Content
Summarizing in the DAYS Timespan
----------------------------------
Now consider how the variables change when going from the
DETAIL timespan to the DAYS timespan. The DAYS timespan is
summarized to the HOUR level and is in the same sequence as
the DETAIL timespan, except that HOUR replaces ENDTS as the
lowest-level sequence variable.
The summarization rules are as follows:
o Variables above the lowest level of summarization are
unchanged. For example, SYSID, YEAR, MONTH, and DAY are
unchanged in the summarized observations when summarizing
on the HOUR within them.
o Retained Variables - The value for the last observation in
the summary is kept.
o Common Variables - The value for the last observation in
the summary is kept.
o Common Date and Time Variables - If the variable is
coarser (less granular) than the variable being summarized
on, such as ZONE relative to HOUR or MONTH relative to DAY,
its value remains the same. If it is finer, its value is set
to missing. ENDTS and STARTTS are exceptions: after
summarization they contain the highest and lowest
timestamp, respectively, of the observations summarized.
For example, when observations are summarized to ZONE
within MONTH, as in the MONTHS timespan, the variables DAY
and HOUR are no longer meaningful and are set to missing
(.).
o Accumulated Variables - The sum of the values in the
observations.
o MIN/MAX Variables - The minimum or maximum value from the
observations being combined.
o AVG - The sum of the values in the observations divided by
the number of observations being combined.
o Percent and/or Computed/Calculated - Recomputed from the
variables in the observation resulting from the
summarization operation.
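The rules above can be modeled as one function that combines a group of observations sharing the same sequence values. This is a Python sketch of the rules as stated in this section, not the CA MICS implementation: None stands in for a SAS missing value, the helper name summarize_group is hypothetical, and P1 is left out because its formula is not given. Applied to DETAIL observations 7 and 8, it produces the values shown for DAYS observation 6 in Figure 5-6.

```python
from typing import Any

Obs = dict[str, Any]

# Summarization type of each non-sequence variable in the sample file.
RETAINED = ("R1", "R2")   # last value kept
CARRIED = ("ZONE",)       # coarser date/time variable: value unchanged
ACCUMULATED = ("A1",)     # summed
MAXIMUM = ("M1",)         # largest value
MINIMUM = ("M2",)         # smallest value
AVERAGE = ("M3",)         # sum divided by number of observations


def summarize_group(group: list[Obs], keys: tuple[str, ...],
                    dropped: tuple[str, ...] = ()) -> Obs:
    """Combine observations that share the same values of `keys`.

    `dropped` lists date/time variables finer than the summary level;
    they become missing (None). P1 is omitted: its formula is not given.
    """
    last = group[-1]
    out: Obs = {k: last[k] for k in keys}          # sequence variables unchanged
    out.update({k: None for k in dropped})         # finer date/time: missing
    out["ENDTS"] = max(o["ENDTS"] for o in group)  # highest timestamp
    for name in RETAINED + CARRIED:
        out[name] = last[name]
    for name in ACCUMULATED:
        out[name] = sum(o[name] for o in group)
    for name in MAXIMUM:
        out[name] = max(o[name] for o in group)
    for name in MINIMUM:
        out[name] = min(o[name] for o in group)
    for name in AVERAGE:
        out[name] = sum(o[name] for o in group) / len(group)
    # Computed variable: recomputed from the summarized A1 and M1.
    out["C1"] = round(out["A1"] / out["M1"], 1)
    return out


# DETAIL observations 7 and 8 (Figure 5-4), summarized to the DAYS level.
obs7 = {"SYSID": "B", "YEAR": 79, "MONTH": 2, "DAY": 4, "HOUR": 5, "ENDTS": 2,
        "ZONE": 2, "R1": 17, "R2": "G", "A1": 1000, "M1": 89, "M2": 12, "M3": 43}
obs8 = {"SYSID": "B", "YEAR": 79, "MONTH": 2, "DAY": 4, "HOUR": 5, "ENDTS": 41,
        "ZONE": 2, "R1": 18, "R2": "H", "A1": 2000, "M1": 69, "M2": 12, "M3": 43}

days_obs6 = summarize_group([obs7, obs8],
                            keys=("SYSID", "YEAR", "MONTH", "DAY", "HOUR"))
print(days_obs6)
```

The same function could summarize to ZONE within MONTH by passing keys=("SYSID", "YEAR", "MONTH", "ZONE") and dropped=("DAY", "HOUR").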
Figure 5-5 shows which observations are combined from Figure
5-4 to make the observations in Figure 5-6. That is, in
Figure 5-5, the Summarized Observation Numbers are the
observation numbers in Figure 5-6, while the Input
Observation Numbers are the observation numbers in Figure
5-4.
+-------------+--------------------------+
| Summarized | |
| Observation | |
| Number | Input Observation Numbers|
|-------------+--------------------------|
| 1 | 1, 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 5 |
| 5 | 6 |
| 6 | 7, 8 |
| 7 | 9, 10, 11 |
+-------------+--------------------------+
Figure 5-5. Observations Summarized to DAYS Timespan
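The mapping in Figure 5-5 follows from the DAYS sequence variables alone. This Python sketch groups the DETAIL observation numbers by SYSID, YEAR, MONTH, DAY, and HOUR and reproduces the figure. Like the figure, it assumes the input is already in that sequence.

```python
from itertools import groupby

# (observation number, SYSID, YEAR, MONTH, DAY, HOUR) from Figure 5-4.
detail = [
    (1, "A", 79, 1, 1, 1), (2, "A", 79, 1, 1, 1), (3, "A", 79, 1, 1, 6),
    (4, "A", 79, 2, 6, 9), (5, "A", 79, 2, 7, 3), (6, "B", 79, 1, 3, 2),
    (7, "B", 79, 2, 4, 5), (8, "B", 79, 2, 4, 5), (9, "B", 79, 2, 4, 6),
    (10, "B", 79, 2, 4, 6), (11, "B", 79, 2, 4, 6),
]


def days_key(row):
    """DAYS sequence values: SYSID, YEAR, MONTH, DAY, HOUR."""
    return row[1:]


# The input is already in DAYS sequence, so adjacent rows with equal keys
# form one summarized observation.
mapping = [[row[0] for row in group] for _, group in groupby(detail, key=days_key)]

for number, inputs in enumerate(mapping, start=1):
    print(number, inputs)
```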
+---+-----+----+-----+---+----+-----+----+--+--+----+---+--+--+--+----+
|Obs|SYSID|YEAR|MONTH|DAY|HOUR|ENDTS|ZONE|R1|R2|  A1| M1|M2|M3|P1|  C1|
|---+-----+----+-----+---+----+-----+----+--+--+----+---+--+--+--+----|
|  1|  A  |  79|    1|  1|   1|    2|   1|12| B|3000|100|12|27|.7|30.0|
|  2|  A  |  79|    1|  1|   6|    5|   2|13| C|  40| 80|10|70|.6| 0.5|
|  3|  A  |  79|    2|  6|   9|    4|   2|14| D|  60| 60|10|40|.7| 1.0|
|  4|  A  |  79|    2|  7|   3|    1|   1|15| E|  60| 40|10|20|.6| 1.5|
|  5|  B  |  79|    1|  3|   2|    6|   1|16| F|  90| 30|10|20|.7| 3.0|
|  6|  B  |  79|    2|  4|   5|   41|   2|18| H|3000| 89|12|43|.7|33.7|
|  7|  B  |  79|    2|  4|   6|   59|   2|11| J| 400| 99|12|33|.7| 4.0|
+---+-----+----+-----+---+----+-----+----+--+--+----+---+--+--+--+----+
Figure 5-6. DAYS File Content
Note that the DETAIL timespan has 11 observations but the
DAYS timespan has only 7. Summarization has produced fewer
observations but the same number of variables:
o Observations 3, 4, 5, and 6 from the DETAIL timespan appear
unchanged in the DAYS timespan. They were not combined with
any other observations because no other DETAIL observation
shared their values of the DAYS sequence variables (SYSID,
YEAR, MONTH, DAY, and HOUR).
o Observations 1 and 2 shared the same values of the sequence
variables, so they were combined into a single new
observation. Likewise, observations 7 and 8 were combined,
and observations 9, 10, and 11 were combined.
o In each combined observation, the values of the other
variables were determined according to the summarization
rules.
Summarizing in the MONTHS Timespan
----------------------------------
When summarizing in the MONTHS timespan, the sequence
variables are different: DAY and HOUR are dropped and ZONE is
added. It is also important to know whether the DETAIL or the
DAYS timespan is used to produce the MONTHS observations. In
this case it is DAYS; in most cases, summarization uses the
preceding timespan.
Figure 5-7 shows which observations are combined from Figure
5-6 to create observations in Figure 5-8. Figure 5-8 shows
the resulting observations. Note the following:
o The DAY and HOUR variables are no longer meaningful because
they are more granular than ZONE within MONTH, so they are
set to missing (.).
o The ENDTS variable takes on a new meaning: it now represents
the highest timestamp found in the observations summarized.
+-------------+--------------------------+
| Summarized | |
| Observation | |
| Number | Input Observation Numbers|
|-------------+--------------------------|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
| 6 | 6, 7 |
+-------------+--------------------------+
Figure 5-7. Observations Summarized to MONTHS Timespan
+---+-----+----+-----+---+----+-----+----+--+--+----+---+--+--+--+----+
|Obs|SYSID|YEAR|MONTH|DAY|HOUR|ENDTS|ZONE|R1|R2|  A1| M1|M2|M3|P1|  C1|
|---+-----+----+-----+---+----+-----+----+--+--+----+---+--+--+--+----|
|  1|  A  |  79|    1|  .|   .|    2|   1|12| B|3000|100|12|27|.7|30.0|
|  2|  A  |  79|    1|  .|   .|    5|   2|13| C|  40| 80|10|70|.6| 0.5|
|  3|  A  |  79|    2|  .|   .|    4|   2|14| D|  60| 60|10|40|.7| 1.0|
|  4|  A  |  79|    2|  .|   .|    1|   1|15| E|  60| 40|10|20|.6| 1.5|
|  5|  B  |  79|    1|  .|   .|    6|   1|16| F|  90| 30|10|20|.7| 3.0|
|  6|  B  |  79|    2|  .|   .|   59|   2|11| J|3400| 99|12|38|.7|34.3|
+---+-----+----+-----+---+----+-----+----+--+--+----+---+--+--+--+----+
Figure 5-8. MONTHS File Content
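The step from DAYS to MONTHS can be sketched the same way. This Python model groups the Figure 5-6 observations by SYSID, YEAR, MONTH, and ZONE, sets DAY and HOUR to missing (None), keeps the highest ENDTS, and applies the other rules. It uses the values as printed in Figure 5-6, omits P1 because its formula is not given, and is illustrative only, not the CA MICS facility. The last observation it produces matches observation 6 of Figure 5-8.

```python
from itertools import groupby

# DAYS observations from Figure 5-6 (P1 omitted; its formula is not given).
FIELDS = ("SYSID", "YEAR", "MONTH", "DAY", "HOUR", "ENDTS", "ZONE",
          "R1", "R2", "A1", "M1", "M2", "M3")
days = [dict(zip(FIELDS, row)) for row in [
    ("A", 79, 1, 1, 1, 2, 1, 12, "B", 3000, 100, 12, 27),
    ("A", 79, 1, 1, 6, 5, 2, 13, "C", 40, 80, 10, 70),
    ("A", 79, 2, 6, 9, 4, 2, 14, "D", 60, 60, 10, 40),
    ("A", 79, 2, 7, 3, 1, 1, 15, "E", 60, 40, 10, 20),
    ("B", 79, 1, 3, 2, 6, 1, 16, "F", 90, 30, 10, 20),
    ("B", 79, 2, 4, 5, 41, 2, 18, "H", 3000, 89, 12, 43),
    ("B", 79, 2, 4, 6, 59, 2, 11, "J", 400, 99, 12, 33),
]]


def months_key(obs):
    """MONTHS sequence values: SYSID, YEAR, MONTH, ZONE."""
    return obs["SYSID"], obs["YEAR"], obs["MONTH"], obs["ZONE"]


months = []
# Observations sharing a MONTHS key are already adjacent in Figure 5-6;
# in general the input must first be sorted into the MONTHS sequence.
for key, grp in groupby(days, key=months_key):
    grp = list(grp)
    last = grp[-1]
    out = dict(zip(("SYSID", "YEAR", "MONTH", "ZONE"), key))
    out["DAY"] = out["HOUR"] = None                   # finer than MONTH: missing
    out["ENDTS"] = max(o["ENDTS"] for o in grp)       # highest timestamp
    out["R1"], out["R2"] = last["R1"], last["R2"]     # retained: last value
    out["A1"] = sum(o["A1"] for o in grp)             # accumulated
    out["M1"] = max(o["M1"] for o in grp)             # maximum
    out["M2"] = min(o["M2"] for o in grp)             # minimum
    out["M3"] = sum(o["M3"] for o in grp) / len(grp)  # average
    out["C1"] = round(out["A1"] / out["M1"], 1)       # recomputed from A1/M1
    months.append(out)

print(len(months))  # 6
print(months[-1])
```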
Copyright © 2014 CA.
All rights reserved.
 