Overview

The RMFWCL query produces charts that show how efficiently LPAR workloads use the cache and memory hierarchy of IBM z/10 and later generation Central Processing Complexes (CPCs). The query writes two CSV output files, and seven chart views are produced from each. One CSV output file is used to generate charts for LPARs executing on IBM z/10 CPCs, and the other for LPARs executing on IBM z/114 and z/196 CPCs. The seven chart views provide similar information for each CPC type, but the specific data elements charted differ because the cache architecture of the z/10 CPC differs from that of the z/114 and z/196 CPCs. IBM’s CPU Measurement Facility collects information about how LPAR logical IPUs (Individual Processor Units) interact with the CPC cache architecture in two formats: Version 1 for the z/10, and Version 2 for the z/114 and z/196.

The two data extract CSVs generated by the RMFWCL query are:

Each data extract generates the following seven chart views:

  1. LPAR Level – RNI Workload Hint and L1 Cache Miss Pct
  2. LPAR Level – L1 Cache Miss Sourcing Percentages
  3. LPAR Level – L1 Data Cache Miss Sourcing Percentages
  4. LPAR Level – L1 Instruction Cache Miss Sourcing Percentages
  5. LPAR Level – Penalty Cycles and Avg Penalty per L1 Miss
  6. LPAR Level – Realized MIPS and Avg IPU Count
  7. LPAR Level – Instruction Count and CPI

Chart view #1 displays the “Workload Hint”, the Relative Nest Intensity (RNI), and the Level 1 cache miss percentage. The Workload Hint is derived from an IBM formula that uses the RNI and the Level 1 cache miss percentage to classify a workload as ‘LOW’, ‘MEDIUM’, or ‘HIGH’. This workload classification methodology is required to accurately predict how workloads will behave when upgrading from z/10 or later CPC models. The first chart sample above was produced from a single cycle of the DETAIL timespan HARCML file. This chart, for LPAR ‘SY06’ running on an IBM 2097-734 (a z/10 CPC with 34 CP engines), shows that the workload hint was ‘MEDIUM’ from about 7 AM through 9 PM and ‘LOW’ from about 9 PM through 7 AM. As the workload hint moves from ‘LOW’ to ‘HIGH’, workloads experience more Level 1 cache misses, find the required data or instructions in slower cache levels and memory more often, or both.
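
The classification described for chart view #1 can be sketched in a few lines. This is an illustrative sketch only: the threshold values are assumptions based on commonly published IBM guidance (which labels the middle category ‘AVERAGE’ rather than ‘MEDIUM’), and the function name `workload_hint` is hypothetical. Confirm the thresholds against current IBM documentation before relying on them.

```python
def workload_hint(l1mp, rni):
    """Classify a workload from its Level 1 miss percentage and RNI.

    l1mp: Level 1 cache miss percentage (for example 4.5 for 4.5%).
    rni:  Relative Nest Intensity.

    The thresholds below are ASSUMPTIONS taken from commonly published
    IBM guidance, not values stated in this document.
    """
    if l1mp < 3.0:
        return "MEDIUM" if rni >= 0.75 else "LOW"
    if l1mp <= 6.0:
        if rni > 1.0:
            return "HIGH"
        return "MEDIUM" if rni >= 0.6 else "LOW"
    # Level 1 miss percentage above 6%
    return "HIGH" if rni >= 0.75 else "MEDIUM"


if __name__ == "__main__":
    # Hypothetical (L1 miss %, RNI) pairs for three intervals.
    for l1mp, rni in [(2.0, 0.5), (4.5, 0.8), (7.0, 1.2)]:
        print(l1mp, rni, workload_hint(l1mp, rni))
```

The point of the sketch is that the hint depends on both inputs together: a high miss percentage with a low RNI, or a low miss percentage with a high RNI, can both land in the middle category.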

Chart view #2 uses a stacked area chart to show, as percentages, where data or instructions were retrieved from when they were not found in the Level 1 cache. In addition, the Level 1 cache miss percentage and the average LPAR IPU busy are shown. The stacked areas are ordered from the least expensive retrieval source (from a resource utilization perspective) at the bottom to the most expensive (remote memory) at the top. In the example above, generated for the same LPAR as the first example, workloads running in the early morning (midnight to 7 AM) and late at night (9 PM to midnight) find data and instructions in the z/10 Level 1.5 cache more frequently than workloads running from 7 AM to 9 PM, when more Level 1 cache misses occur.
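
As a rough illustration of how the stacked percentages relate to raw counts, the sketch below converts per-source miss counts into percentages of all Level 1 misses, keeping the sources in least-to-most-expensive order. The source names and counts are hypothetical and are not the actual data element names in the CSV files.

```python
def sourcing_percentages(miss_counts, order):
    """Return (source, percent) pairs in stacking order.

    miss_counts: dict mapping a source name to the number of Level 1
                 misses resolved from that source.
    order:       source names from least to most expensive; this is
                 the bottom-to-top order of the stacked areas.
    """
    total = sum(miss_counts.get(source, 0) for source in order)
    if total == 0:
        return [(source, 0.0) for source in order]
    return [
        (source, 100.0 * miss_counts.get(source, 0) / total)
        for source in order
    ]


if __name__ == "__main__":
    # Hypothetical source names and counts for one interval.
    order = ["L1.5", "L2", "Memory"]
    counts = {"L1.5": 600, "L2": 300, "Memory": 100}
    for source, pct in sourcing_percentages(counts, order):
        print(f"{source}: {pct:.1f}%")
```

Because every miss is resolved from exactly one source, the percentages for an interval sum to 100, which is why a stacked area chart suits this view.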

Chart views #3 and #4 show the same information as the second view, but for Level 1 data cache misses and instruction cache misses, respectively. When examining these chart views, you will see that instruction cache misses tend to be resolved less expensively than data cache misses.

Chart view #5 displays the overall average number of penalty cycles incurred per Level 1 cache miss, as well as the average penalty cycles incurred for Level 1 misses in the data and instruction caches. Penalty cycles are wasted machine cycles that occur while the CPC is working to retrieve a datum or instruction from the cache/memory hierarchy.
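
The three averages in this view can be illustrated with simple arithmetic. The sketch below assumes that penalty cycles and miss counts are available separately for the data and instruction caches; the function and parameter names are hypothetical, and the numbers are invented for the example.

```python
def penalty_averages(data_cycles, data_misses, instr_cycles, instr_misses):
    """Average penalty cycles per Level 1 miss: data, instruction, overall."""

    def ratio(cycles, misses):
        # Guard against intervals with no misses.
        return cycles / misses if misses else 0.0

    return {
        "data": ratio(data_cycles, data_misses),
        "instruction": ratio(instr_cycles, instr_misses),
        "overall": ratio(data_cycles + instr_cycles,
                         data_misses + instr_misses),
    }


if __name__ == "__main__":
    # Invented counters for one interval.
    print(penalty_averages(8000, 100, 3000, 150))
```

With these invented counters the data cache average is 80 cycles per miss and the instruction cache average is 20, yet the overall average is 44 rather than 50, because it is weighted by how many misses each cache contributed.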

Chart view #6 shows the average number of active logical IPUs (engines) for the LPAR, and the actual Millions of Instructions per Second (MIPS) rate realized by the logical engines while dispatched on the CPC.

Chart view #7 shows the number of problem state and supervisor state instructions executed, the average number of cycles per instruction (CPI) for each of these states, and the overall CPI.
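
A minimal sketch of the CPI arithmetic follows, assuming cycle and instruction counts are available per state. The names and values are hypothetical. It assumes the overall CPI is total cycles divided by total instructions, which is not the same as the simple mean of the two per-state CPI values.

```python
def cpi(cycles, instructions):
    """Cycles per instruction; returns 0.0 when no instructions ran."""
    return cycles / instructions if instructions else 0.0


# Invented counters for one interval.
problem = {"cycles": 6_000_000, "instructions": 2_000_000}
supervisor = {"cycles": 5_000_000, "instructions": 1_000_000}

problem_cpi = cpi(problem["cycles"], problem["instructions"])
supervisor_cpi = cpi(supervisor["cycles"], supervisor["instructions"])

# Assumed definition: total cycles over total instructions.
overall_cpi = cpi(
    problem["cycles"] + supervisor["cycles"],
    problem["instructions"] + supervisor["instructions"],
)

if __name__ == "__main__":
    print(problem_cpi, supervisor_cpi, round(overall_cpi, 2))
```

Under that assumption the overall value sits closer to the CPI of whichever state executed more instructions, so a shift in the problem/supervisor instruction mix can move the overall CPI even when neither per-state CPI changes.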

By default, the query is executed against the DETAIL timespan HARCML file, and each X axis value represents data from an RMF interval. The X axis metric, DATE HOUR MINUTE, is constructed by concatenating the DATE, HOUR, and MINUTE from the end time boundary of each interval.
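
The construction of the X axis value can be sketched as follows. The exact separators and field widths used in the CSV files are not stated here, so the output format below is an assumption, and `date_hour_minute` is a hypothetical helper name.

```python
from datetime import datetime


def date_hour_minute(interval_end):
    """Build an X axis label from an interval's end time.

    The "YYYY-MM-DD HH MM" layout is an ASSUMED format chosen for
    illustration; the real DATE HOUR MINUTE layout may differ.
    """
    return interval_end.strftime("%Y-%m-%d %H %M")


if __name__ == "__main__":
    # A hypothetical RMF interval ending at 07:15 on 5 March 2012.
    print(date_hour_minute(datetime(2012, 3, 5, 7, 15)))
```

Because the label comes from the end time boundary, an interval that runs from 07:00 to 07:15 is plotted at 07 15, not 07 00.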