

Overview

The RMFWCL query produces charts that show how efficiently LPAR workloads use the cache and memory hierarchy of IBM z10 and later generation Central Processing Complexes (CPCs). Seven separate chart views are produced from each of the two CSV output files. One CSV output file is used to generate charts for LPARs executing on IBM z10 CPCs, and the other for LPARs executing on IBM z114 and z196 CPCs. The seven chart views in each CSV output file provide similar information for each CPC type, but the specific data elements charted differ because the cache architecture varies by CPC model. IBM’s CPU Measurement Facility collects information about how LPAR logical IPUs (Individual Processor Units) interact with the CPC cache architecture in three formats: Version 1 for the z10, Version 2 for the z114 and z196, and Version 3 for the zEC12 CPCs.

The three data extract CSVs generated by the RMFWCL query are:

Each data extract generates the following seven chart views:

  1. LPAR Level – RNI Workload Hint and L1 Cache Miss Pct
  2. LPAR Level – L1 Cache Miss Sourcing Percentages
  3. LPAR Level – L1 Data Cache Miss Sourcing Percentages
  4. LPAR Level – L1 Instruction Cache Miss Sourcing Percentages
  5. LPAR Level – Penalty Cycles and Avg Penalty per L1 Miss
  6. LPAR Level – Realized MIPS and Avg IPU Count
  7. LPAR Level – Instruction Count and CPI

Chart view #1 displays the “Workload Hint”, Relative Nest Intensity (RNI), and Level 1 cache miss percentage. The Workload Hint is derived from an IBM formula that uses the RNI and the Level 1 cache miss percentage to classify workloads as ‘LOW’, ‘MEDIUM’, or ‘HIGH’. This workload classification methodology is required to accurately predict how workloads behave when upgrading from z10 or later CPC models. The first chart sample above was produced from a single cycle of the DETAIL timespan HARCML file. This chart, for LPAR ‘SY06’ running on an IBM 2097-734 (a z10 CPC with 34 CP engines), shows that from about 7 AM through 9 PM the workload hint was ‘MEDIUM’, and from about 9 PM through 7 AM it was ‘LOW’. As the workload hint value moves from ‘LOW’ to ‘HIGH’, workloads experience more Level 1 cache misses, resolve those misses from slower cache levels and memory, or both.
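The classification described above can be sketched in a few lines. This is a minimal illustration, not the formula the query actually uses, which this topic does not reproduce. The thresholds below are an assumption based on the categorization table IBM has presented publicly for CPU MF data and should be verified against current IBM guidance. IBM's table labels the middle category AVERAGE, and the sketch uses MEDIUM to match this topic. The function name `workload_hint` and its parameters are hypothetical.

```python
def workload_hint(rni: float, l1mp: float) -> str:
    """Classify a workload from RNI and Level 1 miss percentage.

    Thresholds are assumptions for illustration; verify them against
    current IBM CPU MF guidance before relying on the result.
    """
    if l1mp < 3.0:
        # Few L1 misses: only a relatively high RNI lifts the hint above LOW
        return "MEDIUM" if rni >= 0.75 else "LOW"
    if l1mp <= 6.0:
        # Moderate L1 miss rate: RNI decides among all three categories
        if rni > 1.0:
            return "HIGH"
        return "MEDIUM" if rni >= 0.6 else "LOW"
    # Many L1 misses: the hint is never LOW
    return "HIGH" if rni >= 0.75 else "MEDIUM"
```

Under these assumed thresholds, an interval with an RNI of 0.5 and a Level 1 miss percentage of 2.0 is classified as LOW, while the same RNI with a miss percentage of 7.0 is classified as MEDIUM.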

Chart view #2 uses a stacked area chart to show, as percentages, where data or instructions were retrieved from when they were not found in the Level 1 cache. In addition, the Level 1 cache miss percentage and average LPAR IPU busy are shown. The stacked areas are ordered from the least expensive retrieval source (from a resource utilization perspective) at the bottom to the most expensive (remote memory) at the top. In the example above, generated for the same LPAR as the first example, workloads running in the early morning (midnight to 7 AM) and late night (9 PM to midnight) find data and instructions in the z10 Level 1.5 cache more frequently than workloads running from 7 AM to 9 PM, when more Level 1 cache misses occur.
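To make the stacked percentages concrete, the following sketch converts per-source miss counts into the shares that such a chart would stack. The source names and counts are hypothetical and are not the data element names used in the CSV output files. The only point illustrated is that the shares for one interval sum to 100 percent.

```python
def sourcing_percentages(counts: dict[str, int]) -> dict[str, float]:
    """Convert L1 miss counts per retrieval source into percentage shares."""
    total = sum(counts.values())
    if total == 0:
        # No misses in the interval: report every share as zero
        return {source: 0.0 for source in counts}
    return {source: 100.0 * n / total for source, n in counts.items()}


# Hypothetical counts for one interval, ordered cheapest to most expensive
misses = {
    "L1.5": 700,
    "L2 local": 200,
    "L2 remote": 50,
    "local memory": 40,
    "remote memory": 10,
}
shares = sourcing_percentages(misses)
```

Because Python dictionaries preserve insertion order, `shares` keeps the cheapest-to-most-expensive ordering, which is the bottom-to-top order of the stacked areas.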

Chart views #3 and #4 show the same information as the second view, but for Level 1 data cache misses and instruction cache misses, respectively. When examining these chart views, you will see that instruction cache misses tend to be resolved less expensively than data cache misses.

Chart view #5 displays the overall average number of penalty cycles incurred per Level 1 cache miss, as well as the average penalty cycles incurred for Level 1 misses in the data and instruction caches. Penalty cycles are wasted machine cycles that occur while the CPC is working to retrieve a datum or instruction from the cache/memory hierarchy.
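As a rough illustration of the arithmetic behind this view, the sketch below divides penalty cycles by Level 1 misses for the data cache, the instruction cache, and both combined. The variable names and interval totals are hypothetical. How the query derives penalty cycles from the CPU MF counters is not described in this topic.

```python
def avg_penalty_per_miss(penalty_cycles: float, l1_misses: float) -> float:
    """Average wasted cycles per Level 1 miss; 0.0 when there were no misses."""
    return penalty_cycles / l1_misses if l1_misses else 0.0


# Hypothetical interval totals (not taken from any real measurement)
data_penalty, data_misses = 9_000_000, 150_000
instr_penalty, instr_misses = 2_000_000, 50_000

data_avg = avg_penalty_per_miss(data_penalty, data_misses)
instr_avg = avg_penalty_per_miss(instr_penalty, instr_misses)
# The overall figure is weighted by miss counts, not a plain mean of the two
overall_avg = avg_penalty_per_miss(
    data_penalty + instr_penalty, data_misses + instr_misses
)
```

With these sample values, the data cache dominates the overall average because it accounts for three quarters of the misses.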

Chart view #6 shows the average number of active logical IPUs (engines) for the LPAR, and the actual Millions of Instructions per Second (MIPS) rate realized by the logical engines while dispatched on the CPC.

Chart view #7 shows the number of problem and supervisor state instructions executed, the average number of cycles per instruction (CPI) for each of these states, and the overall CPI.
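Cycles per instruction is the number of cycles divided by the number of instructions executed. The sketch below applies that definition per state and overall, using hypothetical names and sample counts. It assumes the overall value is computed from combined totals, which this topic does not state explicitly.

```python
def cpi(cycles: float, instructions: float) -> float:
    """Cycles per instruction; 0.0 when no instructions were executed."""
    return cycles / instructions if instructions else 0.0


# Hypothetical interval totals for each execution state
problem_cycles, problem_instr = 6_000_000, 2_000_000
supervisor_cycles, supervisor_instr = 4_000_000, 500_000

problem_cpi = cpi(problem_cycles, problem_instr)
supervisor_cpi = cpi(supervisor_cycles, supervisor_instr)
# Assumption: overall CPI uses combined totals, so it is weighted by
# instruction count rather than being the mean of the two state values
overall_cpi = cpi(
    problem_cycles + supervisor_cycles, problem_instr + supervisor_instr
)
```

In this example the overall CPI sits closer to the problem state value because most instructions were executed in problem state.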

By default, the query is executed against the DETAIL timespan HARCML file, and each X axis value represents data from an RMF interval. The X axis metric, DATE HOUR MINUTE, is constructed by concatenating the DATE, HOUR, and MINUTE from the end time boundary of each interval.
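The construction of the X axis value can be sketched as follows. The exact output format of DATE HOUR MINUTE is not given in this topic, so the format string here is an assumption chosen for illustration, and the function name `x_axis_label` is hypothetical.

```python
from datetime import datetime


def x_axis_label(interval_end: datetime) -> str:
    """Build a DATE HOUR MINUTE label from the end time of an RMF interval.

    The layout (ISO date, then zero-padded hour and minute separated by
    spaces) is an assumed format, not the one the product emits.
    """
    return interval_end.strftime("%Y-%m-%d %H %M")


# Hypothetical interval ending at 07:15 on 5 March 2012
label = x_axis_label(datetime(2012, 3, 5, 7, 15))
```

Zero-padding the hour and minute keeps the labels sortable as plain text, which matters when intervals from several days appear on one axis.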