

Investigate Rule Firing

The factory rules built into the Performance Manager follow the methodology in HP's OpenVMS Performance Management guide. Some rules go beyond this methodology.

Keep in mind that, while the Performance Manager alerts you to potential performance problems, it does not (by default) screen out insignificant firings. Some circumstance on your system might fire a rule for a one-time transient condition that has no implications for the long-term performance of your system. The factory rules try to minimize this possibility by setting occurrence thresholds within the rules, but it can still happen.
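The occurrence-threshold idea can be sketched as follows. This is a minimal illustration, not Performance Manager internals: the rule name, threshold value, and sample format are all hypothetical.

```python
# Hypothetical sketch: a rule "fires" only after its condition has been
# true in at least `occurrence_threshold` sampling intervals, which
# screens out a one-time transient spike.
class Rule:
    def __init__(self, name, condition, occurrence_threshold):
        self.name = name
        self.condition = condition            # callable: sample -> bool
        self.occurrence_threshold = occurrence_threshold
        self.occurrences = 0

    def evaluate(self, sample):
        """Return True (fire) once the condition has held often enough."""
        if self.condition(sample):
            self.occurrences += 1
        return self.occurrences >= self.occurrence_threshold

# Example: flag sustained high CPU utilization; a single busy interval
# is not enough to fire the rule.
high_cpu = Rule("HIGH_CPU", lambda s: s["cpu_busy"] > 90, occurrence_threshold=3)
samples = [{"cpu_busy": 95}, {"cpu_busy": 40}, {"cpu_busy": 96}, {"cpu_busy": 97}]
fired = [high_cpu.evaluate(s) for s in samples]   # fires only on the last sample
```

A rule with a higher occurrence threshold tolerates more transient noise at the cost of reporting genuine problems later.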

Figure out whether a particular rule firing is cause for concern. Check the evidence provided with each rule. Does the problem appear to be persistent (as indicated by many lines of evidence)? Check the time of occurrence: does the firing coincide with a regularly scheduled job? Make sure you understand the meaning of each data item presented. If you are unsure of a definition, look it up in the Data Cell Types and Use section, or use the online Help system, which contains hotspots for all the data items in the Conditions and in the Evidence. Many of the data items presented as evidence are also used in the rule's conditions: read the conditions, understand how each data item is used, and determine what values would cause the rule to fire.

You are probably very familiar with approximate or typical values for many parameters of your systems. If a rule fires, but all related evidence appears normal for your environment, then you probably want to adjust that rule's thresholds to reflect more precisely the upper bounds of performance for your particular workloads and systems. For more information on how to do this, see How to Implement Changes in this chapter.

However, if the evidence presented does not look normal, or if you do not know what normal looks like for those data items, you need to investigate further. One good approach is to graph those or related data items for the time periods given in the evidence, and then look for unusual spikes of activity. If you find spikes occurring at the same time in different graphs, you might suspect that the graphed data items are related to the underlying problem, which can suggest what to change to fix it.
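The spike-matching step above can be sketched as a small script. The data item names, sample values, and the "twice the mean" spike definition are hypothetical; in practice you would export the graphed values and choose a threshold appropriate to your workload.

```python
# Hypothetical sketch: find sampling intervals where two graphed data
# items spike at the same time, suggesting they share an underlying cause.
def spike_times(series, factor=2.0):
    """Return the indices of samples exceeding `factor` times the mean."""
    mean = sum(series) / len(series)
    return {i for i, v in enumerate(series) if v > factor * mean}

# Illustrative samples for two data items over the same seven intervals.
page_faults = [10, 12, 11, 90, 13, 95, 12]
disk_io     = [ 5,  6,  5, 40,  6, 44,  5]

# Intervals where BOTH items spike: these are the times to investigate.
shared = spike_times(page_faults) & spike_times(disk_io)
```

Coincident spikes are only a hint that the items are related, not proof; a regularly scheduled job can produce the same pattern.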

Another path of investigation is to look for related rule firings. A rule might not present the full picture by itself, but coupled with related rules it can give you a clear indication of the source of the problem. Related rules need not appear one after another, because the Performance Manager processes rules in the following order: CPU, memory, I/O, resource (miscellaneous), and (after all nodes have been analyzed) cluster-wide. (You do not need a cluster for a cluster-wide rule to fire, but because the metrics used are not specific to an individual node, these rules are evaluated after all node-specific rules.) Rules that require special evidence processing (as indicated by the absence of evidence data items in the rules source file) are presented first.
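The ordering described above can be expressed as a simple sort. This sketch is only an illustration of the documented order; the rule records, field names, and the per-node versus cluster-wide scheduling details are simplified assumptions, not the Performance Manager's actual implementation.

```python
# Hypothetical sketch of the documented rule ordering: rules needing
# special evidence processing come first, then node rules by category,
# with cluster-wide rules last.
CATEGORY_ORDER = {"cpu": 0, "memory": 1, "io": 2, "resource": 3, "cluster": 4}

def processing_order(rules):
    """Sort rules: special-evidence rules first, then by category order.

    `not special_evidence` is False (sorts first) for special rules.
    """
    return sorted(rules, key=lambda r: (not r["special_evidence"],
                                        CATEGORY_ORDER[r["category"]]))

rules = [
    {"name": "R_MEM", "category": "memory",  "special_evidence": False},
    {"name": "R_CLU", "category": "cluster", "special_evidence": False},
    {"name": "R_SPC", "category": "cpu",     "special_evidence": True},
    {"name": "R_CPU", "category": "cpu",     "special_evidence": False},
]
order = [r["name"] for r in processing_order(rules)]
```

Knowing this order helps you scan a report for related firings: a memory rule and an I/O rule describing the same problem will be separated by any intervening categories rather than listed together.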

Finally, if you observe a persistent performance problem, you can use the Real-time Motif displays to investigate it dynamically; see the chapters Use the DECwindows Motif Real-time Display and Customize the DECwindows Motif Real-time Display. This functionality supports progressive disclosure: you can start monitoring with high-level system displays and then launch panels progressively to focus on perceived problem areas as they occur.