

Using Agent Technology to Monitor Resources

To facilitate comprehensive and integrated network polling and administration, CA NSM uses Agent Technology to automate manager tasks and responses to events. Agent Technology monitors and reports the status of your resources and applications and lets you manage those resources (also called managed objects). The status of a resource is displayed within the Management Command Center and on the WorldView 2D Map.

This chapter explains the basic functions of four aspects of monitoring your enterprise:

Understanding Unicenter Remote Monitoring

To determine when and how to use Remote Monitoring effectively, you need to understand both the advantages and disadvantages of deploying this non-intrusive monitoring technology.

Advantages of Remote Monitoring include the following:
Disadvantages of Remote Monitoring include the following:

Note: For more information about Remote Monitoring, see CA NSM - Remote Monitoring online help.

Remote Monitoring Architecture

This topic explains the Remote Monitoring architecture, which will help you determine if Remote Monitoring Agent is best suited to monitor your environment.

Remote Monitoring consists of the following three major components:

The following diagram illustrates how these components work together:

URM diagram: URM agents talk to MDB, WorldView, and Event Management

Resource Types You Can Monitor

Remote Monitoring lets you monitor multiple platforms and resource types throughout your network. The following table lists all operating systems, the versions currently supported, and the type of information you can monitor for each:

Operating System: Windows
Versions:
  • 2000 Professional, Server, Advanced Server, Datacenter (Intel x86)
  • 2003 Standard Server, Datacenter, Enterprise Server, Small Business Server (Intel x86, AMD-64, EM64-T, IA-64)
  • 2003 R2 Standard, Enterprise, Datacenter (Intel x86, AMD-64, EM64-T, IA-64)
  • XP Professional (Intel x86, AMD-64, EM64-T)
  • Windows Vista Business, Enterprise, Ultimate (Intel x86, AMD-64, EM64-T, IA-64)
  • Windows Server 2008 (Intel x86, AMD-64, EM64-T, IA-64)
Information Types Monitored: Event logs, Services, System Metrics, Detailed Metrics, Registry Keys

Operating System: AIX
Versions:
  • 5.2 (POWER)
  • 5.3 (POWER)
Information Types Monitored: System Metrics, Detailed Metrics

Operating System: FreeBSD
Versions:
  • 6.2 (Intel x86)
Information Types Monitored: System Metrics, Detailed Metrics

Operating System: HP-UX
Versions:
  • 11iv1 (PA-Risc-64)
  • 11.23 (PA-Risc-64, IA-64)
  • 11.31 (PA-Risc-64, IA-64)
Information Types Monitored: System Metrics, Detailed Metrics

Operating System: Linux
Versions:
  • Red Hat 4.0 (Intel x86, AMD-64, EM64-T, IA-64, S/390)
  • Red Hat 5.0 (Intel x86, AMD-64, EM64-T, IA-64, S/390)
  • SLES 9 (Intel x86, AMD-64, EM64-T, IA-64, S/390)
  • SLES 10 (Intel x86, AMD-64, EM64-T, IA-64, S/390)
Information Types Monitored: System Metrics, Detailed Metrics

Operating System: Mac OS X
Versions:
  • 10.2 (PPC)
  • 10.3 (PPC)
  • 10.4 (Intel, PPC)
  • 10.5 (Intel, PPC)
Information Types Monitored: System Metrics, Detailed Metrics

Operating System: Solaris
Versions:
  • 8 (UltraSPARC)
  • 9 (UltraSPARC)
  • 10 (UltraSPARC, Intel x86, AMD-64, EM64-T)
Information Types Monitored: System Metrics, Detailed Metrics

Operating System: Tru64
Versions:
  • 5.1b (Alpha)
Information Types Monitored: System Metrics, Detailed Metrics

In addition to monitoring these platforms, Remote Monitoring provides IP resource monitoring. This type of monitoring lets you gather the following information:

State

Indicates whether the system is responding.

Response time

Determines whether the response time is reasonable.

State of selected ports

Issues an alarm based on a state change, such as a port that is responding when it should be turned off (not responding).
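
As a rough illustration of the port-state check described above, the following Python sketch compares the observed state of each port with the state it is expected to have and reports every mismatch. The function names, the expected mapping, and the injectable probe callable are assumptions made for this example. They are not part of Remote Monitoring.

```python
import socket
from typing import Callable, Dict, List


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def port_alarms(expected: Dict[int, bool], probe: Callable[[int], bool]) -> List[str]:
    """Compare the observed state of each port with its expected state.

    expected maps a port number to True (should respond) or False (should be off).
    probe is any callable that reports whether a port currently responds.
    """
    alarms = []
    for port, should_respond in sorted(expected.items()):
        responding = probe(port)
        if responding != should_respond:
            observed = "responding" if responding else "not responding"
            wanted = "responding" if should_respond else "not responding"
            alarms.append(f"port {port}: {observed}, expected {wanted}")
    return alarms
```

A real check could pass a probe such as lambda p: tcp_probe("192.0.2.10", p), where the address is a placeholder. Keeping the probe injectable lets the comparison logic be exercised without network access.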

Securing Access to Remote Monitoring

By default, all users are given full access to the Remote Monitoring features upon opening the application, and no login is required. However, you may have users who do not need to make configuration changes, but only need to monitor the status of your resources using the Remote Monitoring Administrative Interface. In this case, you can implement a role-based security scheme so that only administrators can access and change your monitoring configurations.

This role-based security access is an optional feature that provides the following two levels of security:

Administrator

Provides full access to the application.

User

Limits access to viewing the resource status information.

To implement this type of security, define one or more administrator accounts. Defining an administrator account puts the role-based security scheme into effect, and this security stays in effect as long as at least one administrator account is defined.

When this security is in effect, the default role is User. This means that upon opening the application, all configuration editing features are disabled. To gain administrative rights to the application, administrators must explicitly log in to the application, using the account you have defined. Upon successful login, the administrator is given full access to the application.
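
The rules above can be summarized in a small model. This is only a sketch of the described behavior, not the product's implementation: the class name, method names, and the plain-text password storage are hypothetical and chosen for brevity.

```python
from typing import Dict


class AccessModel:
    """Hypothetical model of the optional role-based security scheme."""

    def __init__(self) -> None:
        # account name -> password (plain text for illustration only)
        self._admins: Dict[str, str] = {}

    def define_admin(self, name: str, password: str) -> None:
        self._admins[name] = password

    def remove_admin(self, name: str) -> None:
        self._admins.pop(name, None)

    def security_in_effect(self) -> bool:
        # The scheme stays active as long as at least one administrator exists.
        return bool(self._admins)

    def role_on_open(self) -> str:
        # Without the scheme every user has full access;
        # with it, the default role is User.
        return "User" if self.security_in_effect() else "Administrator"

    def login(self, name: str, password: str) -> str:
        # A successful administrator login grants full access.
        if name in self._admins and self._admins[name] == password:
            return "Administrator"
        return self.role_on_open()
```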

Understanding Resource Monitoring

To determine when and how to use Resource Monitoring effectively, you need to understand what this monitoring technology can do for you.

Basic Concepts

After startup, the system agents immediately start monitoring the system resources based on a predefined configuration. Lists of available (auto-discovered) system resources let you easily customize your agents during runtime to meet the specific requirements of your system.

An agent monitors system resources on the basis of watchers. A watcher is the term used for any instance of a monitored resource that has been added to the agent's configuration. The agent evaluates the status of a specific resource according to the assigned watcher configuration.

To prevent losing a change in its configuration, for example, as a result of a power failure, the agent writes back its configuration data periodically. The duration of this period can be specified with the start command of the agent.

Some of the system agents support Auto Discovery. For some specific resource groups the corresponding agent adds watchers into its configuration automatically by applying filter conditions to the available lists. The agent uses the default values from the MIB to specify the properties of these watchers.

General Functions

Most of the system agents support the general functions listed in the following sections. The descriptions in this section provide a brief overview. For further details, procedures, and examples, see the corresponding references.

Auto Watchers and Available Lists

At startup the agent automatically scans the system for resources that can be monitored. Whether the agent automatically creates a watcher for a discovered resource depends on the resource type. If a resource type has only a few instances that should always be monitored (for example: CPU, Network Interfaces), the agent creates the corresponding watchers automatically.

However, if a resource type has many instances, for example file systems on UNIX servers, you may want to monitor only a particular subset of these instances. In this case the agent does not create watchers automatically. Instead, it creates a list of the available objects (instances) of the resource type that can potentially be monitored.

Based on filter conditions of the available list you can specify a set of instances that you want to monitor and define an auto watcher for this set. Then, the auto watcher automatically creates individual watchers for those instances that match the filter condition. For example, you can specify a filter condition for the mount devices of the file systems and create an auto watcher for swap file systems only. Such an auto watcher creates individual watchers for each available swap file system on that server.
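
The following Python sketch illustrates the idea of an auto watcher that creates one watcher per matching instance of an available list. The Instance type, the auto_watch function, and the use of shell-style wildcards as the filter syntax are assumptions for this example. The actual filter syntax is defined by the agent.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Dict, List


@dataclass
class Instance:
    """One entry of a hypothetical available list of file systems."""
    name: str
    mount_device: str


def auto_watch(available: List[Instance], device_filter: str,
               watchers: Dict[str, Instance]) -> List[str]:
    """Create a watcher for each available instance that matches the filter.

    Instances that already have a watcher are skipped, so the function can be
    called again whenever the available list is refreshed.
    """
    created = []
    for inst in available:
        if fnmatchcase(inst.mount_device, device_filter) and inst.name not in watchers:
            watchers[inst.name] = inst
            created.append(inst.name)
    return created
```

Calling auto_watch with a filter such as "/dev/swap*" would add watchers only for the swap file systems and leave all other entries of the available list unmonitored.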

For monitoring files and processes the agent provides one-to-many watchers instead of auto watchers to monitor a specific set of instances by a single watcher. If the status of this set changes to warning or critical, the agent creates a culprit list that contains all monitored instances that caused the status change.

For example, you can specify a filter condition for the process path to monitor all processes that belong to c:\Windows\system32 by a single watcher. In the case of a Down status the agent creates a list of items (process-ID:utilization value), which identifies the processes that caused this status. The sort order and length of this list depend on the severity of the violation, for example: 408:222|409:333|475:444
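
A script that consumes such a culprit list might split it as follows. This is a minimal sketch that assumes every item has the form process-ID:value with integer values, as in the example above. The function name is hypothetical.

```python
from typing import List, Tuple


def parse_culprit_list(text: str) -> List[Tuple[int, int]]:
    """Split 'pid:value|pid:value|...' into (pid, value) pairs, keeping the order."""
    pairs = []
    for item in text.split("|"):
        if not item:
            continue  # tolerate an empty list or a trailing separator
        pid, value = item.split(":", 1)
        pairs.append((int(pid), int(value)))
    return pairs
```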

Call-Back Mechanism

The call-back mechanism of system agents enables you to assign an automated task or action to a particular event within the agent layer of the CA NSM architecture. This assignment is accomplished by means of a call-back reference which can be set up for each functional area of the agent, such as one call-back reference for CPU, one call-back reference for logical volumes, one call-back reference for files, and so on.

These call-back references can only be defined in an agent's call-back configuration file (for example: caiUxsA2.cbc) that can be secured by access rights. This configuration file is stored in the Install_Path/SharedComponents/ccs/atech/agents/config/cbc directory. It contains an entry for each call-back reference, and associates with this reference the full path and name of the script or application to run. Additionally, parameter information can be passed to the script or application, as well as a user ID that should be used to execute the script or application.

The advantage of using this additional level of indirection or call-back reference is that the name of this reference can be safely shown in the MIB without causing any security exposure, because the actual path and name of the call-back script or application is hidden within a secured file. This reference also enables you to remotely check in a secure way if a call-back reference has been configured for the respective monitored area.

Note: In the MIB the call-back reference name is defined as read-only. Therefore it cannot be set or modified by Agent View or the MIB Browser. The reference name can only be configured through a definition in a configuration set.

To provide improved functionality, you can specify that the agent passes a set of predefined or user-defined parameters to the call-back script or application when it is invoked. These predefined parameters contain the following information:

Passing these parameters to the call-back script or application lets you build powerful scripts that perform different actions depending on the state of the monitored resource.

Cluster Awareness

Cluster monitoring with CA NSM system agents is based on the CA High Availability Service (HAS). HAS is a set of extensions to Unicenter that enables Unicenter components to operate within a cluster environment, to function as highly available components, and to fail over between cluster nodes gracefully. The system agents (caiUxsA2, caiWinA3, caiLogA2) use CA HAS and are cluster aware. This means that even though an instance of each agent runs on every physical cluster node, only one agent monitors a shared cluster resource such as a shared disk.

No specific configuration is required for using these agents in a cluster, except for monitoring processes. The appropriate name of the cluster resource group (cluster service) must be specified when creating a process watcher.

Note: For more information, see the section Cluster Awareness and the appendix "High Availability Service" in the Inside Systems Monitoring guide, and the appendix "Making Components Cluster Aware and Highly Available" in the Implementation Guide.

Configuring Resource Auto Discovery

Configurable resource auto discovery eases implementation phases, reduces the need for manual configuration, and discovers new resources dynamically as they become available. An additional configuration group filter attribute serves as the criterion for an automatic resource detection and watcher creation mechanism.

Editing Watchers

All the watchers of the system agents are editable. No watchers have to be removed and then re-added. If attributes of a watcher (for example, thresholds) are modified, the status of the watcher will be re-evaluated based on the current poll values. Therefore, modifying a watcher does not invoke polling.

Evaluation Policy

For analog metrics of one-to-many watchers there are several possibilities to calculate the metric value. An evaluation policy makes this evaluation watcher-specific. If the result violates the monitoring conditions, a culprit list is determined. The form of the culprit list depends on the evaluation policy setting and different kinds of thresholds (rising/declining) or minimum/maximum ranges.

The supported evaluation policies are: sum, max, min, average, and individual.
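
The following sketch shows how the five policies could be applied to the values of a one-to-many watcher with a rising threshold. It is an illustration only. The function name is hypothetical, and the way culprits are chosen for the aggregated policies (all instances, largest value first) is an assumption, because the agent determines the actual form of the culprit list.

```python
from statistics import mean
from typing import Dict, List, Tuple


def evaluate(values: Dict[str, float], policy: str,
             threshold: float) -> Tuple[bool, List[str]]:
    """Return (violated, culprits) for a rising threshold under the given policy."""
    if not values:
        return False, []
    if policy == "individual":
        # Every instance is compared with the threshold on its own.
        culprits = sorted(k for k, v in values.items() if v > threshold)
        return bool(culprits), culprits
    # sum, max, min and average reduce all instance values to one metric value.
    reducers = {"sum": sum, "max": max, "min": min, "average": mean}
    metric_value = reducers[policy](values.values())
    if metric_value <= threshold:
        return False, []
    # Assumption: list all contributing instances, largest value first.
    return True, sorted(values, key=lambda k: values[k], reverse=True)
```

With three processes reporting 20, 30, and 70, the sum policy violates a threshold of 100, whereas the max policy does not violate a threshold of 80.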

Generic Resources Monitoring

The UNIX System Agent and the Windows System Agent provide the generic resource monitoring concept that lets you extend the monitoring capabilities of Hardware monitoring and Programmable Resources monitoring by using external scripts or programs. These scripts must be “registered” in the Generic.ini file and have to provide a special output format for the evaluated data.

History Group

The History Table lists the last n enterprise-specific status traps the agent raised. The value of n is a configurable attribute in the history group (<xyz>HistoryMaxEntries). Setting this value to 0 causes the agent not to store any trap history.
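
The behavior of a history table that keeps only the last n entries can be sketched with a bounded queue. The class below is a hypothetical illustration and not the agent's data structure. Its max_entries argument plays the role of the HistoryMaxEntries attribute.

```python
from collections import deque
from typing import Deque, List


class TrapHistory:
    """Keeps the last max_entries traps; 0 disables the history."""

    def __init__(self, max_entries: int) -> None:
        # A deque with maxlen=0 silently discards every appended item.
        self._entries: Deque[str] = deque(maxlen=max_entries)

    def record(self, trap: str) -> None:
        # When the table is full, the oldest entry is dropped automatically.
        self._entries.append(trap)

    def entries(self) -> List[str]:
        return list(self._entries)
```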

The trap history collection can be switched on and off on a per-resource-group basis. This feature is especially useful if toggling watchers cause the trap history table to be filled again and again.

Independent Warning and Critical Thresholds

The system agents allow warning and critical thresholds to be set independently for all relevant functional areas.

Loss and Existence

For most resource groups the system agents offer a status that reports the loss or the existence of the resource from the watcher's point of view. The watcher reports a resource as lost or nonexistent if it is unable to access the resource.

Besides the physical loss of monitored system resources, a logical loss has to be considered. For example, print queues can be unavailable for various reasons. The UNIX System Agent implements configurable logical and physical loss status monitoring. The propagation and evaluation of detected resource outages can be fine-tuned on a per-instance basis.

Message and Action Records

For many system agents the CA NSM r11.2 DVD ships files that contain definitions of all possible Event Message records as well as Action records. This considerably simplifies the creation of customer-specific evaluations for the NSM event console.

Furthermore the CA NSM AEC component provides predefined correlation rules for the CA NSM r11.2 system agents.

Minimum and Maximum Metrics

Minimum/Maximum metrics are binary metrics. They are used to monitor resources which have quantity characteristics that should stay within a specific interval. The agent provides two forms of minimum and maximum metrics:

Standard

This type provides a minimum and a maximum threshold (monitoring condition) and a monitoring level to determine the status of the resource. Detected resource values that are greater than the minimum threshold and less than the maximum threshold, or that are equal to the minimum or maximum threshold, define the Up status for this metric. All other values define the Down status.

Extended

This type provides a minimum and a maximum range which are monitored through critical and warning thresholds leading to effectively four threshold borders:

CritMin <= WarnMin <= WarnMax <= CritMax

The logic of the metric can be changed by using additional policies, for example, the evaluation policy.
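
The two forms can be sketched as follows. The standard form follows the description above directly. For the extended form, the status names and the inclusive treatment of the four borders are assumptions made for this example.

```python
def standard_status(value: float, minimum: float, maximum: float) -> str:
    """Up if the value lies within [minimum, maximum], otherwise Down."""
    return "Up" if minimum <= value <= maximum else "Down"


def extended_status(value: float, crit_min: float, warn_min: float,
                    warn_max: float, crit_max: float) -> str:
    """Evaluate a value against CritMin <= WarnMin <= WarnMax <= CritMax."""
    if not (crit_min <= warn_min <= warn_max <= crit_max):
        raise ValueError("thresholds must satisfy CritMin <= WarnMin <= WarnMax <= CritMax")
    if value < crit_min or value > crit_max:
        return "Critical"
    if value < warn_min or value > warn_max:
        return "Warning"
    return "OK"
```

For example, with the borders 1, 2, 8, and 10, a value of 9 lies outside the warning range but inside the critical range and therefore yields Warning in this sketch.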

Modification Policy

Files and directories can be monitored for being modified or unmodified. In both cases the dates of the corresponding files are used, that is, the file or files addressed by a file watcher or the entries in a directory including the directory itself (.) and all subentries if the recursive option is set.

Overall Status of Each Functional Area

The system agents enable the Agent View (Abrowser) to propagate the most severe state of resources reported on the resource type specific windows to the Status Summary window. The Status Summary window summarizes the status of all monitored resources. It also displays the total number of monitored resources for each object type and the overall status according to the agent.

Overloading Thresholds

In most cases, you define thresholds as percentages, but sometimes it is useful to define absolute values instead. Percentages are suitable where a high degree of resolution is not required. Additionally, they can provide generic values across many machines. Absolute values enable a far higher resolution. The overloaded thresholds concept lets you configure thresholds with the following scales:

The agent always converts the overloaded value entered by the client into an absolute used value and stores this value in the MIB. This value is used for validation and status checks. The overloading must be the same for warning and critical thresholds. Not every kind of overloading is possible for every threshold. For details see the MIB description.

In MIB Browser, the client indicates the type of overload by appending the percent (%) sign or the F symbol to the value. In Agent View, this translation is performed dynamically, using slider widgets and graphical controls.

Periodic Configuration Write-Back

The system agents perform periodic configuration stores. To minimize overhead, an appropriate concept ensures that only configuration information that has changed since the last store operation is written back. If the system is being closed down, only recent configuration changes need to be stored, rather than the entire configuration.
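
One common way to write back only what has changed is to track modified entries between store operations. The following sketch illustrates that idea. It is not the agent's actual mechanism, and the class name, attribute names, and the in-memory persisted dictionary (which stands in for the stored configuration) are hypothetical.

```python
from typing import Any, Dict, List, Set


class ConfigStore:
    """Tracks changed configuration entries and stores only those."""

    def __init__(self) -> None:
        self._current: Dict[str, Any] = {}
        self._dirty: Set[str] = set()
        self.persisted: Dict[str, Any] = {}  # stands in for the stored configuration

    def set(self, key: str, value: Any) -> None:
        # Mark the entry as changed only if the value really differs.
        if key not in self._current or self._current[key] != value:
            self._current[key] = value
            self._dirty.add(key)

    def write_back(self) -> List[str]:
        """Store the entries changed since the last call and return their keys."""
        written = sorted(self._dirty)
        for key in written:
            self.persisted[key] = self._current[key]
        self._dirty.clear()
        return written
```

Calling write_back at a fixed interval and once more at shutdown stores every change while keeping each store operation small.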

Poll Method

For each resource group the agent provides a poll method that lets you completely disable the polling of all metrics for that group. You can also restrict polling to the poll interval only, or allow polling to be triggered by a query as well. Use this property to reduce the load on the agent.

Resource Monitoring at an Instance Level

The system agents allow individual object instances to be monitored for all relevant functional areas.

Resource Selection Capabilities

The system agents simplify the definition of new watchers by implementing a selection or available list from which the administrator can choose the specific resource they wish to monitor. The list will be generated, on demand, as per user-defined filter criteria.

Status Deltas

For resources whose growth can consume finite resources on the machine (such as data files), delta monitoring is used where feasible. The agent records the difference between the size of the resource at the previous poll and the size returned by the current poll. If this difference exceeds a client-defined threshold, an alert is issued. Because a monitored object such as a file can contract as well as expand, the delta can also be negative. The delta reported by the agent is always a positive or negative integer that reflects how much the resource grew or contracted. In the case of overloading the delta value may appear as a decimal value, for example: 99.86%.

To allow you greater flexibility when configuring the delta watchers, a type of overloading is implemented. This allows you to specify a threshold for growth, shrinkage or change in both directions. In addition to this it is possible to use the percentage type of overloading as well. You can define thresholds in the following formats:

n- absolute shrinkage

n+ absolute growth

n absolute change in both directions

n%- percentage shrinkage

n%+ percentage growth

n% percentage change in both directions

The threshold will always be entered as a positive value even if it is used to threshold against shrinkage. The actual delta value stored in the MIB is a positive or negative value to indicate the change as growth or shrinkage.
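
The six threshold formats can be parsed and applied as shown in the following sketch. The function names are hypothetical, and the sketch assumes that a percentage delta is calculated relative to the size at the previous poll, which the text above does not state.

```python
import re
from typing import Tuple

# number, optional percent sign, optional direction suffix
_PATTERN = re.compile(r"^(\d+(?:\.\d+)?)(%?)([+-]?)$")


def parse_delta_threshold(text: str) -> Tuple[float, bool, str]:
    """Return (amount, is_percent, direction) for formats such as n+, n%- or n."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"invalid delta threshold: {text!r}")
    amount, percent, sign = match.groups()
    direction = {"+": "growth", "-": "shrinkage", "": "both"}[sign]
    return float(amount), percent == "%", direction


def delta_breached(previous: float, current: float, threshold: str) -> bool:
    """Check whether the change between two polls exceeds the threshold."""
    amount, is_percent, direction = parse_delta_threshold(threshold)
    delta = current - previous  # positive means growth, negative means shrinkage
    if is_percent:
        if previous == 0:
            return False  # assumption: no percentage change is defined for size 0
        delta = delta / previous * 100
    if direction == "growth":
        return delta > amount
    if direction == "shrinkage":
        return -delta > amount
    return abs(delta) > amount
```

Note that the threshold amount stays positive in every format, while the calculated delta carries the sign.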

Status Lags

To provide meaningful monitoring for resources that can peak for a very short period without a problem occurring, the agent can be configured to check for several threshold breaches before the state changes. This is configured by lag attributes. The lag specifies the number of consecutive threshold breaches required before the state changes. If the lag is set to one, the status behaves as if there is no lag. If the lag is set to two, the threshold must be breached twice in a row to change the state.

The agent offers an aggregate lag attribute for all resources having an aggregate status. This lag defines the number of consecutive poll intervals during which any status of the monitored resource is not in the OK or Up state, before the aggregate status changes.
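
A lag can be modeled as a counter of consecutive breaches. In the following sketch the class name and the status names are hypothetical, and it is assumed that a single poll without a breach resets the counter and returns the state to OK immediately.

```python
class LaggedStatus:
    """Changes state only after `lag` consecutive threshold breaches."""

    def __init__(self, lag: int) -> None:
        if lag < 1:
            raise ValueError("lag must be at least 1")
        self.lag = lag
        self.breaches = 0
        self.state = "OK"

    def poll(self, breached: bool) -> str:
        """Process one poll result and return the resulting state."""
        if breached:
            self.breaches += 1
            if self.breaches >= self.lag:
                self.state = "Breached"
        else:
            # Assumption: one poll without a breach clears the state at once.
            self.breaches = 0
            self.state = "OK"
        return self.state
```

With a lag of two, the poll sequence breach, no breach, breach, breach changes the state only on the fourth poll, so a single short peak is ignored.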

SNMPv3 Support

SNMPv3 support is encapsulated in aws_sadmin. CA NSM r11, r11.1, and r11.2 system agents support SNMPv1 or SNMPv3, depending on an aws_sadmin configuration option.

Traps with Total Values

The warning and critical values in the traps are absolute values even if you have percentage thresholds defined. Without a total value you are unable to judge the scale. For this reason the total value is added to the status and info traps.

Watcher

An agent monitors IT resources on the basis of watchers. A watcher is the term used for any instance of a monitored resource that has been added to the agent's configuration. The agent evaluates the status of a specific resource according to the assigned watcher configuration.

Usually a watcher consists of a set of metrics which are used to compare the detected values of monitored resources with monitoring conditions by considering settable monitoring levels. The result of this comparison is the status of the monitored resource according to the metric settings. The status of the watcher is the worst case aggregate of all associated resource statuses. If the aggregate status of a watcher changes, an info-trap can be sent to the manager. The info-trap contains information about the monitored resource that caused the status change.
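
The worst-case aggregation described above can be sketched in a few lines. The status names and their severity order are assumptions for this example, as are the function name and the choice to report the first metric that holds the worst status.

```python
from typing import Dict, Optional, Tuple

# Assumed status names, ordered by increasing severity.
SEVERITY = {"OK": 0, "Warning": 1, "Critical": 2}


def aggregate(metric_statuses: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Return the worst status and the name of the first metric that reports it."""
    worst, culprit = "OK", None
    for name, status in metric_statuses.items():
        if SEVERITY[status] > SEVERITY[worst]:
            worst, culprit = status, name
    return worst, culprit
```

A caller would compare the returned status with the previous aggregate status and send an info-trap that names the culprit only when the two differ.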

Two basic watcher types can be distinguished:

One-to-one watcher

A watcher is mapped to a single resource that shall be monitored. Characteristics of the monitored resource are evaluated by appropriate metrics. For example, a file system is monitored by a single watcher and different metrics are used to detect the status of file system characteristics such as size.

One-to-many watcher

A watcher is mapped to a set of resources (instances) that shall be monitored. Common characteristics of these instances are evaluated by appropriate metrics. Unlike the one-to-one watcher, a culprit list is provided to identify those instances that cause a status change of the watcher. Additionally, an evaluation policy defines for one-to-many watchers how metric values, statuses, total values, and culprit lists of monitored instances are calculated. For example, processes or files can be monitored by one-to-many watchers.

Monitoring System Resources

This section describes the resources that can be monitored by system agents.

Active Directory Resources

Active Directory Management provides an enterprise-wide view of your Active Directory environment and supports the Active Directory Knowledge Base.

The Active Directory Explorer (ADE) is part of Active Directory Management. It is the main user interface for monitoring the Active Directory environment. ADE provides an instant view of the aggregated states of your forests, domains, sites, domain controllers, site links, and subnets. It lets you drill down into any of these components, providing a highly detailed enterprise and component-level view of your Active Directory environment's behavior.

Active Directory Management consists of the following components:

Active Directory Enterprise Manager

The Active Directory Enterprise Manager creates and maintains all Active Directory objects according to the following enterprise-wide Active Directory resources it monitors:

The Active Directory Enterprise Manager queries the Active Directory for information about these resources. Additionally, it polls the Active Directory Agents on all monitored domain controllers in all forests for domain controller-specific metrics and statuses.

The Active Directory Enterprise Manager analyzes the information it gathers from enterprise-wide Active Directory resources and displays it through Active Directory Explorer. Based on this information it provides an enterprise-wide view of your Active Directory resources.

Active Directory Agent

The Active Directory Services Agent can run on any Windows 2000 server platform or higher if the system is a member of an Active Directory domain. However, the complete monitoring capabilities offered by the agent are only available on a system that is defined as an Active Directory domain controller and as a DNS server. On other systems within an Active Directory Services domain, information on disk space resources, on extended resources, and on one or more performance resources is not available.

The Active Directory Agent monitors the following critical areas:

Note: When you install the agent on a member server, only the subset of the previously listed resources pertinent to all servers is available for monitoring.

CICS Resources

The CICS Agent provides status, event, and configuration information about a CICS region and the transactions that are executed within it. The agent enables you to monitor the key resources, such as DSA and memory, of your CICS regions. The agent can monitor individual resources as well as the "health" of an entire region, allowing you to quickly determine the cause of a problem.

The CICS Agent puts you in control by allowing you to determine the warning and critical thresholds for each monitored resource. The agent monitors these resources and, whenever a user-defined threshold is exceeded, sends an SNMP trap.

The CICS Agent runs in IPv6 environments.