The System Reliability Summary Report provides a method of
tracking and analyzing the overall reliability of an entire
system, based on a single system identification (SYSID).
Data is gathered from a number of the System Reliability
(SRL) files and is summarized under the following categories:
o Processor Reliability Indicators
o Software Reliability Indicators
o Device Reliability Indicators
o Media Reliability Indicators
o Special Reliability Indicators
The objective of the report is to present data that can be
used to identify areas where problems have occurred during
the reporting period and to show the trend of failures in
specific areas over a number of days. For each indicator,
the counts of key system events or summaries of significant
error conditions are provided for one or more days. The key
system events include failures or conditions such as machine
checks, channel checks, or processor wait states. The
significant error conditions include counts such as the
number of temporary and permanent failures by device class
and the number of system software errors.
The indicators should be reviewed with respect to the
standards and procedures being followed in your installation.
The policies and procedures used in each installation clearly
affect the use and interpretation of the reliability
indicators.
PROCESSOR RELIABILITY INDICATORS
Processor reliability indicators generally reflect the status
of the processor and its associated storage and channels.
Failures in this area can degrade the services provided by
the system or, in the case of a serious failure, interrupt
them.
The processor reliability indicators examine errors and
conditions which either degrade services or interrupt the
processing of the entire system. The trend over the time
period selected is a first indication of whether the system
is doing better or worse than before.
The following indicators are provided:
01.IPLS - the number of times the processor was IPLed.
Each IPL potentially represents an interruption
to service.
This count may be higher than the count from
SMF. The SMF IPL record is only written if SMF
is successfully started during the IPL process.
If an IPL occurs and the system fails or is
IPLed again before SMF is started, the
reliability value and SMF value will disagree.
IPLs scheduled for maintenance, testing, etc.
should be taken into consideration.
02.TERMINATION EVENTS - the number of times the system
went through the 'End of Day' processing.
If your installation requires that the system is
shut down in an orderly fashion, that is, that
the Z EOD command is used at all normal
shutdowns, then this value may be used with the
number of IPLs as a key reliability indicator.
If unscheduled IPLs occurred and this value is
0, then you can assume that the system has
crashed or has come down as the result of an
error.
03.PROCESSOR CHECKS - the number of times a machine
check was encountered on the processor.
04.STORAGE CHECKS - the number of times a machine check
involving processor storage was encountered.
05.CHANNEL CHECKS - the number of times a channel check
occurred.
Processor checks, storage checks, and channel
checks are all indicators of hardware problems
that either degraded or interrupted the system.
If the number of IPLs is zero or all IPLs were
scheduled, then these errors degraded the system
rather than interrupting it.
06.I/O SUPVR WAIT STATES - the number of times that the
processor was stopped or halted by an error
during an Input/Output operation.
I/O supervisor wait states represent a system
degradation if the error was recognized as a
wait state. If the wait was not recognized,
then the system may have been IPLed,
interrupting all services.
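The interpretation rules above can be sketched in code. The following is a minimal illustration only, not part of the product. The class and field names are hypothetical, and the count of scheduled IPLs is assumed to come from installation records, since the report does not supply it. It also assumes the installation issues Z EOD at every normal shutdown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProcessorDay:
    """One day's processor indicator counts (hypothetical field names)."""
    ipls: int                # 01.IPLS
    scheduled_ipls: int      # assumed known from installation records
    termination_events: int  # 02.TERMINATION EVENTS ('End of Day' processing)
    hardware_checks: int     # processor, storage, and channel checks combined


def assess(day: ProcessorDay) -> List[str]:
    """Apply the interpretation rules from the text to one day's counts."""
    findings = []
    unscheduled = day.ipls - day.scheduled_ipls
    # Unscheduled IPLs with no orderly shutdown suggest a crash.
    if unscheduled > 0 and day.termination_events == 0:
        findings.append("probable crash or error-related outage")
    if day.hardware_checks > 0:
        if unscheduled <= 0:
            # No IPLs, or only scheduled ones: the errors degraded the system.
            findings.append("hardware errors degraded the system")
        else:
            findings.append("hardware errors degraded or interrupted the system")
    return findings
```

For example, a day with two unscheduled IPLs, no termination events, and three hardware checks yields both findings, while a day with one scheduled IPL and four hardware checks yields only the degradation finding.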
SOFTWARE RELIABILITY INDICATORS
Software reliability indicators reflect the status of the
operating system software and the status of user software
which logs information to the system error recording data
set. Software errors can result in degradation or
interruptions to the services provided by the system.
They represent the quantity or volume of errors that have
occurred in the software. The overall size of a number is an
indication of whether further analysis of the software
failures is required.
The following indicators are provided:
01.MACHINE CHECK RELATED - the number of failures
encountered by software modules or routines
that were related to machine checks.
02.OPERATOR DETECTED - the number of failures detected
and logged as the result of a system operator
action.
03.ABENDS,PGM INTERRUPTS - the number of failures
encountered by software modules or routines
that were the result of an abend or program
interrupt.
04.LOST RECORDS - the number of records that were lost
or not recorded on the system error recording
data set.
LOST RECORDS indicates that some number of
records could not be written to the error
recording data set. If any value appears here,
try to determine what type of condition caused
the lost records. A large number of channel
failures, for example, could have caused a lost
record condition, because the failing mode of
transmission was itself the cause of the loss.
DEVICE RELIABILITY INDICATORS
Device reliability indicators reflect the status of the
devices, by device class, attached to the processor. Device
errors can result in many different failures, depending on
the use of the device and the severity of the error.
They represent the overall reliability of the devices
attached to the system. The size of a number or error count
is an indication of whether further analysis of the device
detail information is required.
The following indicators are provided:
01.MISSING INTERRUPT EVENTS - the number of times that
an I/O interrupt has been missed or dropped by
a device.
02.RECONFIGURATION EVENTS - the number of times that a
permanent error on direct access or magnetic
tape has resulted in a dynamic device
reconfiguration or swap to an alternate device.
RECONFIGURATION EVENTS represent permanent
errors that caused a dynamic device
reconfiguration or swap to an alternate device.
One or more permanent errors should appear in
the direct access or magnetic tape values.
nn.PERMANENT ERRORS - the number of permanent errors
encountered by devices within each of the
following device classes:
03. DASD (direct access)
04. TAPE (magnetic tape)
05. TP (teleprocessing)
06. U/R (unit record)
06.PERMANENT ERRORS (U/R) is an indication of
unrecoverable errors that occurred. This is an
indication that further analysis is required.
nn.TEMPORARY ERRORS - the number of temporary errors
encountered by devices within each of the
following device classes:
07. DASD (direct access)
08. TAPE (magnetic tape)
09. TP (teleprocessing)
10. U/R (unit record)
10.TEMPORARY ERRORS (U/R) is an indication of
recoverable errors that occurred. If the
numbers are large, further analysis is required.
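The relationship between RECONFIGURATION EVENTS and the permanent error counts can be checked mechanically. The sketch below is an illustration under stated assumptions: the function name and parameters are hypothetical, and a blank or '.' report cell is assumed to have been converted to 0 beforehand.

```python
def reconfiguration_consistent(reconfig_events: int,
                               perm_dasd: int,
                               perm_tape: int) -> bool:
    """Return True when the counts agree with the rule in the text.

    Each reconfiguration event is caused by a permanent error on direct
    access or magnetic tape, so a nonzero event count should be accompanied
    by at least one permanent DASD or TAPE error.
    """
    if reconfig_events == 0:
        return True  # nothing to cross-check
    return (perm_dasd + perm_tape) >= 1
```

A result of False suggests that the day's counts deserve a closer look, for example because records were lost before they reached the error recording data set.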
SPECIAL RELIABILITY INDICATORS
Special reliability indicators reflect the status of
special reliability events or errors that have occurred.
These counts generally provide a more detailed review of
indicators for specific devices attached to the system.
They represent errors and conditions that are being tracked
specifically by the installation.
The following indicator is provided:
01.LASER PRINTER ERRORS - the number of permanent or
significant errors which have occurred on laser
printer devices.
01.LASER PRINTER ERRORS represent the number of
temporary errors and permanent errors related
specifically to the laser printers attached to
the system. Large values are an indication that
further analysis is required.
INQUIRY ID:
SRLLD1
DATA SOURCE (file/timespan):
SRLDRL, SRLMRL, SRLTRL, SRLXRL and SRLRNC at the
DETAIL timespan.
DATA ELEMENTS USED:
The data elements used for this inquiry are the following:
___________________________________________________________
| | |
| FILE | DATA ELEMENTS |
|__________|_______________________________________________|
| | |
| SRLDRL | DRLPRMCT DRLTMPCT |
| SRLMRL | MRLLOGTY MRLPRMCT MRLTMPCT MRLMTS |
| SRLTRL | TRLPRMCT |
| SRLXRL | XRLPRMCT |
| SRLRNC | RNCTYPE |
| | |
|__________|_______________________________________________|
CA                                                                                           09:01 THURSDAY, MAY 8, 2008
                                             CA MICS I/S MANAGEMENT SUPPORT
                                               SYSTEM RELIABILITY SUMMARY
                                                 System Identifier S008
--------------------------------------------------------------------------------------------------------------------------
|                                                        |                    FAILURE SUMMARY                    |       |
|                                                        |-------------------------------------------------------|       |
|                                                        |01MAY08|02MAY08|03MAY08|04MAY08|05MAY08|06MAY08|07MAY08| TOTAL |
|                                                        |-------+-------+-------+-------+-------+-------+-------+-------|
|                                                        |NO. OF |NO. OF |NO. OF |NO. OF |NO. OF |NO. OF |NO. OF |NO. OF |
|                                                        |ERRORS |ERRORS |ERRORS |ERRORS |ERRORS |ERRORS |ERRORS |ERRORS |
|--------------------------------------------------------+-------+-------+-------+-------+-------+-------+-------+-------|
|RELIABILITY INDICATORS     |FAILURE CATEGORIES          |       |       |       |       |       |       |       |       |
|---------------------------+----------------------------|       |       |       |       |       |       |       |       |
|A. PROCESSOR               |01.IPLS                     |      1|      2|      1|      1|      3|      3|      3|     14|
|                           |----------------------------+-------+-------+-------+-------+-------+-------+-------+-------|
|                           |05.CHANNEL CHECKS           |     12|     10|      4|      2|      3|      4|      4|     39|
|---------------------------+----------------------------+-------+-------+-------+-------+-------+-------+-------+-------|
|B. SOFTWARE                |03.ABENDS,PGM INTERRUPTS    |      3|     15|      7|     10|     12|      .|      .|     47|
|                           |----------------------------+-------+-------+-------+-------+-------+-------+-------+-------|
|                           |04.LOST RECORDS             |      .|      2|      1|      4|      3|      1|      1|     12|
|---------------------------+----------------------------+-------+-------+-------+-------+-------+-------+-------+-------|
|C. DEVICE                  |01.MISSING INTERRUPT EVENTS |      8|      1|      .|      .|     62|      .|      .|     71|
|                           |----------------------------+-------+-------+-------+-------+-------+-------+-------+-------|
|                           |02.RECONFIGURATION EVENTS   |      .|      2|      1|      .|      .|      .|      .|      3|
|                           |----------------------------+-------+-------+-------+-------+-------+-------+-------+-------|
|                           |03.PERMANENT ERRORS-DASD    |      5|     10|      4|      4|     15|      5|      5|     48|
|                           |----------------------------+-------+-------+-------+-------+-------+-------+-------+-------|
|                           |04.PERMANENT ERRORS-TAPE    |      5|      8|      4|      7|     12|      .|      .|     36|
|                           |----------------------------+-------+-------+-------+-------+-------+-------+-------+-------|
|                           |05.PERMANENT ERRORS-TP      |      3|     63|      1|      1|      7|      .|      .|     75|
|                           |----------------------------+-------+-------+-------+-------+-------+-------+-------+-------|
|                           |06.PERMANENT ERRORS-U/R     |      8|      9|     11|      6|     13|      3|      .|     50|
|                           |----------------------------+-------+-------+-------+-------+-------+-------+-------+-------|
|                           |07.TEMPORARY ERRORS-DASD    |     11|     16|      7|      6|      2|      3|      .|     45|
|                           |----------------------------+-------+-------+-------+-------+-------+-------+-------+-------|
|                           |09.TEMPORARY ERRORS-TP      |      5|      5|      9|      6|      6|      .|      .|     31|
|                           |----------------------------+-------+-------+-------+-------+-------+-------+-------+-------|
|                           |10.TEMPORARY ERRORS-U/R     |     19|     11|     10|     12|     15|      7|      .|     74|
|---------------------------+----------------------------+-------+-------+-------+-------+-------+-------+-------+-------|
|D. SPECIAL                 |01.LASER PRINTER ERRORS     |     19|     11|     10|     12|     15|      7|      .|     74|
|--------------------------------------------------------+-------+-------+-------+-------+-------+-------+-------+-------|
|TOTAL ERRORS                                            |     81|    145|     60|     60|    219|     25|      3|    593|
--------------------------------------------------------------------------------------------------------------------------
Figure 3-2. System Reliability Summary Report
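When working with values copied from a report such as the sample above, the TOTAL column for an indicator can be reproduced by summing its daily counts. The following minimal sketch assumes that a '.' cell means no events were reported for that day and treats it as zero. The function names are hypothetical, and the rows shown are copied by hand from the sample report rather than read from any product file.

```python
def parse_counts(cells):
    """Convert report cells to integers; '.' is assumed to mean zero."""
    return [0 if cell.strip() == "." else int(cell) for cell in cells]


def row_total(cells):
    """Sum the daily counts for one failure category."""
    return sum(parse_counts(cells))


# Daily values for 01MAY08 through 07MAY08, copied from the sample report.
rows = {
    "01.IPLS": ["1", "2", "1", "1", "3", "3", "3"],
    "03.ABENDS,PGM INTERRUPTS": ["3", "15", "7", "10", "12", ".", "."],
    "04.LOST RECORDS": [".", "2", "1", "4", "3", "1", "1"],
}

for name, cells in rows.items():
    print(name, row_total(cells))
```

For these three rows the sums are 14, 47, and 12, which agree with the TOTAL column of the sample report.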
Copyright © 2014 CA.
All rights reserved.