

System Integrity

The z/OS operating system uses features of both the hardware and the software to ensure system integrity. System states separate the code that is allowed to change the system from the code that is not. The two z/OS MVS system states are privileged (system) and unprivileged (user programs). By distinguishing between these two states, z/OS MVS prevents unauthorized changes to system status. The hardware itself implements and enforces the distinction between the system and user states. You can use several different diagnostic routines to verify that your system hardware is operating correctly.
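To make the privileged/unprivileged distinction concrete, here is a toy Python model, not an emulator. It assumes a single flag standing in for the problem-state bit of the PSW and an illustrative two-entry instruction table (LPSW, Load PSW, is privileged; AR, Add Register, is not). A privileged instruction attempted in problem state is refused with a privileged-operation exception, program-interruption code X'0002'. The class and function names are hypothetical.

```python
"""Toy model of supervisor (privileged) vs. problem (unprivileged) state."""

# Program-interruption code for a privileged-operation exception.
PRIVILEGED_OPERATION = 0x0002

# Illustrative subset only: LPSW (Load PSW) is privileged, AR (Add Register) is not.
PRIVILEGED = {"LPSW": True, "AR": False}


class ProgramInterruption(Exception):
    """Raised when the toy CPU refuses to execute an instruction."""

    def __init__(self, code: int) -> None:
        super().__init__(f"program interruption, code X'{code:04X}'")
        self.code = code


class ToyCPU:
    def __init__(self) -> None:
        # Stands in for the PSW problem-state bit: True means problem (user) state.
        self.problem_state = False

    def execute(self, mnemonic: str) -> str:
        # Privileged instructions are accepted only in supervisor state.
        if PRIVILEGED[mnemonic] and self.problem_state:
            raise ProgramInterruption(PRIVILEGED_OPERATION)
        return f"executed {mnemonic}"


if __name__ == "__main__":
    cpu = ToyCPU()
    print(cpu.execute("LPSW"))  # supervisor state: allowed
    cpu.problem_state = True
    print(cpu.execute("AR"))    # unprivileged instruction: allowed in either state
    try:
        cpu.execute("LPSW")
    except ProgramInterruption as exc:
        print(exc)              # program interruption, code X'0002'
```

On a real system, an unrecovered privileged-operation exception in a user program typically surfaces as system ABEND 0C2; the toy simply raises a Python exception instead.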

Hardware

Verify that the hardware is running correctly by running microcode (firmware) and software diagnostic routines. Depending on the processor model, microcode diagnostics reside on diskettes or on internal hard disks. They are the first level of problem determination for hardware repair personnel and verify that the processors are operating correctly.

Error Recording

In addition, z/OS MVS maintains the SYS1.LOGREC data set for error recording. This data set cannot be shared between systems. It provides a record of all detected hardware failures and of selected software errors and system conditions. The system recording routines write information about each incident to SYS1.LOGREC, and you can retrieve that information with the environmental recording, editing, and printing service aid (IFCEREP1). The IFCEREP1 output can be used for diagnosis or measurement, to maintain the devices, and to support the system control program.

The IFCDIP00 service aid initializes SYS1.LOGREC during system initialization. IFCDIP00 creates a header record and a time stamp record for SYS1.LOGREC and allocates space for the data set, which must reside on the system residence volume.

Records

A record is written to SYS1.LOGREC for every detected hardware or software failure and system condition that has an associated recording request or recording routine. The records contain different types of data that document failures and system conditions, and they are stored on SYS1.LOGREC in chronological order.
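The life cycle described above (IFCDIP00 writes a header record and a time stamp record, the recording routines append records in chronological order, and IFCEREP1 retrieves them) can be sketched as a toy in-memory log. All names here (`ToyLogrec`, `initialize`, `write`, `report`) are hypothetical, and the record layout bears no relation to the real SYS1.LOGREC format.

```python
"""Toy in-memory model of an error-recording data set (not the real LOGREC format)."""
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class ToyRecord:
    kind: str            # e.g. "header", "timestamp", "software", "machine-check"
    written_at: datetime
    detail: str = ""


@dataclass
class ToyLogrec:
    records: list[ToyRecord] = field(default_factory=list)

    @classmethod
    def initialize(cls, now: datetime) -> "ToyLogrec":
        # Mirrors what the text says IFCDIP00 creates: a header record
        # and a time stamp record.
        log = cls()
        log.records.append(ToyRecord("header", now))
        log.records.append(ToyRecord("timestamp", now))
        return log

    def write(self, kind: str, when: datetime, detail: str) -> None:
        # Records are only ever appended; rejecting out-of-order times keeps
        # the toy data set in chronological order.
        if self.records and when < self.records[-1].written_at:
            raise ValueError("records must be written in chronological order")
        self.records.append(ToyRecord(kind, when, detail))

    def report(self, kind: Optional[str] = None) -> list[ToyRecord]:
        # Loosely IFCEREP1-style retrieval: skip the two initialization
        # records and optionally filter by record type.
        return [r for r in self.records[2:] if kind is None or r.kind == kind]


if __name__ == "__main__":
    log = ToyLogrec.initialize(datetime(2024, 1, 1, 8, 0))
    log.write("machine-check", datetime(2024, 1, 1, 8, 5), "soft failure")
    log.write("software", datetime(2024, 1, 1, 8, 7), "abend")
    for rec in log.report():
        print(rec.written_at.isoformat(), rec.kind, rec.detail)
```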

In general, each record contains:

Various types of records can be written to SYS1.LOGREC. Each type contains device- or incident-dependent information that fully describes the device and the type of failure or system condition that caused the record to be written.

Machine Failures

Machine check records are written to SYS1.LOGREC whenever any of the following machine failures is detected:

When a machine failure occurs, the Machine Check Handler (MCH) receives control through a machine-check interrupt, either for a soft failure (one that the hardware retry features corrected) or for a hard failure (one that the retry features could not correct). If the interrupt is for a soft failure, MCH uses the environmental and model-independent information describing the failure to build an MCH record. After formatting the information, MCH passes control to the Recovery Termination Manager (RTM). RTM then invokes the recording request routine, which queues the MCH record on the asynchronous output queue and posts the asynchronous recording task. The recording task scans the output queue and issues the appropriate SVC to write each queued record to SYS1.LOGREC.

If the machine-check interrupt is for a hard failure, MCH analyzes the information in the model-independent logout area, isolates the error, and passes a record of the analysis to RTM. RTM then takes the same actions as it does for a soft failure.
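The soft- and hard-failure paths can be modeled, loosely, as a producer/consumer pair. In this hypothetical Python sketch, `machine_check_handler` builds a record, `rtm_record` stands in for RTM's recording request routine, a `queue.Queue` plays the asynchronous output queue (its `put` also wakes the waiting consumer, which takes the place of the post), and a thread stands in for the recording task that writes to SYS1.LOGREC. None of these names are real z/OS interfaces, and the `None` shutdown marker exists only so the toy can stop.

```python
"""Toy model of the machine-check recording flow (names are hypothetical)."""
import queue
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class MchRecord:
    failure: str  # "soft" (corrected by hardware retry) or "hard" (not corrected)
    detail: str


def machine_check_handler(failure: str, info: str) -> MchRecord:
    # Soft failure: format the failure information into a record.
    if failure == "soft":
        return MchRecord("soft", f"formatted: {info}")
    # Hard failure: the real MCH analyzes the logout area and isolates the
    # error; in this toy that analysis is only a label.
    return MchRecord("hard", f"isolated from logout area: {info}")


def rtm_record(out_q: "queue.Queue[Optional[MchRecord]]", rec: MchRecord) -> None:
    # Stands in for RTM's recording request routine: queue the record.
    # put() also wakes a consumer blocked in get(), which plays the "post".
    out_q.put(rec)


def recording_task(out_q: "queue.Queue[Optional[MchRecord]]", logrec: list) -> None:
    # Drains the asynchronous output queue until the toy shutdown marker arrives.
    while (rec := out_q.get()) is not None:
        logrec.append(rec)  # stands in for the SVC that writes to SYS1.LOGREC


if __name__ == "__main__":
    out_q: "queue.Queue[Optional[MchRecord]]" = queue.Queue()
    logrec: list = []
    task = threading.Thread(target=recording_task, args=(out_q, logrec))
    task.start()
    for failure, info in [("soft", "processor retry succeeded"), ("hard", "storage error")]:
        rtm_record(out_q, machine_check_handler(failure, info))
    out_q.put(None)  # tell the toy recording task to stop
    task.join()
    for rec in logrec:
        print(rec.failure, rec.detail)
```

In the toy, the interrupt-side path only enqueues, and the write happens later on the separate recording thread; that separation is what "asynchronous" refers to in the description above.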

With each initial program load (IPL), the system begins a new sequential count of errors. Each detected software error or machine failure receives a unique sequence number, and subsequent software records associated with the same error keep that number (although their time stamps may differ). Software records are written to SYS1.LOGREC for hardware-detected hardware errors, hardware-detected software errors, operator-detected errors, and software-detected software errors. For error recording, error data is collected in the System Diagnostic Work Area (SDWA) to help identify the System Control Program (SCP) error, and RTM is then invoked.
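The sequence-number rule can be sketched as a counter that restarts at each IPL plus a mapping from each error to the number it was assigned. The class name, the string error key, and the starting value of 1 are assumptions for illustration; the text does not say how the system decides that two records belong to the same error.

```python
"""Toy model of per-IPL error sequence numbers (names are hypothetical)."""
from datetime import datetime
from itertools import count


class ErrorSequencer:
    def __init__(self) -> None:
        self.ipl()

    def ipl(self) -> None:
        # Each IPL starts a new sequential count (starting at 1 is an assumption).
        self._counter = count(1)
        self._assigned: dict[str, int] = {}  # hypothetical error key -> sequence number

    def record(self, error_key: str, when: datetime) -> tuple[int, datetime]:
        # The first record for an error takes the next number; later records
        # for the same error reuse it, even though their time stamps differ.
        if error_key not in self._assigned:
            self._assigned[error_key] = next(self._counter)
        return self._assigned[error_key], when


if __name__ == "__main__":
    seq = ErrorSequencer()
    print(seq.record("abend in job A", datetime(2024, 1, 1, 9, 0)))
    print(seq.record("machine check", datetime(2024, 1, 1, 9, 1)))
    print(seq.record("abend in job A", datetime(2024, 1, 1, 9, 2)))  # same number as the first
    seq.ipl()
    print(seq.record("machine check", datetime(2024, 1, 1, 10, 0)))  # count restarted
```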