

Recovery Procedures from Database File I/O Errors

What an I/O Error Means

An I/O error on a database file means that an attempt to read from or write to the file failed. The cause may be a hardware malfunction, such as a channel problem; once that is corrected, no recovery operation is needed. The cause may also be a physically damaged file or disk device; this type of error requires recovery of the file.

Identifying a Database File I/O Error

When CA IDMS/DB encounters an I/O error in a database file, the following events occur:

  1. CA IDMS/DB issues one of the following messages:
  2. The transaction abends with a code of 3010 or 3011.
  3. CA IDMS/DB performs automatic recovery processing.
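The events above can be recognized mechanically when the log is available as text. The following is a minimal Python sketch, not a CA IDMS facility: the function name and the assumption that an abend line contains the word ABEND followed somewhere by a four-digit code are hypothetical. Only the abend codes 3010 and 3011 and the message identifier DC205009 come from this section.

```python
import re

# Abend codes this section associates with a database file I/O error.
IO_ABEND_CODES = {"3010", "3011"}
# Message issued when automatic recovery fails (see "If the Recovery is Unsuccessful").
SUSPENDED_MSG = "DC205009"


def classify_io_error(log_lines):
    """Classify log lines as 'suspended', 'io_abend', or 'none'.

    Assumption (hypothetical log layout): an abend line contains the
    word ABEND and the abend code as a standalone four-digit token.
    """
    saw_abend = False
    for line in log_lines:
        if SUSPENDED_MSG in line:
            # Automatic recovery failed; manual recovery is required.
            return "suspended"
        codes = set(re.findall(r"\b\d{4}\b", line))
        if "ABEND" in line.upper() and codes & IO_ABEND_CODES:
            saw_abend = True
    return "io_abend" if saw_abend else "none"
```

A result of "io_abend" without a later "suspended" suggests that automatic recovery did not fail, but the sketch cannot tell whether recovery has already finished.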

If Recovery is Successful

If the recovery process is successful, CA IDMS/DB continues processing. To fix the I/O error, you must follow these steps:

  1. Action: Take the area(s) associated with the bad database file offline.
     Statement: DCMT VARY AREA with the OFFLINE option
  2. Action: Identify the problem and fix it. If the problem is not associated with the database file itself (for example, the problem is due to a bad channel), perform step 3 after the problem is corrected; if the problem is due to a damaged file, perform the steps outlined for an unsuccessful recovery.
  3. Action: Bring the area(s) associated with the database file online.
     Statement: DCMT VARY AREA with the ONLINE option

If the Recovery is Unsuccessful

If the recovery process is unsuccessful, CA IDMS/DB suspends the transaction and issues the following message:

DC205009 TRANSACTION SUSPENDED. TRANSACTION ID: transaction-id

When CA IDMS/DB issues this message, quiesce the area in which the problem occurred as quickly as possible to prevent additional transactions from readying the area. The complete procedure is as follows:

  1. Action: Quiesce the affected area (see Considerations in this section).
     Statement: DCMT VARY AREA with the TRANSIENT RETRIEVAL or OFFLINE option
  2. Action: Switch to a new journal file.
     Statement: DCMT VARY JOURNAL
  3. Action: De-allocate the file.
     Statement: DCMT VARY FILE with the DEALLOCATE option; use the FORCE option if the file cannot be closed (for example, because of a channel problem)
  4. Action: Restore a copy of the damaged file using the last backup tape as input. If the FORCE option was used in step 3, recreate the file with a new name.
     Statement: RESTORE with the FILE option
  5. Action: Roll forward the restored copy of the file using the archive journal files in the order they were created.
     Statement: Various. See 21.5, "Manual Recovery"
  6. Action: If the file was restored to a new location:
       • Recatalog it in z/OS
       • Update the standard labels in z/VSE
     Statement: Operating system facilities
  7. Action: If the file was renamed in z/OS or z/VM, change its dataset name.
     Statement: DCMT VARY FILE with the DSNAME option
  8. Action: Make the new file available to the central version.
     Statement: DCMT VARY FILE with the ALLOCATE option
  9. Action: Re-activate the suspended transactions so they complete automatic recovery.
     Statement: DCMT VARY FILE with the ACTIVE option
  10. Action: Re-activate the area for update processing.
      Statement:
       • If the area was varied OFFLINE, issue DCMT VARY AREA with the ONLINE option
       • If the area was varied to TRANSIENT RETRIEVAL mode, first vary it OFFLINE and then ONLINE
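Several of the steps above are conditional, so it can help to generate the checklist for a given incident. The following Python sketch is illustrative only: the function and parameter names are hypothetical, and the returned strings paraphrase the statements listed above rather than giving exact command syntax. It also assumes that using FORCE implies a rename, because the file must then be recreated with a new name.

```python
def unsuccessful_recovery_steps(quiesce_mode, force_used, new_location, renamed):
    """Return the ordered statements for recovering a damaged database file.

    quiesce_mode: "OFFLINE" or "TRANSIENT RETRIEVAL" (how the area is quiesced)
    force_used:   True if DEALLOCATE needed the FORCE option
    new_location: True if the file was restored to a new location
    renamed:      True if the file was restored under a new name
    """
    if quiesce_mode not in ("OFFLINE", "TRANSIENT RETRIEVAL"):
        raise ValueError("quiesce_mode must be OFFLINE or TRANSIENT RETRIEVAL")

    # Assumption drawn from the restore step: FORCE means a new file name.
    renamed = renamed or force_used

    steps = [
        f"DCMT VARY AREA with the {quiesce_mode} option",
        "DCMT VARY JOURNAL",
        "DCMT VARY FILE with the DEALLOCATE option"
        + (" and the FORCE option" if force_used else ""),
        "RESTORE with the FILE option"
        + (" (recreate the file with a new name)" if force_used else ""),
        "Roll forward using the archive journal files in creation order",
    ]
    if new_location:
        steps.append("Operating system facilities (recatalog or update labels)")
    if renamed:
        steps.append("DCMT VARY FILE with the DSNAME option")
    steps.append("DCMT VARY FILE with the ALLOCATE option")
    steps.append("DCMT VARY FILE with the ACTIVE option")
    if quiesce_mode == "TRANSIENT RETRIEVAL":
        # An area in transient retrieval must go offline before it goes online.
        steps.append("DCMT VARY AREA with the OFFLINE option")
    steps.append("DCMT VARY AREA with the ONLINE option")
    return steps
```

For example, calling `unsuccessful_recovery_steps("TRANSIENT RETRIEVAL", True, False, False)` includes the DSNAME step and ends with the OFFLINE and ONLINE area statements, in that order.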

Considerations

Quiescing the Area

Quiesce the area by varying it offline or to transient retrieval. The differences are as follows:

If the area to be recovered is a system area, it may be necessary to terminate predefined system run units by issuing a DCMT VARY RUN UNIT ... OFFLINE command to quiesce activity to the area. It is advisable to vary the status of a system area to transient retrieval rather than offline.

In a data sharing environment, it is important to quiesce a shared area in all members of the data sharing group. The broadcast capability of DCMT commands can be used to do this easily.

Renaming the File

If you restored the file under a new name, you must make sure that the correct file is used the next time the system is started. If change tracking is in effect for the DC/UCF system, CA IDMS automatically ensures that the correct file is used when the system is restarted following an abnormal termination. However, if change tracking is not in use or if you shut down the system, you must do one of the following:

If you fail to do one of the above, CA IDMS/DB will attempt to access the wrong file the next time the system is started. This may have serious consequences if the original file still exists.


Use of Deallocate Force

If the damaged file was de-allocated using the FORCE option, the DC/UCF system marks the file as closed and de-allocated but does not actually issue the corresponding operating system requests. For this reason, you must restore the file under a different dataset name. When the DC/UCF system is eventually shut down, the shutdown will not complete successfully because the operating system will attempt to close the original file. The DC/UCF system will either abend or hang. In either case, examine the messages produced on the log. If the following message appears, the database system has completed processing and no additional action is required:

DC200010 CA IDMS/DB Inactive

If this message does not appear, you should restart the system (after taking appropriate steps such as renaming the file) and then shut it down.
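The log check described above reduces to a search for a single message identifier. The following Python sketch assumes the log is available as a list of text lines; the function name is hypothetical, and only the identifier DC200010 comes from this section.

```python
# Message that indicates the database system completed processing.
INACTIVE_MSG = "DC200010"


def restart_needed_after_shutdown(log_lines):
    """Return True if DC200010 is absent from the log.

    True means the system should be restarted (after appropriate steps
    such as renaming the file) and then shut down again.
    """
    return not any(INACTIVE_MSG in line for line in log_lines)
```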

Correcting the Lock Option of an Area and File

If the area associated with a damaged database file is in retrieval mode or offline and the file was restored with the area lock on, then the area status is incompatible with the file status. If you try to vary the area online, IDMS responds with an error. To correct this situation, issue a DCMT VARY AREA command with the UPDATE LOCKED option. This command allows IDMS to vary the area to an update mode even though the file is locked.

InDoubt Transaction Considerations

No special action regarding InDoubt transactions should be necessary, since they will complete once the file is varied active and resynchronization takes place with the coordinator.