Recovering from Non-Catastrophic Failure

If an interactive program crashes because of a non-catastrophic failure, it should always roll back to a safe point. The overall aim is for an operator to be able to simply restart whichever procedure was in use at the time of failure, without having to carry out any explicit recovery procedures. This goal is essential to avoid having to provide continued low-level support for a system.

Automatic recovery is relatively easy to arrange for transactions that update a single database file record, because the update will have either succeeded or failed. The problem is more difficult when a single logical transaction requires the update of several database file records in one or more files. Briefly, there are several possible approaches:

  1. Design the database update processes so that whether or not the update is deemed to have occurred depends on a single record update. For example, add a status flag to a control or header record and update this record last. Transactions whose status flag does not show completion are ignored.
  2. For batch procedures only, the entire database could be saved before running the procedure, so that the starting position can be restored in the event of a failure. The backup could be kept online, either in a save file or as a copy made with the OS/400 Copy File (CPYF) command.
  3. Use the journaling and commit control facilities of OS/400 to synchronize the transactions automatically.
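The first approach can be sketched in a few lines. The example below is an illustration only, written in Python rather than an OS/400 language, and it uses in-memory dictionaries in place of database files. The names `order_headers`, `order_lines`, `post_order`, `restart_order`, and the flag values `P` and `C` are all hypothetical and are not part of any product. The point it shows is that the header status flag is written last, so a failure part-way through leaves records that readers ignore and that a simple rerun can replace.

```python
# Hypothetical in-memory stand-ins for two database files.
order_headers = {}  # order_id -> {"customer": ..., "status": ...}
order_lines = {}    # (order_id, line_no) -> {"item": ..., "qty": ...}

PENDING = "P"   # assumed flag value: transaction not yet complete
COMPLETE = "C"  # assumed flag value: transaction fully written


class SimulatedCrash(Exception):
    """Stands in for a job ending part-way through an update."""


def post_order(order_id, customer, lines, crash_after=None):
    """Write one logical transaction as several record updates."""
    # Step 1: write the header, flagged as not yet complete.
    order_headers[order_id] = {"customer": customer, "status": PENDING}
    # Step 2: write the detail records.
    for line_no, (item, qty) in enumerate(lines, start=1):
        if crash_after is not None and line_no > crash_after:
            raise SimulatedCrash(f"failed before line {line_no}")
        order_lines[(order_id, line_no)] = {"item": item, "qty": qty}
    # Step 3: a single record update makes the whole transaction count.
    order_headers[order_id]["status"] = COMPLETE


def visible_orders():
    """Return the ids of orders whose header is flagged complete."""
    return sorted(
        order_id
        for order_id, header in order_headers.items()
        if header["status"] == COMPLETE
    )


def restart_order(order_id, customer, lines):
    """Rerun the procedure after a failure, with no separate recovery step."""
    header = order_headers.get(order_id)
    if header is not None and header["status"] == COMPLETE:
        return  # already posted in full; nothing to redo
    # Discard any detail records left behind by the failed attempt.
    for key in [k for k in order_lines if k[0] == order_id]:
        del order_lines[key]
    post_order(order_id, customer, lines)
```

If `post_order` fails after writing some detail records, the header still carries the pending flag, so `visible_orders` does not report the order. Calling `restart_order` with the same input clears the leftover records and posts the order again, which is the "simply restart the procedure" behavior described above. A real implementation would also have to decide when incomplete records are purged, which this sketch leaves out.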