4.2.5 System Restart and Recovery

Restart means that after the problem is fixed, CA MICS
operational processing can be resumed at the job step that
failed. When CA MICS internal step restart is enabled for
the failing job step, restarting the step means that
processing will automatically resume at the last completed
processing phase (at the last "restart checkpoint") within
the database update job step.

Recovery means that after the problem is fixed, the Database
must be restored from a backup and the CA MICS operational
processing must be rerun from the beginning (along with any
other processing executed since the backup was taken).

You can restart a failed operational job through CA MICS
Operational Status and Tracking or you can restart it
manually. For purposes of this discussion, manual restart
includes using your installation's production scheduler
restart procedures.

Note: Operational Status and Tracking does NOT support the
incremental update SPLITSMF or INCRccc jobs. If the
SPLITSMF or an INCRccc job fails, it must be restarted
manually or through your installation's production
scheduler restart procedures.

FAILURE DURING DATABASE AGING

If the update job fails during the Database aging process,
call CA Technical Support for assistance before trying to
restart CA MICS processing. If internal step restart is
enabled for the failing job step, you will normally be able
to restart the Database aging process and complete DAILY
processing. However, due to the critical nature of the
Database aging process, it is still wise to seek guidance
from CA Technical Support.

OPERATIONAL STATUS AND TRACKING RESTART

Enter the RESTART command to restart the failing operational
process at the job step that failed.

o Verify the batch job step (displayed on the RESTART
panel) where processing will be restarted.

o If the "Rebuild DAYSMF step temporary work files" prompt
is displayed on the RESTART panel:

- And if the DAYSMF work files are still cataloged, reply
NO (the default).

- And if the DAYSMF work files have been deleted, reply
YES to rebuild these data sets.

o If the "Restore Accounting and Chargeback audit files"
prompt is displayed on the RESTART panel, review the
online tutorial and respond according to your
requirements. See the CA MICS Accounting and Chargeback
User Guide for more information.

o Edit the generated JCL if required to point CA MICS to an
alternate or backup input data source. Make other JCL
changes as necessary.

o If internal step restart is enabled for the batch job
step where processing will be restarted, then processing
will automatically resume at the last completed
processing phase in this job step.

o If you need to override automatic internal step restart
and force the step to start from the beginning, specify
SYSPARM=NORESTART on the JCL EXEC statement for this
batch job step.

If you did NOT schedule CA MICS processing through
Operational Status and Tracking or using the CA MICS batch
SCHEDULE job, then verify that subsequent scheduled
processing is executed. For example, if you submitted the
DAILY job manually, remember to run BACKUP after DAILY
completes.

MANUAL OR PRODUCTION SCHEDULER RESTART

If you are restarting the DAILY job, and you specified DAYSMF
FILES TEMPORARY in the JCLDEF member of prefix.MICS.PARMS:

o And if the DAYSMF work files have been deleted, submit
the job in prefix.MICS.CNTL(DAYSMFR) and wait for it to
complete.

o Do NOT continue with the restart until DAYSMFR completes
successfully.

If the operational job failed in the DAY199 step (CA MICS
Accounting and Chargeback step):

o Submit the job in prefix.MICS.CNTL(ACTDAY1R). See the
CA MICS Accounting and Chargeback User Guide for more
information on restarting after failures in step DAY199.

o Do NOT continue with the restart until ACTDAY1R completes
successfully.

If the failing job was submitted by Operational Status and
Tracking or by the CA MICS SCHEDULE job:

o Edit prefix.MICS.RESTART.CNTL.

o Enter the RESTART= parameter on the job statement as
noted on the Run Status Report -- for example,
RESTART=(DAY030.MICS).

If internal step restart is enabled for this batch job
step, then processing will automatically resume at the
last completed processing phase within this step.

o If you need to override automatic internal step restart
and force the step to start from the beginning, specify
SYSPARM=NORESTART on the JCL EXEC statement for this
batch job step.

o Edit the DD statements if required to point CA MICS to an
alternate or backup input data source.

o Submit the job stream.

If the failing job was submitted manually from
prefix.MICS.CNTL or by your production scheduler:

o Edit the JCL for the failing job in prefix.MICS.CNTL or
the scheduling facility.

o Enter the RESTART= parameter on the job statement if
required. The correct RESTART= parameter is noted on the
Run Status Report -- for example, RESTART=(DAY030.MICS).

If internal step restart is enabled for this batch job
step, then processing will automatically resume at the
last completed processing phase within this step.

o If you need to override automatic internal step restart
and force the step to start from the beginning, specify
SYSPARM=NORESTART on the JCL EXEC statement for this
batch job step.

o Edit the DD statements if required to point CA MICS to an
alternate or backup input data source. Make other JCL
changes as required.

o Submit the job stream.

o CANCEL the edit session so that the RESTART= is not
permanently part of the job.

o Verify that subsequent scheduled processing is executed.
For example, remember to run BACKUP after DAILY
completes.
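
The JCL edits described in this section can be sketched as
follows. This is only an illustration: the job name MICSDLY,
the accounting, CLASS, and MSGCLASS values, and the procedure
name shown as procname are placeholders, not values taken from
CA MICS. Keep the JOB and EXEC statements already in your job
and add only the RESTART= and, if needed, SYSPARM= operands.

```jcl
//MICSDLY  JOB (ACCT),'MICS DAILY',CLASS=A,MSGCLASS=X,
//             RESTART=(DAY030.MICS)
//* RESTART= value as noted on the Run Status Report.
//* Job name and other JOB operands above are hypothetical.
//*
//* Optional: force step DAY030 to start from the beginning.
//* procname stands for the procedure your JCL already names.
//DAY030   EXEC procname,SYSPARM=NORESTART
```

Code SYSPARM=NORESTART only when you want the step to start
over from the beginning. Omit it to let internal step restart
resume at the last restart checkpoint.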

FAILURE DURING INTERNAL STEP RESTART

If the restarted update job step fails, examine the CA MICS
and SAS logs to determine the cause of the restart failure.

o If the CA MICS log contains,

*** ABORT ERROR ***
PREVIOUS DAYnnn EXECUTION FAILED DURING DATABASE AGING
DATABASE AGING RECOVERY AND RESTART IS NOT POSSIBLE.
PLEASE CONTACT CA TECHNICAL SUPPORT FOR ASSISTANCE.

then the original failure occurred during the database
aging process. You cannot restart a job step after a
failure in database aging. Call CA Technical Support for
assistance.

o If the CA MICS log contains,

>ERR> Invalid checkpoint. Unable to restart ccc
product DAILY update......

then the internal step restart process determined that
one or more information items critical to restarting the
database update job step are missing. Specify,
SYSPARM=NORESTART
on the JCL EXEC statement to force the job step to
repeat processing from the beginning.

o If the SAS log indicates that the job failed due to a
shortage of disk space on one of the WORKnn data sets
(where nn is 01 - 99) or the cccXWORK data set (where ccc
is the product associated with this database update
step),

- Edit the operational job JCL for the step that failed
and add a PARMOVRD DD stream containing the WORK
and/or RESTARTWORK parameters to temporarily override
the data set allocation parameters for the failing
data sets to increase the space allocation. For
example,

//PARMOVRD DD *
WORK SPACE=(CYL,(50,50)) STORCLAS=MICSTEMP
RESTARTWORK SPACE=(CYL,(50,50))
RESTARTWORK STORCLAS=MICSTEMP

- Restart the database update job step from the
beginning by specifying,
SYSPARM=NORESTART
on the JCL EXEC statement.

- After the job step completes successfully, remove the
PARMOVRD DD stream to resume using the data set
allocation parameters you specified in
prefix.MICS.PARMS(cccOPS). If you believe that the
temporary change to the data set allocation parameters
should be made permanent, then increase the amount of
space requested on the cccOPS WORK (for WORKnn data
sets) or RESTARTWORK (for the cccXWORK data set)
parameter and run cccPGEN.

o If the SAS log indicates that the job failed due to a
shortage of disk space on the cccXCKPT data set (where
ccc is the product associated with this database update
job step), call CA Technical Support for assistance.

RESTART EXAMPLE

The Operational Status and Tracking display (see the sample
panels that follow) shows that MONTHLY processing for the P
(PRIMARY) unit failed. The last completed job step was
DAY020.

The Operational Status and Tracking STATUS command shows that
the DAILY job failed in step DAY030 with a U310 abend.

The RESTART command invokes Operational Status and Tracking
restart processing for the P (PRIMARY) unit.

The RESTART Database Update panel shows that processing will
be restarted in step DAY030. DAYSMF temporary work files
will not be recreated as they are still cataloged. Since
Operational Status and Tracking submitted the MONTHLY
processing, MONTHLY and BACKUP will automatically follow
DAILY job restart.

Note: If internal step restart is enabled for this batch job
step, then processing will automatically resume at the
last completed processing phase within this step.

----------------------  Operational Status and Tracking  ---------  ROW 1 OF 6
Command ===> STATUS P                                         Scroll ===> CSR

Commands: Schedule, Daily, Weekly, Monthly, Yearly, Backup, Restore, Restart,
          Status/History/Checkpt/Joblog, Suspend/Resume, Force

         Database         Current Operation   Last Completed     Edit Suspend
Cmd      ID Label    Type as of 10 OCT 2001   Job/Step Date      JCL  Updates
-------- -- -------- -    ------------------- ------------------ ---  ---
________ C  CICS     U    MONTHLY Completed   MONTHLY  10OCT2001 NO   NO
________ D  DASD     U    MONTHLY DUE TODAY   WEEKLY   09OCT2001 NO   NO
________ I  IMS      U    MONTHLY Completed   MONTHLY  10OCT2001 NO   NO
________ P  PRIMARY  P    MONTHLY FAILED      DAY020   10OCT2001 NO   NO
________ R  REMOTE   U    MONTHLY FAILED      DAYALL   10OCT2001 NO   NO
________ T  TEST     T    DAILY   OVERDUE     DAILY    19SEP2001 NO   NO
****************************** BOTTOM OF DATA ********************************

. . . . . . . . . . . . . . . . . . . . . . . . . .

--------------------------- Unit Database Status -------------------------

Command ===>

Database:  P  (PRIMARY) - CA MICS PRIMARY DATABASE

The status information was recorded at:  10OCT01 08:06
Status of this unit Database:            NON-UPDATABLE
Status of the cycle aging process:       Completed

CA MICS   Last Completed
Job       Step & Date     Status of Current Operation: MONTHLY
--------  --- -------     -----------------------------------------------------
DAILY     ALL 10OCT01     FAILED    DAY030    U310
MONTHLY   900 03SEP01     HELD      Prior job failed
BACKUP    900 09OCT01     HELD      Prior job failed

                          Status of Other Jobs
                          -----------------------------------------------------
WEEKLY    900 09OCT01     Completed
YEARLY    900 10JAN01     Completed
RESTORE   900 09OCT01     Completed

. . . . . . . . . . . . . . . . . . . . . . . . . .

----------------------  Operational Status and Tracking  ---------  ROW 1 OF 6
Command ===> RESTART P                                        Scroll ===> CSR

Commands: Schedule, Daily, Weekly, Monthly, Yearly, Backup, Restore, Restart,
          Status/History/Checkpt/Joblog, Suspend/Resume, Force

         Database         Current Operation   Last Completed     Edit Suspend
Cmd      ID Label    Type as of 10 OCT 2001   Job/Step Date      JCL  Updates
-------- -- -------- -    ------------------- ------------------ ---  ---
________ C  CICS     U    MONTHLY Completed   MONTHLY  10OCT2001 NO   NO
________ D  DASD     U    MONTHLY DUE TODAY   WEEKLY   09OCT2001 NO   NO
________ I  IMS      U    MONTHLY Completed   MONTHLY  10OCT2001 NO   NO
________ P  PRIMARY  P    MONTHLY FAILED      DAY020   10OCT2001 NO   NO
________ R  REMOTE   U    MONTHLY FAILED      DAYALL   10OCT2001 NO   NO
________ T  TEST     T    DAILY   OVERDUE     DAILY    19SEP2001 NO   NO
****************************** BOTTOM OF DATA ********************************

. . . . . . . . . . . . . . . . . . . . . . . . . .

-------------------------- RESTART Database Update --------------------------

Command ===>

Database:  P  (PRIMARY) - CA MICS PRIMARY DATABASE

The status information was recorded at:  10OCT01 08:10
The Database update job that failed:     DAILY
The job will be restarted at step:       DAY030


Edit the job stream before batch submit    ===> YES (YES/NO)
Rebuild DAYSMF step temporary work files   ===> NO  (YES/NO)





Press END (or enter the END command) to generate and submit the RESTART job.
Enter CANCEL to terminate RESTART processing for this unit Database.

. . . . . . . . . . . . . . . . . . . . . . . . . .



DATABASE RECOVERY

If the update job fails due to an I/O error on the CA MICS
Database or due to insufficient DASD space in the Database,
you will need to recover the Database.  Database recovery
involves:

o  Restoring the Database from a backup copy.

o  Resolving CA MICS applications issues, for example
   Accounting and Chargeback file recovery.

o  Rerunning operational processing executed since the
   backup was taken.

Contact CA Technical Support for assistance before recovering
the CA MICS Database.  The remainder of this section provides
basic instructions on the kinds of issues involved.

OPERATIONAL STATUS AND TRACKING RECOVERY

Use the CA MICS Operational Status and Tracking RESTORE
command to restore the Database from a standard or monthly
backup.

o  Review the operational processing log displayed by the
   RESTORE command.

o  Select the standard or monthly backup that meets your
   requirements.  Operational Status and Tracking will
   generate and submit the RESTORE job.

o  Wait for the RESTORE job to complete.

o  Examine the RESTORE job MICSLOG and SAS log outputs.  If
   incremental update is active for one or more products in
   the unit database, then the RESTORE job MICSLOG messages
   may instruct you to run the IUDBINIT job.  IUDBINIT
   re-initializes incremental update database files in order
   to correctly recover the CA MICS database.

o  Identify any CA MICS operational jobs that must be rerun
   to recover data in the CA MICS Database since the BACKUP
   was taken, and run those jobs.

If CA MICS Accounting and Chargeback is installed in the unit
database:

o  Submit the job in prefix.MICS.CNTL(ACTDAY1R).  Edit the
   JCL to restore the ACTAUDIT DAY1 file from the DAY2
   generation that corresponds to the Database backup you
   selected.  DO NOT SAVE THE MODIFIED JCL.

o  See the CA MICS Accounting and Chargeback User Guide for
   more information on restoring a unit database with
   accounting.

o  Do not restart CA MICS operational processing until the
   ACTDAY1R job completes.

MANUAL OR PRODUCTION SCHEDULER RECOVERY

Use the CA MICS Operational Status and Tracking JOBLOG
command to review the standard and monthly backups available
for use in restoring the Database.

Submit the job in prefix.MICS.CNTL(RESTORE).  Edit the job if
you want to restore from a backup other than the most recent
(0) generation standard Database backup.  DO NOT SAVE THE
MODIFIED JCL.

o  To restore from a backup generation other than the 0
   generation, specify the desired backup generation (for
   example, -1) in the cataloged procedure GDG parameter.

o  To restore from a monthly backup, specify the monthly
   backup data set name prefix of the desired backup in the
   cataloged procedure DSNPREF parameter.

o  Wait for the RESTORE job to complete.

o  Examine the RESTORE job MICSLOG and SAS log outputs.  If
   incremental update is active for one or more products in
   the unit database, then the RESTORE job MICSLOG messages
   may instruct you to run the IUDBINIT job.  IUDBINIT
   re-initializes incremental update database files in order
   to correctly recover the CA MICS database.

o  Identify any CA MICS operational jobs that must be rerun
   to recover data in the CA MICS Database since the BACKUP
   was taken and run those jobs.
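
The GDG and DSNPREF edits to the RESTORE job described earlier
in this list might look like the following sketch. Only the
parameter names GDG and DSNPREF come from this guide. The step
name RESTORE, the procedure name shown as procname, and the
data set prefix shown as monthly.backup.prefix are placeholders
assumed for illustration, and the EXEC statement in your own
RESTORE member may differ. The values are enclosed in
apostrophes here because they contain special characters.

```jcl
//* Alternative 1: restore from the previous (-1) generation
//* of the standard Database backup.
//RESTORE  EXEC procname,GDG='-1'
//*
//* Alternative 2: restore from a monthly backup by naming
//* the data set name prefix of that backup.
//RESTORE  EXEC procname,DSNPREF='monthly.backup.prefix'
```

The two statements are shown as alternatives; code only the
one that matches the backup you selected, and do not save the
modified JCL.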

If CA MICS Accounting and Chargeback is installed in the unit
database:

o  Submit the job in prefix.MICS.CNTL(ACTDAY1R).  Edit the
   JCL to restore the ACTAUDIT DAY1 file from the DAY2
   generation that corresponds to the Database backup you
   selected.  DO NOT SAVE THE MODIFIED JCL.

o  See the Accounting and Chargeback User Guide for
   more information on restoring a unit database with
   accounting.

o  Do not restart CA MICS operational processing until the
   ACTDAY1R job completes.

RECOVERING TABLES AND SCREENS

Call CA Technical Support for assistance.

Use the Operational Status and Tracking JOBLOG command to
review the standard and monthly backups available for the
PRIMARY unit database.  The TABLES and SCREENS data sets are
backed up by the PRIMARY unit.

Determine whether or not changes have been made to the TABLES
and SCREENS data sets since the last backup.  See the Batch
and Operations Analyzer Guide and the Accounting and
Chargeback User Guide for more information on TABLES and
SCREENS data set contents/changes.

Submit the job in prefix.MICS.CNTL(RSTRTBLS).  Edit the JCL
to restore TABLES and SCREENS from the standard or monthly
backup generation that meets your requirements.  DO NOT SAVE
THE MODIFIED JCL.

Repeat all processing and manual data entry that updated the
TABLES or modified the SCREENS data set since the date of the
backup.

RECOVERING ISPTLIB

Call CA Technical Support for assistance.

Do not attempt to restore sharedprefix.MICS.ISPTLIB without
first consulting CA Technical Support.  You may be able to
restore JUST the ISPF tables that are damaged without losing
sharedprefix.MICS.ISPTLIB changes.

Use the Operational Status and Tracking JOBLOG command to
review the standard and monthly backups available for the
PRIMARY unit database.  Sharedprefix.MICS.ISPTLIB is backed
up by the PRIMARY unit.

Determine all changes made to sharedprefix.MICS.ISPTLIB since
the last backup.  This includes CA MICS product, parameter,
and JCL generation jobs; Accounting and Chargeback
parameters; MICF shared inquiries; MICF production reporting
definitions; etc.  Since processing by individual CA MICS
products can make changes to sharedprefix.MICS.ISPTLIB, use
the Subject Cross Reference facility to locate information
that refers to processes that change the contents of
sharedprefix.MICS.ISPTLIB.

Submit the job in prefix.MICS.CNTL(RSTRTLIB).  Edit the JCL
to restore sharedprefix.MICS.ISPTLIB from the standard or
monthly backup generation that meets your requirements.  DO
NOT SAVE THE MODIFIED JCL.

Repeat all processing, data entry, and parameter changes that
modified sharedprefix.MICS.ISPTLIB contents since the date of
the backup.

RECOVERING INCREMENTAL UPDATE FILES

If an INCRccc job or the DAILY job fails due to I/O errors on
an incremental update DETAIL or DAYS timespan file, you will
generally need to rerun the INCRccc or DAILY job with all of
the input data processed so far today.  The incremental
update data sets exist only until the next DAYnnn step
completes execution and are not included in BACKUP
processing.

To recover from a failure due to a damaged incremental update
DETAIL or DAYS data set,

o  Identify the input data that has been processed by
   INCRccc and/or DAYnnn jobs for this product since the
   last successful DAYnnn execution.

o  Delete the incremental update checkpoint and database
   data sets.

o  Use the cccIUALC job to recreate the incremental update
   checkpoint and database data sets.

o  Restart the failing INCRccc job or the failing DAILY job
   DAYnnn step with ALL of the data that has been processed
   for this product since the last successful DAYnnn
   execution.

RECOVERING SPLITSMF JOB OUTPUT FILES

The SPLITSMF job dynamically allocates and populates data
sets with subsets of the input SMF data for processing by the
INCRccc jobs.  If an INCRccc job fails due to a missing
INPUTSMF data set that was originally created by the SPLITSMF
job, then you will need to rerun the SPLITSMF job to recreate
the input data file prior to restarting the INCRccc job.  The
INCRccc job deletes the INPUTSMF data set at successful
termination.
