4.2.5 System Restart and Recovery


Restart means that after the problem is fixed, CA MICS
operational processing can be resumed at the job step that
failed.  When CA MICS internal step restart is enabled for
the failing job step, restarting the step means that
processing will automatically resume at the last completed
processing phase (at the last "restart checkpoint") within
the database update job step.

Recovery means that after the problem is fixed, the Database
must be restored from a backup and the CA MICS operational
processing must be rerun from the beginning (along with any
other processing executed since the backup was taken).

You can restart a failed operational job through CA MICS
Operational Status and Tracking or you can restart it
manually.  For purposes of this discussion, manual restart
includes using your installation's production scheduler
restart procedures.

Note:  Operational Status and Tracking does NOT support the
       incremental update SPLITSMF or INCRccc jobs.  If the
       SPLITSMF or an INCRccc job fails, it must be restarted
       manually or through your installation's production
       scheduler restart procedures.

FAILURE DURING DATABASE AGING

If the update job fails during the Database aging process,
call CA Technical Support for assistance before trying to
restart CA MICS processing.  If internal step restart is
enabled for the failing job step, you will normally be able
to restart the Database aging process and complete DAILY
processing.  However, due to the critical nature of the
Database aging process, it is still wise to seek guidance
from CA Technical Support.

OPERATIONAL STATUS AND TRACKING RESTART

Enter the RESTART command to restart the failing operational
process at the job step that failed.

o  Verify the batch job step (displayed on the RESTART
   panel) where processing will be restarted.

o  If the "Rebuild DAYSMF step temporary work files" prompt
   is displayed on the RESTART panel:

   - If the DAYSMF work files are still cataloged, reply NO
     (the default).

   - If the DAYSMF work files have been deleted, reply YES to
     rebuild these data sets.

o  If the "Restore Accounting and Chargeback audit files"
   prompt is displayed on the RESTART panel, review the
   online tutorial and respond according to your
   requirements.  See the CA MICS Accounting and Chargeback
   User Guide for more information.

o  Edit the generated JCL if required to point CA MICS to an
   alternate or backup input data source.  Make other JCL
   changes as necessary.

o  If internal step restart is enabled for the batch job
   step where processing will be restarted, then processing
   will automatically resume at the last completed
   processing phase in this job step.

o  If you need to override automatic internal step restart
   and force the step to start from the beginning, specify
   SYSPARM=NORESTART on the JCL EXEC statement for this
   batch job step.
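The following is an illustration only. It shows how the override might be coded when step DAY030, the step used in the restart example in this section, is being restarted. MICSPROC is a hypothetical placeholder for the procedure named on the EXEC statement in your generated job stream. Keep that statement as generated and append only the SYSPARM=NORESTART operand, separated from the existing operands by a comma.

```jcl
//* Sketch only: DAY030 is the failing step and MICSPROC is a
//* hypothetical stand-in for the procedure your generated JCL calls.
//* Appending SYSPARM=NORESTART forces the step to start from the
//* beginning instead of resuming at the last restart checkpoint.
//DAY030   EXEC MICSPROC,SYSPARM=NORESTART
```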

If you did NOT schedule CA MICS processing through
Operational Status and Tracking or using the CA MICS batch
SCHEDULE job, then verify that subsequent scheduled
processing is executed.  For example, if you submitted the
DAILY job manually, remember to run BACKUP after DAILY
completes.

MANUAL OR PRODUCTION SCHEDULER RESTART

If you are restarting the DAILY job, and you specified DAYSMF
FILES TEMPORARY in the JCLDEF member of prefix.MICS.PARMS:

o  If the DAYSMF work files have been deleted, submit
   the job in prefix.MICS.CNTL(DAYSMFR) and wait for it to
   complete.

o  Do NOT continue with the restart until DAYSMFR completes
   successfully.

If the operational job failed in the DAY199 step (CA MICS
Accounting and Chargeback step):

o  Submit the job in prefix.MICS.CNTL(ACTDAY1R).  See the
   CA MICS Accounting and Chargeback User Guide for more
   information on restarting after failures in step DAY199.

o  Do NOT continue with the restart until ACTDAY1R completes
   successfully.

If the failing job was submitted by Operational Status and
Tracking or by the CA MICS SCHEDULE job:

o  Edit prefix.MICS.RESTART.CNTL.

o  Enter the RESTART= parameter on the job statement as
   noted on the Run Status Report, for example:

       RESTART=(DAY030.MICS)

   If internal step restart is enabled for this batch job
   step, then processing will automatically resume at the
   last completed processing phase within this step.

o  If you need to override automatic internal step restart
   and force the step to start from the beginning, specify
   SYSPARM=NORESTART on the JCL EXEC statement for this
   batch job step.

o  Edit the DD statements if required to point CA MICS to an
   alternate or backup input data source.

o  Submit the job stream.
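For illustration, the sketch below shows a job statement in prefix.MICS.RESTART.CNTL with the RESTART= parameter added. The job name, accounting field, programmer name, CLASS, and MSGCLASS values are hypothetical. Keep the values already on your job statement and add only RESTART=, copying its value exactly as it appears on the Run Status Report.

```jcl
//* Sketch only: job name, accounting and class values are
//* hypothetical.  RESTART= is the only operand being added, and
//* its value is copied from the Run Status Report.
//MICSDLY  JOB (ACCT),'CA MICS DAILY',CLASS=A,MSGCLASS=X,
//             RESTART=(DAY030.MICS)
```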

If the failing job was submitted manually from
prefix.MICS.CNTL or by your production scheduler:

o  Edit the JCL for the failing job in prefix.MICS.CNTL or
   the scheduling facility.

o  Enter the RESTART= parameter on the job statement if
   required.  The correct RESTART= parameter is noted on the
   Run Status Report, for example, RESTART=(DAY030.MICS).
   If internal step restart is enabled for this batch job
   step, then processing will automatically resume at the
   last completed processing phase within this step.

o  If you need to override automatic internal step restart
   and force the step to start from the beginning, specify
   SYSPARM=NORESTART on the JCL EXEC statement for this
   batch job step.

o  Edit the DD statements if required to point CA MICS to an
   alternate or backup input data source.  Make other JCL
   changes as required.

o  Submit the job stream.

o  CANCEL the edit session so that the RESTART= parameter is
   not permanently saved as part of the job.

o  Verify that subsequent scheduled processing is executed.
   For example, remember to run BACKUP after DAILY
   completes.

FAILURE DURING INTERNAL STEP RESTART

If the restarted update job step fails, examine the CA MICS
and SAS logs to determine the cause of the restart failure.

o  If the CA MICS log contains,

      *** ABORT ERROR ***
      PREVIOUS DAYnnn EXECUTION FAILED DURING DATABASE AGING
      DATABASE AGING RECOVERY AND RESTART IS NOT POSSIBLE.
      PLEASE CONTACT CA TECHNICAL SUPPORT FOR ASSISTANCE.

   then the original failure occurred during the database
   aging process.  You cannot restart a job step after a
   failure in database aging.  Call CA Technical Support for
   assistance.

o  If the CA MICS log contains,

      >ERR> Invalid checkpoint.  Unable to restart ccc
              product DAILY update......

   then the internal step restart process determined that
   one or more information items critical to restarting the
   database update job step are missing.  Specify

         SYSPARM=NORESTART

   on the JCL EXEC statement to force the job step to
   repeat processing from the beginning.

o  If the SAS log indicates that the job failed due to a
   shortage of disk space on one of the WORKnn data sets
   (where nn is 01 - 99) or the cccXWORK data set (where ccc
   is the product associated with this database update
   step):

   -  Edit the operational job JCL for the step that failed
      and add a PARMOVRD DD stream containing the WORK
      and/or RESTARTWORK parameters to temporarily override
      the data set allocation parameters for the failing
      data sets to increase the space allocation.  For
      example:

           //PARMOVRD DD *
            WORK   SPACE=(CYL,(50,50)) STORCLAS=MICSTEMP
            RESTARTWORK SPACE=(CYL,(50,50))
            RESTARTWORK STORCLAS=MICSTEMP

   -  Restart the database update job step from the
      beginning by specifying

          SYSPARM=NORESTART

      on the JCL EXEC statement.

   -  After the job step completes successfully, remove the
      PARMOVRD DD stream to resume using the data set
      allocation parameters you specified in
      prefix.MICS.PARMS(cccOPS).  If you believe that the
      temporary change to the data set allocation parameters
      should be made permanent, increase the amount of space
      requested on the cccOPS WORK (for WORKnn data sets) or
      RESTARTWORK (for the cccXWORK data set) parameter and
      run cccPGEN.

o  If the SAS log indicates that the job failed due to a
   shortage of disk space on the cccXCKPT data set (where
   ccc is the product associated with this database update
   job step), call CA Technical Support for assistance.
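The sketch below shows how the edits for a WORKnn or cccXWORK space failure might look together for step DAY030. The PARMOVRD statements are the same ones shown in the example in the WORKnn/cccXWORK item, and MICSPROC is a hypothetical procedure name. In z/OS JCL, a DD statement with no procedure step qualifier that follows an EXEC statement calling a procedure is added to the first step of that procedure. Check the generated JCL in case PARMOVRD must be coded as procstepname.PARMOVRD instead.

```jcl
//* Sketch only: MICSPROC is a hypothetical procedure name.
//* SYSPARM=NORESTART reruns the step from the beginning, and the
//* PARMOVRD stream temporarily overrides the WORK and RESTARTWORK
//* allocation parameters.  Remove PARMOVRD after a successful run.
//DAY030   EXEC MICSPROC,SYSPARM=NORESTART
//PARMOVRD DD *
 WORK   SPACE=(CYL,(50,50)) STORCLAS=MICSTEMP
 RESTARTWORK SPACE=(CYL,(50,50))
 RESTARTWORK STORCLAS=MICSTEMP
/*
```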


RESTART EXAMPLE

The Operational Status and Tracking display (see the sample
panels below) shows that MONTHLY processing for the P
(PRIMARY) unit failed.  The last completed job step was
DAY020.

The Operational Status and Tracking STATUS command shows that
the DAILY job failed in step DAY030 with a U310 abend.

The RESTART command invokes Operational Status and Tracking
restart processing for the P (PRIMARY) unit.

The RESTART Database Update panel shows that processing will
be restarted in step DAY030.  DAYSMF temporary work files
will not be recreated as they are still cataloged.  Since
Operational Status and Tracking submitted the MONTHLY
processing, MONTHLY and BACKUP will automatically follow
DAILY job restart.

Note:  If internal step restart is enabled for this batch job
       step, then processing will automatically resume at the
       last completed processing phase within this step.

----------------------  Operational Status and Tracking  ---------  ROW 1 OF 6
Command ===> STATUS P                                         Scroll ===> CSR

Commands: Schedule, Daily, Weekly, Monthly, Yearly, Backup, Restore, Restart,
          Status/History/Checkpt/Joblog, Suspend/Resume, Force

         Database         Current Operation   Last Completed     Edit Suspend
Cmd      ID Label    Type as of 10 OCT 2001   Job/Step Date      JCL  Updates
-------- -- -------- ---- ------------------- ------------------ ---- -------
________ C  CICS     U    MONTHLY Completed   MONTHLY  10OCT2001 NO   NO
________ D  DASD     U    MONTHLY DUE TODAY   WEEKLY   09OCT2001 NO   NO
________ I  IMS      U    MONTHLY Completed   MONTHLY  10OCT2001 NO   NO
________ P  PRIMARY  P    MONTHLY FAILED      DAY020   10OCT2001 NO   NO
________ R  REMOTE   U    MONTHLY FAILED      DAYALL   10OCT2001 NO   NO
________ T  TEST     T    DAILY OVERDUE       DAILY    19SEP2001 NO   NO
****************************** BOTTOM OF DATA ********************************



. . . . . . . . . . . . . . . . . . . . . . . . . .



--------------------------- Unit Database Status -------------------------


Command ===>



Database: P (PRIMARY) - CA MICS PRIMARY DATABASE
The status information was recorded at: 10OCT01 08:06

Status of this unit Database:       NON-UPDATABLE
Status of the cycle aging process:  Completed

CA MICS  Last Completed
Job      Step & Date Status of Current Operation: MONTHLY
-------- --- ------- -----------------------------------------------------
DAILY    ALL 10OCT01 FAILED DAY030 U310
MONTHLY  900 03SEP01 HELD Prior job failed
BACKUP   900 09OCT01 HELD Prior job failed

                     Status of Other Jobs
                     -----------------------------------------------------
WEEKLY   900 09OCT01 Completed
YEARLY   900 10JAN01 Completed
RESTORE  900 09OCT01 Completed




. . . . . . . . . . . . . . . . . . . . . . . . . .



----------------------  Operational Status and Tracking  ---------  ROW 1 OF 6
Command ===> RESTART P                                        Scroll ===> CSR

Commands: Schedule, Daily, Weekly, Monthly, Yearly, Backup, Restore, Restart,
          Status/History/Checkpt/Joblog, Suspend/Resume, Force

         Database         Current Operation   Last Completed     Edit Suspend
Cmd      ID Label    Type as of 10 OCT 2001   Job/Step Date      JCL  Updates
-------- -- -------- ---- ------------------- ------------------ ---- -------
________ C  CICS     U    MONTHLY Completed   MONTHLY  10OCT2001 NO   NO
________ D  DASD     U    MONTHLY DUE TODAY   WEEKLY   09OCT2001 NO   NO
________ I  IMS      U    MONTHLY Completed   MONTHLY  10OCT2001 NO   NO
________ P  PRIMARY  P    MONTHLY FAILED      DAY020   10OCT2001 NO   NO
________ R  REMOTE   U    MONTHLY FAILED      DAYALL   10OCT2001 NO   NO
________ T  TEST     T    DAILY OVERDUE       DAILY    19SEP2001 NO   NO
****************************** BOTTOM OF DATA ********************************



. . . . . . . . . . . . . . . . . . . . . . . . . .



-------------------------- RESTART Database Update --------------------------


Command ===>


Database: P (PRIMARY) - CA MICS PRIMARY DATABASE
The status information was recorded at: 10OCT01 08:10

The Database update job that failed:  DAILY
The job will be restarted at step:    DAY030

Edit the job stream before batch submit  ===> YES  (YES/NO)
Rebuild DAYSMF step temporary work files ===> NO   (YES/NO)

Press END (or enter the END command) to generate and submit the
RESTART job.
Enter CANCEL to terminate RESTART processing for this unit Database.



. . . . . . . . . . . . . . . . . . . . . . . . . .



DATABASE RECOVERY

If the update job fails due to an I/O error on the CA MICS
Database or due to insufficient DASD space in the Database,
you will need to recover the Database.  Database recovery
involves:

o  Restoring the Database from a backup copy.

o  Resolving CA MICS application issues, for example
   Accounting and Chargeback file recovery.

o  Rerunning operational processing executed since the
   backup was taken.

Contact CA Technical Support for assistance before recovering
the CA MICS Database.  The remainder of this section provides
basic instructions on the kinds of issues involved.

OPERATIONAL STATUS AND TRACKING RECOVERY

Use the CA MICS Operational Status and Tracking RESTORE
command to restore the Database from a standard or monthly
backup.

o  Review the operational processing log displayed by the
   RESTORE command.

o  Select the standard or monthly backup that meets your
   requirements.  Operational Status and Tracking will
   generate and submit the RESTORE job.

o  Wait for the RESTORE job to complete.

o  Examine the RESTORE job MICSLOG and SAS log outputs.  If
   incremental update is active for one or more products in
   the unit database, then the RESTORE job MICSLOG messages
   may instruct you to run the IUDBINIT job.  IUDBINIT
   re-initializes incremental update database files in order
   to correctly recover the CA MICS database.

o  Identify any CA MICS operational jobs that must be rerun
   to recover data in the CA MICS Database since the BACKUP
   was taken, and run those jobs.

If CA MICS Accounting and Chargeback is installed in the unit
database:

o  Submit the job in prefix.MICS.CNTL(ACTDAY1R).  Edit the
   JCL to restore the ACTAUDIT DAY1 file from the DAY2
   generation that corresponds to the Database backup you
   selected.  DO NOT SAVE THE MODIFIED JCL.

o  See the CA MICS Accounting and Chargeback User Guide for
   more information on restoring a unit database with
   accounting.

o  Do not restart CA MICS operational processing until the
   ACTDAY1R job completes.

MANUAL OR PRODUCTION SCHEDULER RECOVERY

Use the CA MICS Operational Status and Tracking JOBLOG
command to review the standard and monthly backups available
for use in restoring the Database.

Submit the job in prefix.MICS.CNTL(RESTORE).  Edit the job if
you want to restore from a backup other than the most recent
(0) generation standard Database backup.  DO NOT SAVE THE
MODIFIED JCL.

o  To restore from a backup generation other than the 0
   generation, specify the desired backup generation (for
   example, -1) in the cataloged procedure GDG parameter.

o  To restore from a monthly backup, specify the monthly
   backup data set name prefix of the desired backup in the
   cataloged procedure DSNPREF parameter.

o  Wait for the RESTORE job to complete.

o  Examine the RESTORE job MICSLOG and SAS log outputs.  If
   incremental update is active for one or more products in
   the unit database, then the RESTORE job MICSLOG messages
   may instruct you to run the IUDBINIT job.  IUDBINIT
   re-initializes incremental update database files in order
   to correctly recover the CA MICS database.

o  Identify any CA MICS operational jobs that must be rerun
   to recover data in the CA MICS Database since the BACKUP
   was taken, and run those jobs.

If CA MICS Accounting and Chargeback is installed in the unit
database:

o  Submit the job in prefix.MICS.CNTL(ACTDAY1R).  Edit the
   JCL to restore the ACTAUDIT DAY1 file from the DAY2
   generation that corresponds to the Database backup you
   selected.  DO NOT SAVE THE MODIFIED JCL.

o  See the CA MICS Accounting and Chargeback User Guide for
   more information on restoring a unit database with
   accounting.

o  Do not restart CA MICS operational processing until the
   ACTDAY1R job completes.

RECOVERING TABLES AND SCREENS

Call CA Technical Support for assistance.

Use the Operational Status and Tracking JOBLOG command to
review the standard and monthly backups available for the
PRIMARY unit database.  The TABLES and SCREENS data sets are
backed up by the PRIMARY unit.

Determine whether or not changes have been made to the TABLES
and SCREENS data sets since the last backup.  See the Batch
and Operations Analyzer Guide and the Accounting and
Chargeback User Guide for more information on TABLES and
SCREENS data set contents/changes.

Submit the job in prefix.MICS.CNTL(RSTRTBLS).  Edit the JCL
to restore TABLES and SCREENS from the standard or monthly
backup generation that meets your requirements.  DO NOT SAVE
THE MODIFIED JCL.

Repeat all processing and manual data entry that updated the
TABLES or modified the SCREENS data set since the date of the
backup.

RECOVERING ISPTLIB

Call CA Technical Support for assistance.

Do not attempt to restore sharedprefix.MICS.ISPTLIB without
first consulting CA Technical Support.  You may be able to
restore JUST the ISPF tables that are damaged without losing
sharedprefix.MICS.ISPTLIB changes.

Use the Operational Status and Tracking JOBLOG command to
review the standard and monthly backups available for the
PRIMARY unit database.  The PRIMARY unit backs up
sharedprefix.MICS.ISPTLIB.

Determine all changes made to sharedprefix.MICS.ISPTLIB since
the last backup.  This includes CA MICS product, parameter,
and JCL generation jobs; Accounting and Chargeback
parameters; MICF shared inquiries; MICF production reporting
definitions; etc.  Since processing by individual CA MICS
products can make changes to sharedprefix.MICS.ISPTLIB, use
the Subject Cross Reference facility to locate information
that refers to processes that change the contents of
sharedprefix.MICS.ISPTLIB.

Submit the job in prefix.MICS.CNTL(RSTRTLIB).  Edit the JCL
to restore sharedprefix.MICS.ISPTLIB from the standard or
monthly backup generation that meets your requirements.  DO
NOT SAVE THE MODIFIED JCL.

Repeat all processing, data entry, and parameter changes that
modified sharedprefix.MICS.ISPTLIB contents since the date of
the backup.

RECOVERING INCREMENTAL UPDATE FILES

If an INCRccc job or the DAILY job fails due to I/O errors on
an incremental update DETAIL or DAYS timespan file, you will
generally need to rerun the INCRccc or DAILY job with all of
the input data processed so far today.  The incremental
update data sets exist only until the next DAYnnn step
completes execution and are not included in BACKUP
processing.

To recover from a failure due to a damaged incremental update
DETAIL or DAYS data set,

o  Identify the input data that has been processed by
   INCRccc and/or DAYnnn jobs for this product since the
   last successful DAYnnn execution.

o  Delete the incremental update checkpoint and database
   data sets.

o  Use the cccIUALC job to recreate the incremental update
   checkpoint and database data sets.

o  Restart the failing INCRccc job or the failing DAILY job
   DAYnnn step with ALL of the data that has been processed
   for this product since the last successful DAYnnn
   execution.
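The delete step can be sketched as an IDCAMS job like the one below. Every data set name shown is a hypothetical placeholder. Before deleting anything, determine the actual incremental update checkpoint, DETAIL, and DAYS data set names for product ccc in your unit database. The cccIUALC job, which recreates these data sets, is one place the names are likely to appear. Run the delete only after you have identified the input data to be reprocessed.

```jcl
//* Sketch only: all data set names are hypothetical placeholders
//* for the incremental update checkpoint and database data sets
//* of product ccc.  Run cccIUALC afterwards to recreate them.
//DELIU    EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DELETE 'prefix.MICS.ccc.IUCKPT'
  DELETE 'prefix.MICS.ccc.IUDETAIL'
  DELETE 'prefix.MICS.ccc.IUDAYS'
/*
```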

RECOVERING SPLITSMF JOB OUTPUT FILES

The SPLITSMF job dynamically allocates and populates data
sets with subsets of the input SMF data for processing by the
INCRccc jobs.  If an INCRccc job fails due to a missing
INPUTSMF data set that was originally created by the SPLITSMF
job, then you will need to rerun the SPLITSMF job to recreate
the input data file prior to restarting the INCRccc job.  The
INCRccc job deletes the INPUTSMF data set at successful
termination.