Use this best practice to react to unscheduled downtimes on CA7ONL.
A system failure or an unscheduled outage is the dread of all IT personnel. Though many other processes must be taken into account, getting CA WA CA 7 Edition back up and submitting batch work is a priority. If the outage is not DASD-related, consider having a dormant copy of CA7ONL (TYPE=DORM) automatically start on another LPAR in the sysplex. You can also set up CA7ONL and ICOM to be eligible for ARM restarts.
We recommend that you use the CA Datacom/AD Shadow MUF facility so that the database is available when an LPAR fails. Have the Shadow MUF execute on the system where the dormant copy of CA7ONL executes. If IBM ARM is used to restart CA7ONL, have it start CA7ONL on the system where the Shadow MUF executes.
To minimize the failover time (the amount of time that CA WA CA 7 Edition is not present in the sysplex), start a dormant copy of CA WA CA 7 Edition on the same or a different LPAR in the sysplex. The dormant copy initializes to the point where it checks for the CA WA CA 7 Edition enqueue. The enqueue major/minor name is UCC7SVT/CA7n, where n is the instance name. If the enqueue is present in the sysplex, the dormant copy waits for a timed interval. When it wakes up, it checks for the enqueue again. If the enqueue from the primary copy of CA7ONL is not present, the dormant copy issues a WTOR. The dormant copy then takes over the functions of the normal CA WA CA 7 Edition, such as submitting jobs and processing job completions.
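The wait-and-check cycle described above can be pictured with a small simulation. This is only an illustrative sketch in Python, not product code: the function names wait_for_takeover and enqueue_name, and the enqueue_present callback, are hypothetical, and the real dormant copy performs the check through the operating system enqueue services and issues a WTOR before it takes over.

```python
import time
from typing import Callable


def enqueue_name(instance: str) -> str:
    """Build the major/minor name given in the text: UCC7SVT/CA7n."""
    return f"UCC7SVT/CA7{instance}"


def wait_for_takeover(
    enqueue_present: Callable[[], bool],
    interval_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until the primary copy's enqueue is gone.

    enqueue_present is a hypothetical callback that stands in for the
    sysplex-wide enqueue check. Returns the number of timed waits taken.
    """
    waits = 0
    while enqueue_present():
        # The primary copy still holds the enqueue: wait, then check again.
        sleep(interval_seconds)
        waits += 1
    # Enqueue absent: this is the point where the dormant copy issues the WTOR.
    return waits
```

For example, if the simulated check reports the enqueue as present twice and then absent, the function returns after two waits.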
CA WA CA 7 Edition includes an interface to the IBM Automatic Restart Manager (ARM). You can use this interface instead of the dormant copy. To do so, define an ARM policy for CA WA CA 7 Edition in the sysplex. If CA7ONL or ICOM abends in the system, the active ARM policy indicates to start CA7ONL or ICOM again (either on the same LPAR or on a different LPAR).
Note: The business reason for using a dormant copy or ARM is that keeping an active CA7ONL in the system permits almost continuous job submission and tracking.
In the worst case, the outage leaves you without queue files. Under most circumstances, the log files are the key to recovering from this type of outage. If possible, before you restart CA7ONL, run the log dump job against both the primary and secondary log files to ensure that the historical data is as up-to-date as possible.
When it is time for processing to resume, start CA7ONL with TYPE=COLD and with initialization file settings that disable schedule scan and stop the queues. A COLD type start also formats the queues, so all queue data is lost.
With the history file created from the log file dump job, data is available for input to the Recovery Aid program. To run the Recovery Aid program, execute SASSHIS8 with a 50 control card. This program produces reports and a batch file. The reports include the output from an LQ command from the point of failure. The batch file contains DEMAND(H) commands for the jobs that were in the request, ready, and active queues.
After you restart CA7ONL, run a batch terminal interface job with the commands produced by the Recovery Aid program. Jobs that were in the queues at the time of failure are DEMANDed into the queue, ready for processing. For jobs that were in the active queue or ready queue and had already been submitted, the DEMAND commands are created with TYPE=RES, which enables these jobs to be restarted or force completed.
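The rule for which commands carry TYPE=RES can be sketched as follows. This is a hedged illustration in Python, not the Recovery Aid program itself: the QueuedJob record, its field names, the queue codes, and the demand_command function are assumptions made for the example, and the commands that SASSHIS8 actually generates may include additional keywords.

```python
from dataclasses import dataclass


@dataclass
class QueuedJob:
    """Hypothetical record of a job found in the queues at the point of failure."""
    name: str
    queue: str       # assumed codes: "REQ" (request), "RDY" (ready), "ACT" (active)
    submitted: bool  # True if the job had already been submitted


def demand_command(job: QueuedJob) -> str:
    """Build a DEMANDH command, adding TYPE=RES for already-submitted jobs."""
    command = f"DEMANDH,JOB={job.name}"
    if job.queue in ("RDY", "ACT") and job.submitted:
        # Already submitted: TYPE=RES lets the job be restarted or force completed.
        command += ",TYPE=RES"
    return command
```

With this rule, a job that was still waiting in the request queue gets a plain DEMANDH command, while a job that had been submitted from the ready or active queue gets the TYPE=RES form.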
From this point, the course of action depends on whether you resume normal processing or run only a subset of the workload.
To resume normal processing, follow the directions for setting and turning on schedule scan from the previous information about a disaster recovery scheduled outage. You do not have a NEXT SCAN PERIOD START TIME on which to model the date and time input, but you can use the time of the outage.
To run only a subset of the normal workload, start the queues, forecast the workload, and then demand the workload into the queues. As an alternative, issue a HOLD,Q=ALL command to place a HOLD on all work in the queues, and issue an SSCAN,SCAN=HLD command so that all work that enters the queues afterward has a hold requirement. Then set and turn on schedule scan, and release or cancel jobs in the queues as needed.
Business Value:
Following these guidelines ensures that CA7ONL downtime is minimized.
Additional Considerations:
To recap:
More Information:
For more information about the Recovery Aid programs, see the Systems Programming Guide. For more information about the SASSHIS8 program, see the Report Reference Guide.
Copyright © 2013 CA Technologies.
All rights reserved.