Parallel Backup in REORG

The parallel backup portion of REORG allows you to use from 1 to 25 subtasks, each backing up a portion of a data area. Each subtask is assigned a key range and runs independently of the other subtasks, writing its assigned records to its own sequential output file. If the data is only slightly out of sequence, we recommend using only a few subtasks. In general, the more out of sequence the data, the more subtasks you should use. When the data is extremely out of sequence and cannot all be held in memory buffers, the backup of a large area is I/O bound.

The processing flow for parallel backup using the REORG function is as follows:

Phase 1A: The area is opened for backup in DBUTLTY in a process similar to that of the BACKUP function. Rows are always processed in native sequence, never in physical sequence.
Phase 1B: For each key ID that is a native sequence key of any table in the selected area, a low and high key range is selected for each backup subtask. If one task is selected, the key range runs from low values to high values. If more tasks are selected, the high level of the index is scanned to make a best guess at the key ranges, so that each subtask reads an approximately equal portion of the data area. For example, if 4 tasks are requested, each should be assigned a key range that covers about one fourth of the data rows.
This process makes decisions based upon the high level of the index, and its effectiveness varies. Because the index is highly compressed, the amount of data represented by each entry varies. The goal is to read the single block at the highest level and use that information for the key selection. If that seems inadequate, the next lower level of the index is scanned. The default, 5 entries per task, is often a reasonable value. You can change this value with the CYCLE= keyword to require more or fewer entries per task. This can be thought of as a trade-off between the effort it takes to select the key ranges and the idle time between the end of one subtask and the end of the final subtask. The load cannot start until all backup subtasks are complete. When more than one task is selected, message DB13259I REORG BACKUP KEYID-n LEVEL-n ENTRIES-n provides details about the index selection.
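
The key range selection described above can be pictured with a short Python sketch. This is an illustration only, not the algorithm DBUTLTY actually uses: the function names, the single-byte stand-ins for low values and high values, and the rule that a level is "adequate" when it holds at least tasks times entries-per-task entries are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

LOW_VALUES = b"\x00"   # stand-in for "low values" (real keys are longer)
HIGH_VALUES = b"\xff"  # stand-in for "high values"


def choose_level(levels: Sequence[List[bytes]], tasks: int,
                 entries_per_task: int = 5) -> List[bytes]:
    """Return the first index level (highest first) with enough entries.

    Assumption: "adequate" means at least tasks * entries_per_task entries.
    Falls back to the lowest level supplied if none is large enough.
    """
    for entries in levels:
        if len(entries) >= tasks * entries_per_task:
            return entries
    return levels[-1]


def select_key_ranges(index_entries: List[bytes],
                      tasks: int) -> List[Tuple[bytes, bytes]]:
    """Split index entries into one (low, high) key range per task.

    Each range includes its low key and excludes its high key, except the
    last range, which extends through HIGH_VALUES.
    """
    if not 1 <= tasks <= 25:
        raise ValueError("tasks must be between 1 and 25")
    if tasks == 1:
        return [(LOW_VALUES, HIGH_VALUES)]
    entries = sorted(index_entries)
    if not entries:
        # Nothing to scan: the first task gets everything, the rest get
        # empty ranges.
        return [(LOW_VALUES, HIGH_VALUES)] + \
               [(HIGH_VALUES, HIGH_VALUES)] * (tasks - 1)
    # Pick tasks-1 boundary keys at evenly spaced positions in the entries.
    boundaries = [entries[i * len(entries) // tasks] for i in range(1, tasks)]
    edges = [LOW_VALUES] + boundaries + [HIGH_VALUES]
    return [(edges[i], edges[i + 1]) for i in range(tasks)]


if __name__ == "__main__":
    top = [b"E", b"N"]                                      # too few entries
    lower = [bytes([c]) for c in b"ABCDEFGHIJKLMNOPQRST"]   # 20 entries
    chosen = choose_level([top, lower], tasks=4)
    for low, high in select_key_ranges(chosen, tasks=4):
        print(low, high)
```

Because each index entry in a real, compressed index represents a varying amount of data, evenly spaced entries do not guarantee evenly sized ranges, which is why the effectiveness of the selection varies.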

It is possible and correct for one or more subtasks to be assigned no key values. This can occur for small areas or in the special case where a very large number of duplicate key values exist. For example, if an area had one million records and every record had a native sequence key of blanks, the range that includes blanks is assigned to one of the tasks, and that task finds all the records. That task may be the first task, the last task, or a task in the middle. The other tasks run but quickly end without finding any rows, which is normal. The condition can also occur if tables in multiple areas share a native sequence key ID, where one area has a very large number of records and another has very few, clustered in a limited range of key values.
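
The duplicate-key case can be illustrated with a small Python sketch. It is not part of the product: the function name and the boundary keys are hypothetical, and the example only counts how many rows would fall into each task's range.

```python
from bisect import bisect_right
from typing import List, Sequence


def rows_per_task(row_keys: Sequence[bytes],
                  boundaries: List[bytes]) -> List[int]:
    """Count the rows that fall into each task's key range.

    boundaries holds the tasks-1 sorted keys that separate the ranges.
    Task i receives keys k where boundaries[i-1] <= k < boundaries[i].
    """
    counts = [0] * (len(boundaries) + 1)
    for key in row_keys:
        counts[bisect_right(boundaries, key)] += 1
    return counts


if __name__ == "__main__":
    # One million rows whose native sequence key is all blanks.
    rows = [b"    "] * 1_000_000
    # Blanks sort below every boundary, so the first task finds all rows.
    print(rows_per_task(rows, [b"A", b"M", b"T"]))
    # With a boundary below blanks, a task in the middle finds all rows.
    print(rows_per_task(rows, [b"\x10", b"A", b"M"]))
```

In both runs, three of the four tasks receive no rows and would end quickly, which matches the behavior described above.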
Phase 1C: Each subtask opens a task to the MUF. Each executes a GSETL/GETIT-type request, with blocking, to find and get the data rows within its key range. If the data area, the index, or both are covered, the backup benefits. The blocking is at 32K, rounded down if the task size is smaller. Sufficient tasks must be available; otherwise, expect the normal return code 85. The ACCESS setting for the database is ignored, because the single user process has already opened the database for backup.

At the completion of phases 1A, 1B, and 1C, a full and complete backup has been performed, with the output data written to 1 to 25 sequential data sets. If OPTION2=BACKUPONLY was specified, the utility function is complete, and the data sets are available for use. Otherwise, the parallel data load starts. If this load fails, it must be restarted with the sequential files that were just produced, but with OPTION2= changed to LOADONLY. Because errors can occur during the data load or index update phases, do not use temporary data sets for the backup output unless another form of backup was taken before the start of the REORG.

If the index scanning and key range selection are severely deficient, the only solution is to select the key ranges manually using the FIRSTKEY/LASTKEY options of the BACKUP or EXTRACT functions, then provide that output as input to the REORG OPTION2=LOADONLY function.

The 1 to 25 sequential files produced by this backup process may be concatenated and provided as input to a regular LOAD function, either as all parallel parts concatenated together in key value order for a full load, or as fewer than all parts to delete a range of rows.

During the backup phase of REORG, you can request the status of the execution by issuing an operating system modify command to the utility with a STATUS command. This requires that the DBSYSID macro that produced the DBSIDPR program had the CONSOLE=YES option selected. If REORG is in the backup phase, this message is generated:

DB01323I - REORG BASE n AREA x RECORD y OF ABOUT z

In the message, n is the DBID, x is the area name, y is the count of data records written so far to all of the output sequential files by all of the 1 to 25 subtasks running, and z is the number of records in the area, as stored in the Directory (CXX). The Directory count is usually accurate.
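
If you capture the job's messages, the status line can be turned into an approximate completion percentage. The following Python sketch is one way to do that. It assumes the message text follows the layout shown above with plain decimal numbers (optionally with commas), and the sample line in the usage section is made up for illustration.

```python
import re
from typing import Optional

# Pattern for: DB01323I - REORG BASE n AREA x RECORD y OF ABOUT z
# Assumption: n, y, and z are decimal numbers, possibly with commas.
STATUS_RE = re.compile(
    r"DB01323I\s*-\s*REORG BASE\s+(?P<dbid>\d+)\s+AREA\s+(?P<area>\S+)\s+"
    r"RECORD\s+(?P<written>[\d,]+)\s+OF ABOUT\s+(?P<total>[\d,]+)"
)


def parse_status(line: str) -> Optional[dict]:
    """Parse a DB01323I status line; return None if the line does not match."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    written = int(m["written"].replace(",", ""))
    total = int(m["total"].replace(",", ""))
    # z comes from the Directory (CXX) and is approximate, so the
    # percentage is an estimate and can exceed 100.
    percent = 100.0 * written / total if total else None
    return {
        "dbid": int(m["dbid"]),
        "area": m["area"],
        "written": written,
        "total": total,
        "percent": percent,
    }


if __name__ == "__main__":
    # Hypothetical sample line, not taken from a real job log.
    sample = "DB01323I - REORG BASE 10 AREA PAY RECORD 250000 OF ABOUT 1000000"
    print(parse_status(sample))
```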