Organizing Input Data for a User-Written Program

Organize Record Occurrences to Match Schema

To make the database load as efficient as possible, organize the record occurrences to match the structure of the database. For example, each CALC owner record should be followed by its VIA member records. The following steps describe how to organize the data.

Step 1: Identify the Record Types

The first step in organizing input data is to identify the type of each record. To do so, add the record's ID to the beginning of each record occurrence. For example, the ID of the DEPARTMENT record is 410, and the ID of the EMPLOYEE record is 415.
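A minimal Python sketch of this step, assuming each extracted occurrence is available as a record name plus its data. The `RECORD_IDS` mapping uses the IDs given above; the function name `tag_with_record_id` and the 4-digit, zero-padded ID field are hypothetical choices, not a CA IDMS format.

```python
# Record IDs from the example above (DEPARTMENT = 410, EMPLOYEE = 415).
RECORD_IDS = {"DEPARTMENT": 410, "EMPLOYEE": 415}


def tag_with_record_id(record_name: str, data: str) -> str:
    """Prepend the record ID to an input record.

    The 4-digit, zero-padded ID field is a hypothetical layout; use whatever
    fixed-width field your load program expects.
    """
    return f"{RECORD_IDS[record_name]:04d}{data}"


print(tag_with_record_id("DEPARTMENT", "ACCOUNTING"))  # 0410ACCOUNTING
print(tag_with_record_id("EMPLOYEE", "SMITH"))         # 0415SMITH
```

At load time, the program can read the fixed-width ID field first to decide which record type it is about to store.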

Step 2: Identify CALC Clusters

A CALC cluster is an occurrence of a CALC record together with all of its VIA member records and, recursively, all VIA members of those VIA member records. For efficient database processing, all records within a CALC cluster should fit on one page so that the whole cluster can be processed with one I/O. If the records do not fit on one page, store the most frequently accessed record types immediately after the CALC record occurrence so that they have a better chance of being stored on the same page as the owner.
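The ordering rule above can be sketched in Python as follows. This is an illustration under stated assumptions, not an IDMS utility: `Occurrence`, `order_cluster`, the record lengths, the access frequencies, the usable page size, and the record type `RARELY-READ` are all hypothetical, and the size check ignores any page or per-record overhead the database adds.

```python
from dataclasses import dataclass


@dataclass
class Occurrence:
    record_type: str
    length: int       # stored length in bytes (hypothetical)
    access_freq: int  # relative access frequency (hypothetical)


def order_cluster(owner, via_members, usable_page_bytes):
    """Return (load_order, fits_on_one_page) for one CALC cluster.

    The CALC owner is stored first; VIA members follow, most frequently
    accessed first, so they have the best chance of sharing the owner's page.
    Page and record overhead are ignored (simplifying assumption).
    """
    members = sorted(via_members, key=lambda m: m.access_freq, reverse=True)
    cluster = [owner] + members
    total_bytes = sum(o.length for o in cluster)
    return cluster, total_bytes <= usable_page_bytes


owner = Occurrence("EMPLOYEE", 116, 100)
# RARELY-READ is a hypothetical VIA member record type.
members = [Occurrence("RARELY-READ", 16, 5), Occurrence("EMPOSITION", 60, 40)]
order, fits = order_cluster(owner, members, usable_page_bytes=4000)
print([o.record_type for o in order], fits)
# ['EMPLOYEE', 'EMPOSITION', 'RARELY-READ'] True
```

Sorting members by access frequency is one simple way to apply the "most frequently accessed first" guideline when a cluster overflows a page.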

Step 3: Form CALC Cluster Hierarchies

A hierarchy is a collection of related CALC clusters. For example, if a CALC record occurrence in one cluster is owned by a record in another cluster, the two clusters form a hierarchy. In the Commonweather database, both the OFFICE and DEPARTMENT records own occurrences of the EMPLOYEE record, which in turn owns VIA member record occurrences. When deciding which records to include in a CALC cluster hierarchy, consider the number of CALC record occurrences. For example, if the DEPARTMENT record has many more occurrences than the OFFICE record, store the EMPLOYEE records immediately after their owning DEPARTMENT record. This can save an I/O because you won't need to reestablish currency on the DEPARTMENT record occurrence later.

Hierarchies are loaded in top-to-bottom, left-to-right order. When you store the owner of a CALC cluster, you establish the currency needed to store that cluster's member records.
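Top-to-bottom, left-to-right order is a pre-order, depth-first walk of the hierarchy. The Python sketch below shows that order on a small hypothetical DEPARTMENT, EMPLOYEE, EMPOSITION hierarchy; `Node` and `load_order` are illustrative names, not part of CA IDMS.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One record occurrence and the member occurrences stored under it."""
    name: str
    children: list[Node] = field(default_factory=list)


def load_order(root: Node) -> list[str]:
    """Pre-order walk: top-to-bottom, left-to-right.

    Each owner is emitted before its members, so storing it first
    establishes currency for the members that follow.
    """
    order = [root.name]
    for child in root.children:
        order.extend(load_order(child))
    return order


department = Node("Department 1", [
    Node("Employee 1", [Node("Emposition 1"), Node("Emposition 2")]),
    Node("Employee 2"),
])
print(load_order(department))
# ['Department 1', 'Employee 1', 'Emposition 1', 'Emposition 2', 'Employee 2']
```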

Step 4: Sort the Records in a Hierarchy

To sort records within a hierarchy, add a prefix to the beginning of each record occurrence. The prefix contains the record ID and sequence number for each level of the hierarchy. For example, the DEPARTMENT, EMPLOYEE, EMPOSITION record hierarchy might have a prefix that looks like this:

ID and sequence number at each hierarchy level
Level 1   Level 2   Level 3   Record ID   Record occurrence
410/1     0/0       0/0       410         Department record 1
410/1     415/1     0/0       415         Employee record 1
410/1     415/1     420/1     420         Emposition record 1
410/1     415/1     420/2     420         Emposition record 2
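A Python sketch of how such a prefix could be built and used as a sort key. The names `make_prefix` and `MAX_DEPTH` are hypothetical; the IDs and sequence numbers are the ones in the example prefix. The sketch compares numbers; if the prefix is written as a character field, each ID and sequence number must be zero-padded to a fixed width so that character order matches numeric order.

```python
MAX_DEPTH = 3  # DEPARTMENT -> EMPLOYEE -> EMPOSITION


def make_prefix(path):
    """Build the sort prefix for one record occurrence.

    path lists (record_id, sequence_number) pairs from the top of the
    hierarchy down to the record itself; unused lower levels are padded
    with (0, 0), so an owner always sorts ahead of its members.
    """
    return tuple(path) + ((0, 0),) * (MAX_DEPTH - len(path))


# Input arrives in arbitrary order; each entry is (path, record_id, data).
records = [
    ([(410, 1), (415, 1), (420, 2)], 420, "Emposition record 2"),
    ([(410, 1)], 410, "Department record 1"),
    ([(410, 1), (415, 1), (420, 1)], 420, "Emposition record 1"),
    ([(410, 1), (415, 1)], 415, "Employee record 1"),
]

sorted_records = sorted(records, key=lambda r: make_prefix(r[0]))
for path, record_id, data in sorted_records:
    print(make_prefix(path), record_id, data)
```

After sorting, the records come out in the same order as the rows of the example prefix.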

Step 5: Order the Occurrences of Each Hierarchy

A database page typically holds more than one CALC cluster. Therefore, you can load multiple clusters with one I/O if you load together all the hierarchies that target the same database page. To sort the hierarchy occurrences, add the CALC target page number of the top cluster in each hierarchy to the beginning of the input record.

Note: To determine the CALC target page, use IDMSCALC in the program that creates the input file; for more information about IDMSCALC, see the CA IDMS Utilities Guide.
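The Python sketch below shows only where the target page goes in the input record; it does not reproduce IDMSCALC. `calc_target_page` is a hypothetical stand-in that maps a key into a page range with CRC-32. It is NOT the CA IDMS CALC algorithm, and the page range, the 10-digit page field, and the sample key `0100` are assumptions. In a real extract program, the page number would come from IDMSCALC.

```python
import zlib

LOW_PAGE, HIGH_PAGE = 1001, 1100  # hypothetical page range of the area


def calc_target_page(calc_key: str) -> int:
    """HYPOTHETICAL stand-in for IDMSCALC; not the CA IDMS CALC algorithm.

    It only maps a key deterministically into the page range.
    """
    span = HIGH_PAGE - LOW_PAGE + 1
    return LOW_PAGE + zlib.crc32(calc_key.encode()) % span


def add_target_page(top_calc_key: str, input_record: str) -> str:
    """Prefix a record with the target page of its hierarchy's top CALC record.

    Every record in the hierarchy uses the top record's CALC key, so all of
    them receive the same page prefix.
    """
    return f"{calc_target_page(top_calc_key):010d}{input_record}"


print(add_target_page("0100", "Department record 1"))
print(add_target_page("0100", "Employee record 1"))
```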

Step 6: Include Records Excluded from the Hierarchies

Some records do not fall within a hierarchy. For example, suppose you did not include the OFFICE record, which owns EMPLOYEE record occurrences, in a CALC cluster hierarchy. To load owner records that fall outside a hierarchy:

- Position the non-VIA owner records at the beginning of the input file, before any records that form part of a hierarchy, by adding an identifier to the beginning of each input record. For example, the identifier of the OFFICE record type might be 4, and the identifier of the DEPARTMENT, EMPLOYEE, EMPOSITION hierarchy might be 5.
- Add the key of the non-VIA owner record to the end of each member record occurrence in the hierarchy. At load time, use the key to find the owner before storing the member. For example, add the OFFICE-CODE-0450 field to the end of each EMPLOYEE record occurrence.
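A Python simulation of these two rules, under stated assumptions: `InputRecord`, `simulate_load`, and the keys `O01`, `D100`, and `E001` are hypothetical, and a dictionary stands in for the database. A real load program would locate the OFFICE owner with its own database calls instead of a dictionary lookup.

```python
from typing import NamedTuple, Optional

OFFICE_GROUP = 4     # identifier for the non-VIA OFFICE owner records
HIERARCHY_GROUP = 5  # identifier for the DEPARTMENT, EMPLOYEE, EMPOSITION hierarchy


class InputRecord(NamedTuple):
    group: int                         # positions OFFICE (4) before the hierarchy (5)
    record_type: str
    key: str                           # the record's own key (hypothetical values)
    office_code: Optional[str] = None  # OFFICE-CODE-0450 appended to EMPLOYEE records


def simulate_load(records):
    """Store records in identifier order; an EMPLOYEE is stored only after its
    OFFICE owner has been found by key (a dict stands in for the database)."""
    stored_offices = {}
    order = []
    # sorted() is stable, so hierarchy records keep their relative order.
    for rec in sorted(records, key=lambda r: r.group):
        if rec.record_type == "OFFICE":
            stored_offices[rec.key] = rec
        elif rec.office_code is not None:
            # Find the non-VIA owner by key before storing the member.
            if rec.office_code not in stored_offices:
                raise LookupError(f"OFFICE {rec.office_code} not loaded yet")
        order.append(rec.key)
    return order


records = [
    InputRecord(HIERARCHY_GROUP, "DEPARTMENT", "D100"),
    InputRecord(HIERARCHY_GROUP, "EMPLOYEE", "E001", office_code="O01"),
    InputRecord(OFFICE_GROUP, "OFFICE", "O01"),
]
print(simulate_load(records))  # ['O01', 'D100', 'E001']
```

Because the OFFICE identifier sorts ahead of the hierarchy identifier, every owner lookup succeeds even though the OFFICE record appeared last in the unsorted input.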

Step 7: Order Sorted and Indexed Sets

Sorted sets should always be loaded in the same order as the sort sequence. To sort the input data:

- For a set within a hierarchy, replace the sequence number field at the record's level in the hierarchy with the set's sort key. For example, if EMP-EMPOSITION is a sorted set, replace the sequence number for EMPOSITION occurrences in the prefix portion of the input record with the record's sort key.
- For a set outside a hierarchy, follow these steps:
  1. Redefine the set as MANUAL.
  2. Create a file containing records with these fields: the owner's page, the set name, the owner's CALC key, the set's sort key, and the dbkey of the member record.
  3. Sort the file in the following order:
    - Descending order by owner page
    - Ascending order by set name and owner key
    - Either ascending or descending order by sort key
  4. After loading the database, connect the set members using a user-written program.
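The sort in step 3 mixes descending and ascending keys. One way to express it in Python is a series of stable sorts, applied from the least significant key to the most significant one. `ConnectEntry`, `sort_connect_file`, and the sample data are hypothetical; the sample assumes, for illustration only, that OFFICE-EMPLOYEE is a sorted set outside any hierarchy.

```python
from typing import NamedTuple


class ConnectEntry(NamedTuple):
    owner_page: int
    set_name: str
    owner_calc_key: str
    sort_key: str
    member_dbkey: int


def sort_connect_file(entries, sort_key_descending=False):
    """Descending owner page, ascending set name and owner key, then sort key.

    Python sorts are stable (also with reverse=True), so sorting on the least
    significant key first and the most significant key last gives the
    combined order.
    """
    result = sorted(entries, key=lambda e: e.sort_key, reverse=sort_key_descending)
    result = sorted(result, key=lambda e: (e.set_name, e.owner_calc_key))
    result = sorted(result, key=lambda e: e.owner_page, reverse=True)
    return result


# Hypothetical data: pages, keys, and dbkeys are illustrative only.
entries = [
    ConnectEntry(1001, "OFFICE-EMPLOYEE", "O01", "SMITH", 101),
    ConnectEntry(1002, "OFFICE-EMPLOYEE", "O02", "JONES", 202),
    ConnectEntry(1001, "OFFICE-EMPLOYEE", "O01", "ADAMS", 102),
]
print([e.member_dbkey for e in sort_connect_file(entries)])  # [202, 102, 101]
```

The user-written connect program would then read the sorted file and connect each member to its owner; that program is not shown here.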

Step 8: Sort the Input Records

Sort the input records in the following order:

- Ascending order by identifier
- Descending order by target page number
- Ascending order by the concatenation of all ID and sequence fields that represent a hierarchy
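The three sort criteria can be combined into one composite key. In this Python sketch, negating the numeric page number gives the descending page order. `LoadRecord`, `load_sort_key`, and the sample pages and identifiers are hypothetical illustrations.

```python
from typing import NamedTuple


class LoadRecord(NamedTuple):
    identifier: int                      # group identifier (for example 4 or 5)
    target_page: int                     # CALC target page of the hierarchy's top record
    prefix: tuple[tuple[int, int], ...]  # ID/sequence pairs, one per hierarchy level
    data: str


def load_sort_key(rec: LoadRecord):
    # Ascending identifier, descending target page, ascending hierarchy prefix.
    return (rec.identifier, -rec.target_page, rec.prefix)


records = [
    LoadRecord(5, 1001, ((410, 1), (415, 1), (0, 0)), "Employee record 1"),
    LoadRecord(5, 1002, ((410, 2), (0, 0), (0, 0)), "Department record 2"),
    LoadRecord(4, 1050, ((450, 1), (0, 0), (0, 0)), "Office record 1"),
    LoadRecord(5, 1001, ((410, 1), (0, 0), (0, 0)), "Department record 1"),
]
for rec in sorted(records, key=load_sort_key):
    print(rec.data)
# Office record 1
# Department record 2
# Department record 1
# Employee record 1
```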

Note: If records are to be stored VIA a system-level index, sort them in the reverse order of their VIA index so that records at the end of the index are processed first by the user-written format program. This ensures that the physical sequence of the records in the database matches the sequence of the index.
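As a minimal Python illustration of "reverse order of the VIA index," the sketch below sorts records descending on a hypothetical index key (here, a last name). `order_for_via_index` and the sample employee data are hypothetical. The sketch shows only the input ordering; how the records are then placed physically is up to the database.

```python
def order_for_via_index(records, index_key):
    """Sort records in the reverse of their VIA index sequence so that the
    records at the end of the index are processed first."""
    return sorted(records, key=index_key, reverse=True)


# Hypothetical (employee ID, last name) pairs; the index is assumed to be on last name.
employees = [("E002", "JONES"), ("E001", "ADAMS"), ("E003", "SMITH")]
print(order_for_via_index(employees, index_key=lambda e: e[1]))
# [('E003', 'SMITH'), ('E002', 'JONES'), ('E001', 'ADAMS')]
```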