Previous Topic: I/O Requests and ExternalizationNext Topic: Buffer Pool Sizing


Checkpoints and Page Externalization

DB2 checkpoints are controlled using DSNZPARM and CHKFREQ. The CHKFREQ parameter is the number of minutes between checkpoints for a value of 1 to 60 or the number of log records written between DB2 checkpoints for a value of 200 to 16,000,000. The default value is 500,000. You may need different settings for this parameter depending on your workload. For example, you may want it higher during batch processing.

CHKFREQ is a hard parameter to set because it requires a bounce of the DB2 subsystem to take effect. Use the SET LOG CHKTIME command to dynamically set the CHKFREQ parameter.

Use the SET LOG command option to SUSPEND and RESUME logging for a DB2 subsystem. SUSPEND causes a system checkpoint to be taken in a non-data sharing environment. By obtaining the log-write latch, any further log records are prevented from being created and any unwritten log buffers will be written to disk. The BSDS will be updated with the high-written RBA. All further database updates are prevented until update activity is resumed by issuing a SET LOG command to RESUME logging or until a STOP DB2 command is issued. These are single-subsystem only commands, so they will have to be entered for each member when running in a data sharing environment.

During online processing, DB2 should checkpoint about every 5 through 10 minutes or some other value that is based on investigative analysis of the impact on restart time after a failure. Two concerns for how often you take checkpoints are as follows:

The costs and disruption of DB2 checkpoints are often overstated. Although a DB2 checkpoint is a small task, it does not prevent processing from proceeding. Having a CHKFREQ setting that is too high along with large buffer pools and high thresholds such as the defaults can cause enough I/O to make the checkpoint disruptive. In trying to control checkpoints, some users increased the CHKFREQ value and made the checkpoints less frequent but made them much more disruptive. The situation is corrected by reducing the amount that is written and increasing the checkpoint frequency which yields much better performance and availability. For some installations, a checkpoint every minute did not affect performance or availability.

The write efficiency at DB2 checkpoints is the key factor that you have to observe to see if CHKFREQ can be reduced. If the write thresholds (DWQT/VDQWT) are doing their job, less work is performed at each checkpoint. Using the write thresholds to cause I/O to be performed in a level non-disruptive way is helpful for the non-volatile storage in storage controllers.

If you have your write thresholds DWQT and VDQWT set properly as well as your checkpoints, you could still see an unwanted write problem. This problem could occur if you do not have your log data sets properly sized. If the active log data sets are too small, active log switches will occur often. When an active log switch occurs, a checkpoint is taken automatically. Therefore, the logs could be driving excessive checkpoint processing resulting in constant writes. This activity would prevent you from achieving a high ratio of pages that are written per I/O because the deferred write queue would not be allowed to fill as it should.