Example Scenarios

Database Statistics Reference Guide › Package Usage › Example Scenarios

Example Scenarios

Scenario 1: Fresh Install, Low Data Volumes

In this scenario, it is generally advisable to collect full statistics on a daily basis. All parameters can be left at their defaults except for P_def_cap_samp_pct, which we explicitly set to 100 to ensure that full statistics are gathered on all tables.

BEGIN
	WGN_STATS.GATHER_ALL_STATS(P_def_cap_samp_pct=>100);
END;

Scenario 2: Non-captured Data

The previous example only dealt with gathering statistics on the captured data and audit data tables, as these are the tables that in general are constantly changing and subject to searching and reporting. However many large customers perform a periodic synchronization process with external user and group information. When these synchronizations occur it is advisable to gather statistics on the non-captured tables. This could be combined with the previous example if it occurs at say the same weekly frequency, or if not it could be done separately as follows:

BEGIN
WGN_STATS.GATHER_ALL_STATS(
  P_def_audit_samp_pct=>0,
  P_def_non_cap_samp_pct=>100,
  P_def_cap_samp_pct=>0
  );
END;

Interpreting the above parameters means:

Gather no statistics on the audit tables (P_def_audit_samp_pct=>0)
Gather no statistics on the captured data tables (P_def_cap_samp_pct =>0)
Gather full statistics on other tables (P_def_non_cap_samp_pct =>100)

Scenario 3: High Volumes, Daily and Weekly Statistics

This scenario covers a mature installation that has been capturing data for many months or years. The database is partitioned on a daily basis and most new data is captured in the current daily partition. In this case, we might define two different configurations: one for daily use and one for use at the weekend (when we assume there is more spare resource).

Daily configuration

This gathers statistics at a low percentage for just the current day's data of the main captured data tables; to gather statistics at a higher sample rate for the audit tables; and to exclude all other tables from statistics gathering. So an example usage for daily statistics gathering in this scenario might be:

BEGIN
WGN_STATS.GATHER_ALL_STATS(
	P_def_audit_samp_pct=>50,
	P_def_non_cap_samp_pct=>0,
	P_def_cap_samp_pct=>5,
	P_num_partitions=>1);
END;

Interpreting the above parameters means:

Gather statistics on the audit tables at 50% (P_def_audit_samp_pct=>50)
Gather statistics on the captured data tables at 5% (P_def_cap_samp_pct =>5)
Gather no statistics on other tables (P_def_non_cap_samp_pct =>0)
Gather statistics on the most recent partition only (P_num_partitions=>1)

Weekend configuration

Once a week, when more resource is available on the server, we may want to gather statistics at a higher sample rate on the most recent 7 partitions and, assuming we are using a large multiprocessor server, also exploit higher degrees of parallelism.

BEGIN
WGN_STATS.GATHER_ALL_STATS(
	P_def_audit_samp_pct=>50,
	P_def_non_cap_samp_pct=>0,
	P_def_cap_samp_pct=>25,
	P_num_partitions=>7,
	P_degree=>32);
END;

Interpreting the above parameters means:

Gather statistics on the audit tables at 50% (P_def_audit_samp_pct=>50)
Gather statistics on the captured data tables at 25% (P_def_cap_samp_pct =>25)
Gather no statistics on other tables (P_def_non_cap_samp_pct =>0)
Gather statistics on the most recent partition only (P_num_partitions=>7)
Use a degree of parallelism of 32 (P_degree=>32)

Scheduling

As outlined previously, it is not possible to give completely hard and fast rules on what the exact frequency and sampling percentage should be used to gather statistics, as so much will depend on the volumes of data, the rate at which new data arrives, and the available resources on the database server at various times during the working day/week.

As an initial recommendation we suggest gathering statistics on a daily basis until the data volumes grow to a point where this becomes impractical.

At the point this becomes impractical, we recommend that statistics are still gathered on a daily basis, but with the number of partitions analyzed, and the sampling percentage reduced.

In addition to this reduced daily statistics gathering, we recommend that fuller statistics can be gathered across more partitions at a more convenient interval, such as the weekend. This could also be combined with gathering statistics on user and group data if this corresponds to the synchronization process.

One final consideration is the purge process. In general, the purge process is run out of normal hours, once this is complete, we highly recommend that statistics are collected.