During the upgrade process, or at any time after an upgrade, verify that all database tables are segmented. If any unsegmented database table projections are returned, segment them.
Important! If you do not segment the database tables, a warning message appears during the Data Aggregator component upgrade.
Segmenting the tables reduces the disk space that the database requires and improves general query performance. You can segment the database tables either while Data Aggregator and Data Collector are running or while these components are down.
Note: Segmentation is a resource-intensive process. We strongly recommend that you segment the database tables while Data Aggregator and Data Collector are down, before you upgrade the Data Aggregator component. Although you can segment the tables while these components are running, we advise against it.
If you segment the database tables when Data Aggregator and Data Collector are down, consider the following information before you upgrade the Data Aggregator component:
Important! When Data Aggregator is not running, the total disk utilization during segmentation must not exceed 90 percent of available disk space. Tables that would cause disk utilization to exceed 90 percent during segmentation are not segmented.
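As a rough illustration of the 90 percent rule, the check can be sketched as follows. This is a hypothetical helper, not part of segment.py, and it assumes that segmenting a table temporarily requires roughly the table's own size in additional space for the new projections:

```python
def can_segment(used_gb, total_gb, table_gb, limit=0.90):
    """Return True if segmenting a table of table_gb GB would keep
    total disk utilization at or below the limit (90 percent here).

    Assumption (not stated in segment.py): segmentation temporarily
    needs about the table's size in extra space for new projections.
    """
    projected = used_gb + table_gb
    return projected <= limit * total_gb

# Example: 600 GB used on a 1000 GB volume. A 250 GB table fits
# under the 90 percent ceiling; a 350 GB table does not.
print(can_segment(600, 1000, 250))  # True
print(can_segment(600, 1000, 350))  # False
```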
If you segment the database tables while Data Aggregator and Data Collector are running, after you upgrade the Data Aggregator component, consider the following information:
Note: This list is not exhaustive.
Important! When segmenting the tables in the database, if Data Aggregator is running, at least 40 percent of the available disk space must remain free for query processing and other database activities.
After segmentation completes, the disk space that the backup requires increases by the amount of data in the newly created segmented table projections. Before backups run, verify that enough disk space is available.
The data in the backup area for the old unsegmented table projections is removed after the restorePointLimit (an entry in the backup configuration file) plus one day.
To avoid waiting for the old data to be removed, change the snapshot name in the backup configuration file and run a full backup after segmentation completes. You can then archive the older backup and delete it from the backup disk. Use the pre-segmentation backup only if you cannot use the backup that was created after segmentation completed. If you restore from the pre-segmentation backup, you must segment the table projections again.
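For reference, the relevant entries in a typical Vertica vbr backup configuration file look like the following. The snapshot name shown is illustrative; check your own configuration file for the exact names and values:

```ini
[Misc]
; Change the snapshot name after segmentation so the new full backup
; does not share restore points with the pre-segmentation backups.
snapshotName = drdata_backup_post_segmentation
; Number of restore points to retain; old unsegmented projection data
; ages out after this limit plus one day.
restorePointLimit = 3
```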
Prepare for Database Table Segmentation
To prepare for database table segmentation, do the following steps:
To back up Data Repository, do the following steps:
backup_script_directory_location/backup_script.sh >/backup_directory_location/backup.log 2>&1
For example:
/home/vertica/backup_script.sh >/tmp/backup.log 2>&1
Note: For information about how you originally created this script to automatically back up Data Repository, see the CA Performance Management Administrator Guide.
To segment the database tables with no data, do the following steps:
./segment.py --task zerotables --pass database_admin_user_password [--name database_name] [--port database_port]
Indicates the Vertica Linux database administrator user password.
Indicates the name of the database. Optional; specify only if the database name is not the default, drdata.
Indicates the port to use to connect to Vertica. Optional; specify only if the port number is not the default, 5433.
The database tables with no data are segmented.
To determine the amount of time that you need to segment the remaining database tables, calculate a baseline:
./segment.py --task tables --pass database_admin_user_password [--name database_name] [--port database_port]
./segment.py --task segment --table rate_table_name --pass database_admin_user_password [--name database_name] [--port database_port]
Note: You can run this command while Data Aggregator is running, but we recommend that you run the command during a 2- to 3-hour maintenance window.
Note: The actual time that it takes to segment the database tables can vary based on the type and compression of the data in the tables. The values that are calculated here are rough estimates. When planning a scheduled maintenance window, add an extra hour of time for every 10 to 15 GB of database tables that will be segmented.
For large databases, you may not be able to schedule a single maintenance window that is long enough to segment the entire database. In this case, you can segment the database tables over multiple maintenance windows.
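Using the guideline above (one extra hour for every 10 to 15 GB of tables to be segmented), a planning estimate can be sketched like this. This is a hypothetical calculation, not part of segment.py; baseline_hours is the time you measured when segmenting the baseline table:

```python
import math

def estimated_window_hours(total_gb, baseline_hours=0.0,
                           gb_per_extra_hour=10):
    """Rough maintenance-window estimate: the measured baseline time
    plus one extra hour per 10-15 GB to be segmented (the default of
    10 GB per extra hour is the conservative end of the guideline)."""
    extra_hours = math.ceil(total_gb / gb_per_extra_hour)
    return baseline_hours + extra_hours

# Example: 120 GB of unsegmented tables and a 2-hour baseline
# suggest planning for roughly a 14-hour window.
print(estimated_window_hours(120, baseline_hours=2))  # 14
```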
Segment Database Tables
Follow these steps:
./segment.py --task segment --pass database_admin_user_password --zerotables [--name database_name] [--port database_port]
Indicates the Vertica Linux database administrator user password.
Indicates the name of the database. Optional; specify only if the database name is not the default, drdata.
Indicates the port to use to connect to Vertica. Optional; specify only if the port number is not the default, 5433.
For example:
./segment.py --task segment --pass password --zerotables --name mydatabase --port 1122
./segment.py --task script --pass database_admin_user_password --lt100G [--name database_name] [--port database_port]
Indicates the Vertica Linux database administrator user password.
Indicates the name of the database. Optional; specify only if the database name is not the default, drdata.
Indicates the port to use to connect to Vertica. Optional; specify only if the port number is not the default, 5433.
For example:
./segment.py --task script --pass password --lt100G --name mydatabase --port 1122
nohup ./segment-script.sh
The script segments all unsegmented table projections that are smaller than 100 GB, processing them from smallest to largest. The output is sent to nohup.out. Because the script runs under nohup, it continues to run even if the shell is closed accidentally.
Based on your maintenance window size and the combined size of all tables under 100 GB, determine which tables can be segmented within the window. Modify the generated script by removing the tables that do not fit, using the estimated times that you calculated when you prepared for database table segmentation. Run the generated segment-script.sh during the maintenance window. If not all tables under 100 GB could be segmented in the window, regenerate the script and run segment-script.sh during the next maintenance window, repeating until all tables are segmented.
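Selecting which tables fit in a window can be sketched as a smallest-first pass over the per-table estimates. This is a hypothetical helper, not part of segment.py; table_hours maps each table name to the estimated segmentation time you calculated during preparation:

```python
def tables_for_window(table_hours, window_hours):
    """Pick tables smallest-first until the maintenance window is
    full; returns the names of tables to keep in segment-script.sh.
    Tables that do not fit are left for the next window."""
    selected, used = [], 0.0
    for name, hours in sorted(table_hours.items(), key=lambda kv: kv[1]):
        if used + hours <= window_hours:
            selected.append(name)
            used += hours
    return selected

# Example: in a 6-hour window, the 5-hour table waits for the next one.
estimates = {"rate_a": 1.0, "rate_b": 2.0, "rate_c": 5.0, "rate_d": 2.5}
print(tables_for_window(estimates, 6))  # ['rate_a', 'rate_b', 'rate_d']
```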
Important! When you run the script, any table that would cause disk utilization to exceed 90 percent produces an error message and is not segmented. To segment these tables, free more disk space.
You are prompted to confirm each table that would cause disk utilization to exceed 60 percent. We strongly recommend that you bring Data Aggregator down before segmenting these tables.
Note also that this script can take several hours to run. To avoid database corruption, do not interrupt the script after it starts.
./segment.py --task script --pass database_admin_user_password [--name database_name] [--port database_port]
Indicates the Vertica Linux database administrator user password.
Indicates the name of the database. Optional; specify only if the database name is not the default, drdata.
Indicates the port to use to connect to Vertica. Optional; specify only if the port number is not the default, 5433.
For example:
./segment.py --task script --pass password --name mydatabase --port 1122
Important! When the script is generated, any tables that may cause disk utilization to exceed 60 percent or 90 percent are indicated.
nohup ./segment-script.sh
The script segments all unsegmented tables, processing them from smallest to largest.
Important! When you run the script, any table that would cause disk utilization to exceed 90 percent produces an error message and is not segmented. To segment these tables, free more disk space.
You are prompted to confirm each table that would cause disk utilization to exceed 60 percent. We strongly recommend that you bring Data Aggregator down before segmenting these tables.
For large tables, this script can take several hours to run. In an internal segmentation test and a customer database test, segmenting a table of 100 GB or larger took over 10 hours to complete. Segmentation time is not proportional to table size; it depends on many factors, including row count, column count, data compression, and machine specifications. Depending on your maintenance window size, plan to segment one large table per maintenance window.
./segment.py --task tables --pass database_admin_user_password [--name database_name] [--port database_port]
The following message appears:
No tables found with unsegmented projections.
service dadaemon start
service dcmd start
The previous steps outline the use of the segment.py script and the considerations for migrating your environment. If you have any questions about the use of the script, or if you require assistance in planning your migration, contact CA Support.
Copyright © 2015 CA Technologies.
All rights reserved.