

Segment Database Tables (Cluster Installations Only)

During the upgrade process, or at any time after an upgrade, verify that all database tables are segmented if you have not already done so. If the verification returns unsegmented database table projections, segment them.

Important! If you do not segment the database tables, you will get a warning message during the Data Aggregator component upgrade.

Segmenting the tables reduces the disk space that the database requires and improves general query performance. You can segment the database tables either while Data Aggregator and Data Collector are up or while these components are down.

Note: Segmentation is a resource-intensive process. We strongly recommend that you segment the database tables while Data Aggregator and Data Collector are down, before you upgrade the Data Aggregator component. You can segment the tables while these components are running, but we advise against it.

If you segment the database tables when Data Aggregator and Data Collector are down, consider the following information before you upgrade the Data Aggregator component:

Important! When Data Aggregator is not running, the total disk utilization during segmentation must not exceed 90 percent of available disk space. Tables that would cause the disk utilization to exceed 90 percent are not segmented during the process.
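Before segmenting with Data Aggregator down, you can check current utilization of the Data Repository filesystem against this ceiling. The following is a minimal sketch, assuming a Linux host with GNU coreutils; the mount point is illustrative, so point it at the filesystem that holds your Data Repository data:

```shell
# Filesystem that holds the Data Repository data; adjust for your host.
DATA_FS=${DATA_FS:-/}

# Current utilization as a bare percentage (GNU df).
used_pct=$(df --output=pcent "$DATA_FS" | tail -n 1 | tr -dc '0-9')

if [ "$used_pct" -ge 90 ]; then
    echo "Utilization is ${used_pct}%: tables that push past 90 percent are skipped."
else
    echo "Utilization is ${used_pct}%: below the 90 percent ceiling."
fi
```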

If you segment the database tables while Data Aggregator and Data Collector are running, after you upgrade the Data Aggregator component, consider the following information:

Important! When segmenting the tables in the database, if Data Aggregator is running, at least 40 percent of the available disk space must remain free for query processing and other database activities.

After segmentation completes, the backup grows by the amount of data in the newly created segmented table projections. Verify that enough disk space is available after segmentation completes and before backups run.

The old unsegmented table projections are removed from the backup area after the period that the restorePointLimit entry in the backup configuration file specifies, plus one day.

To avoid waiting for the old data to be removed, change the snapshot name in the backup configuration file and run a full backup after segmentation completes. You can then archive the older backup and delete it from the backup disk. Use the pre-segmentation backup only if you cannot use the backup that was created after segmentation completed. If you restore from the pre-segmentation backup, you must segment the table projections again.
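The snapshot-name change can be sketched as follows, using a stand-in for the backup configuration file. The entry names (snapshotName, restorePointLimit) follow the vbr-style configuration; the path and values are illustrative, so check them against the configuration file that you created for automatic backups:

```shell
# Stand-in for your real backup configuration file (path is illustrative).
CFG=$(mktemp)
cat > "$CFG" <<'EOF'
[Misc]
snapshotName = drdata_backup
restorePointLimit = 3
EOF

# Give post-segmentation backups a distinct snapshot name so the
# pre-segmentation backup can be archived and deleted independently.
sed -i 's/^snapshotName = .*/snapshotName = drdata_backup_seg/' "$CFG"

grep '^snapshotName' "$CFG"
# -> snapshotName = drdata_backup_seg
```

After the name change, run a full backup so that new restore points accumulate under the new snapshot name.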

Prepare for Database Table Segmentation

To prepare for database table segmentation, complete the following procedures.

To back up Data Repository, follow these steps:

  1. Back up Data Repository. Performing a backup is a time-consuming process. Execute the following command:
    backup_script_directory_location/backup_script.sh >/backup_directory_location/backup.log  2>&1
    

    For example:

    /home/vertica/backup_script.sh >/tmp/backup.log  2>&1
    

    Note: For information about how you originally created this script to automatically back up Data Repository, see the CA Performance Management Administrator Guide.

To segment the database tables with no data, follow these steps:

  1. As the Vertica Linux database administrator user, log in to one of the computers in the cluster where Data Repository is installed.
  2. Download the segment.py script from the location where you extracted the installation media. Put the script in a directory that is writable by the Vertica Linux database administrator user. This procedure assumes that the segment.py script is in the home directory of the Vertica Linux database administrator user.
  3. Type the following command while Data Aggregator is running:
    ./segment.py --task zerotables --pass database_admin_user_password [--name database_name] [--port database_port]
    
    database_admin_user_password

    Indicates the Vertica Linux database administrator user password.

    database_name

    Indicates the name of the database. Optional; specify it only if the database name is not the default, drdata.

    database_port

    Indicates the port to use to connect to Vertica. Optional; specify it only if the port is not the default, 5433.

    The database tables with no data are segmented.

To determine the amount of time that you need to segment the remaining database tables, calculate a baseline:

  1. To return the table names, sorted largest to smallest, type the following command:
    ./segment.py --task tables --pass database_admin_user_password [--name database_name] [--port database_port]
    
  2. Disable scheduled backups until segmentation is complete. Backups can interfere with the segmentation process.
  3. Select a table from step 1 that is about 5 GB in size. Type the following command to segment the table:
    ./segment.py --task segment --table rate_table_name --pass database_admin_user_password [--name database_name] [--port database_port]
    

    Note: You can run this command when Data Aggregator is running, but we recommend that you run the command during a 2-3 hour maintenance window.

  4. Re-enable scheduled backups.
  5. Use the time it took to segment the 5 GB table to determine how long it may take to segment all of the tables that are less than 100 GB.

    Note: The actual time that it takes to segment the database tables can vary based on the type and compression of the data in the tables. The values that are calculated here are rough estimates. When planning a scheduled maintenance window, add an extra hour of time for every 10 to 15 GB of database tables that will be segmented.

    For large databases, you may not be able to schedule a single maintenance window that is long enough to segment the entire database. In this case, you can segment the database tables over multiple maintenance windows.
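The window estimate can be sketched with shell arithmetic. All sizes and times here are illustrative; substitute the baseline that you measured in step 3 and the combined size of the tables returned in step 1:

```shell
# Measured baseline: minutes taken to segment the ~5 GB sample table (illustrative).
baseline_gb=5
baseline_min=30

# Combined size of the remaining unsegmented tables under 100 GB (illustrative).
total_gb=60

# Linear scale-up from the baseline.
est_min=$(( total_gb * baseline_min / baseline_gb ))

# Add the recommended extra hour for every 10 to 15 GB (10 GB used here).
buffer_min=$(( total_gb / 10 * 60 ))

total_min=$(( est_min + buffer_min ))
echo "Plan a window of about $(( total_min / 60 ))h $(( total_min % 60 ))m"
# -> Plan a window of about 12h 0m
```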

Segment Database Tables

Follow these steps:

  1. As the Vertica Linux database administrator user, log in to one of the computers in the cluster where Data Repository is installed.
  2. If the validation in the previous procedure returned more than ten zero-length table projections, type the following command to segment them:
    ./segment.py --task segment --pass database_admin_user_password --zerotables [--name database_name] [--port database_port]
    
    database_admin_user_password

    Indicates the Vertica Linux database administrator user password.

    database_name

    Indicates the name of the database. Optional; specify it only if the database name is not the default, drdata.

    database_port

    Indicates the port to use to connect to Vertica. Optional; specify it only if the port is not the default, 5433.

    For example:

    ./segment.py --task segment --pass password --zerotables --name mydatabase --port 1122
    
  3. If there are table projections that are greater than 100 GB in size, type the following command to create a script to segment the table projections that are less than 100 GB first:
    ./segment.py --task script --pass database_admin_user_password --lt100G [--name database_name] [--port database_port]
    
    database_admin_user_password

    Indicates the Vertica Linux database administrator user password.

    database_name

    Indicates the name of the database. Optional; specify it only if the database name is not the default, drdata.

    database_port

    Indicates the port to use to connect to Vertica. Optional; specify it only if the port is not the default, 5433.

    For example:

    ./segment.py --task script --pass password --lt100G --name mydatabase --port 1122
    
  4. Disable scheduled backups until segmentation is complete. Backups can interfere with the segmentation process.
  5. To execute the segment-script.sh script, type the following command:
    nohup ./segment-script.sh
    

    The script segments all unsegmented table projections that are less than 100 GB and sorts them from smallest to largest. The output is sent to nohup.out. If the shell is closed accidentally, the script will continue to run.

    Depending on the size of your maintenance window and the combined size of all tables under 100 GB, determine which tables can be segmented in the window. Modify the generated script by removing the tables that do not fit, based on the estimated times that you calculated when you prepared for database table segmentation. Run the generated segment-script.sh during the maintenance window. If all of the tables under 100 GB cannot be segmented in one window, regenerate the script and run segment-script.sh during the next maintenance window. Repeat until all of the tables are segmented.

    Important! When you run the script, it displays an error message for any table that would cause disk utilization to exceed 90 percent and does not segment that table. To segment these tables, free more disk space.

    The script prompts you for each table that would cause disk utilization to exceed 60 percent. We strongly recommend that you bring down Data Aggregator before segmenting these tables.

    Note: This script can take several hours to run. To avoid database corruption, do not interrupt the script after it begins.

  6. If more segmentation is needed and will be done in a future maintenance window, re-enable scheduled backups in the interim.
  7. To generate a script, segment-script.sh, that will segment remaining table projections that are over 100 GB, type the following command:
    ./segment.py --task script --pass database_admin_user_password [--name database_name] [--port database_port]
    
    database_admin_user_password

    Indicates the Vertica Linux database administrator user password.

    database_name

    Indicates the name of the database. Optional; specify it only if the database name is not the default, drdata.

    database_port

    Indicates the port to use to connect to Vertica. Optional; specify it only if the port is not the default, 5433.

    For example:

    ./segment.py --task script --pass password --name mydatabase --port 1122
    

    Important! When the script is generated, it indicates any tables that may cause disk utilization to exceed the 60 percent and 90 percent thresholds.

  8. Disable scheduled backups, if they are not already disabled.
  9. To execute the segment-script.sh script, type the following command:
    nohup ./segment-script.sh
    

    The script segments all unsegmented tables and sorts them from smallest to largest.

    Important! When you run the script, it displays an error message for any table that would cause disk utilization to exceed 90 percent and does not segment that table. To segment these tables, free more disk space.

    The script prompts you for each table that would cause disk utilization to exceed 60 percent. We strongly recommend that you bring down Data Aggregator before segmenting these tables.

    This script can take several hours to segment large tables in the database. During an internal segmentation test and a customer database test, segmenting a table of 100 GB or larger took over 10 hours to complete. Segmentation time is not proportional to table size; it depends on many factors, including row count, column count, data compression, and machine specifications. Depending on the size of your maintenance window, plan to segment one table per window.

  10. To verify that all tables are now segmented, type the following command:
    ./segment.py --task tables --pass database_admin_user_password [--name database_name] [--port database_port]
    

    The following message appears:

    No tables found with unsegmented projections.
    
  11. Re-enable scheduled backups.
  12. If you segmented the database tables when Data Aggregator and Data Collector were down, start these components:
    1. To start Data Aggregator, type the following command:
      service dadaemon start
      
    2. To start Data Collector, type the following command:
      service dcmd start
      

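The nohup pattern used in steps 5 and 9 can be sketched as follows. Because segment-script.sh is generated by segment.py, a placeholder script stands in for it here; the mechanics of detaching from the terminal and capturing output in nohup.out are the same:

```shell
# Placeholder for the generated segment-script.sh, so the pattern is runnable here.
workdir=$(mktemp -d)
cd "$workdir"
cat > segment-script.sh <<'EOF'
#!/bin/sh
echo "segmenting unsegmented table projections..."
EOF
chmod +x segment-script.sh

# nohup detaches the script from the terminal: output lands in nohup.out by
# default, and the run survives an accidentally closed shell.
nohup ./segment-script.sh > nohup.out 2>&1

tail -n 1 nohup.out
# -> segmenting unsegmented table projections...
```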
The previous steps outline the use of the segment.py script and the considerations for migrating your environment. If you have any questions about the use of the script, or if you require assistance in planning your migration, contact CA Support.

More information:

How to Upgrade CA Performance Management Data Aggregator - Command Line