

Failover to the DR Site

At the time of a primary site failure, the data tier will possess all of the data necessary to allow the DR site to take over operation. The failover procedure is as follows:

  1. An outage is detected at the primary site.
  2. Incoming customer traffic is stopped from entering the primary site.
  3. Operations in progress at the primary site are allowed to complete or time out during a period of at least 3 minutes after incoming traffic has stopped.
  4. Database replication between sites, if still operational, is allowed to complete.
  5. Database replication between sites is stopped.

    If you are using Oracle, any stale sessions in the primary database will generate an error during failover. To resolve this error, issue the following command to kill all open sessions before preparing for failover.

    ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY WITH SESSION SHUTDOWN;
    

    If you are using Postgres, any stale sessions in the primary database will generate an error during failover. To resolve this error, issue the following command to kill all open sessions before preparing for failover.

    pg_ctl stop -m fast
    
  6. On all operational servers at the primary site, run the following commands to convert them to Standby mode:
    cd /opt/CA/saas/repo/application
    DR_mode.sh mode=standby
    
  7. Convert the standby database to a master. Follow the standard Oracle or Postgres failover procedure, depending on which database is set up in your environment.
  8. Follow the instructions for Derby Database Synchronization.
  9. On all of the DR site servers, run the following commands to convert them to Live mode:
    cd /opt/CA/saas/repo/application
    DR_mode.sh mode=live
    

    Use the following order for running the script on DR site servers:

  10. Customer traffic is redirected to the DR site by means of DNS changes, or other methods, that remap the tenant-associated hostnames to the DR site.
  11. The DR site is now active.
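
The mode changes in steps 6 and 9 run the same two commands on every server, so they can be scripted. The following is a minimal sketch, not part of the documented procedure. It assumes the servers are reachable over ssh and that DR_mode.sh can be invoked exactly as shown in the steps above. The function names mode_cmd and switch_mode, the DRY_RUN variable, and the host names in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: print (dry run) or execute the DR_mode.sh call on each host.
# Using ssh and these helper names is an assumption, not documented behavior.

APP_DIR="/opt/CA/saas/repo/application"   # path taken from steps 6 and 9

# Build the remote command line for a given mode (standby or live).
mode_cmd() {
    local mode="$1"
    case "$mode" in
        standby|live) ;;
        *) echo "unsupported mode: $mode" >&2; return 1 ;;
    esac
    printf 'cd %s && DR_mode.sh mode=%s\n' "$APP_DIR" "$mode"
}

# Apply a mode to each host, in the order the hosts are given.
# With DRY_RUN=1 (the default) the plan is printed and nothing is executed.
switch_mode() {
    local mode="$1"; shift
    local cmd host
    cmd="$(mode_cmd "$mode")" || return 1
    for host in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "[$host] $cmd"
        else
            # Stop at the first failure so the operator can investigate.
            ssh "$host" "$cmd" || return 1
        fi
    done
}
```

For example, `switch_mode standby primary-app1 primary-app2` prints the planned commands for two placeholder hosts, and `DRY_RUN=0 switch_mode live dr-app1 dr-app2` would execute them. Hosts are processed in argument order, so list the DR site servers in the order your environment requires.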