

Data Repository Heartbeat Monitor Process

Every 10 seconds, the heartbeat monitor process checks whether Data Repository is up and running. If the heartbeat process cannot confirm that the database is up within 5 minutes, Data Aggregator shuts down. An audit message is logged in the Data Aggregator installation directory/apache-karaf-2.3.0/shutdown.log file.
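The timing rule above can be sketched in a few lines of Python. This is an illustrative model only, not the Data Aggregator implementation: the function name, the list of probe results, and the assumption that the 5-minute window is measured from the last successful check are all hypothetical.

```python
from typing import Iterable, Optional

CHECK_INTERVAL_S = 10     # from the text: one check every 10 seconds
FAILURE_TIMEOUT_S = 300   # from the text: 5 minutes without confirmation


def seconds_until_shutdown(
    probe_results: Iterable[bool],
    interval: int = CHECK_INTERVAL_S,
    timeout: int = FAILURE_TIMEOUT_S,
) -> Optional[int]:
    """Return the elapsed time (seconds) at which shutdown would trigger.

    probe_results holds one boolean per heartbeat check (True = database
    confirmed up). Returns None if the timeout is never reached.
    Assumption: the window restarts at every successful check.
    """
    last_ok = 0  # time of the last successful check (monitor start counts as 0)
    for check_number, ok in enumerate(probe_results, start=1):
        now = check_number * interval
        if ok:
            last_ok = now
        elif now - last_ok >= timeout:
            return now  # Data Aggregator would shut down at this point
    return None


if __name__ == "__main__":
    # Five good checks, then the database stays unreachable.
    print(seconds_until_shutdown([True] * 5 + [False] * 30))
```

Under these assumptions, 30 consecutive failed checks (30 x 10 seconds) are needed before the 5-minute limit is reached; a single successful check in between resets the window.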

In a cluster environment, every node in the cluster is checked for availability every 10 seconds. If a node cannot be contacted within 5 minutes, an event is generated and logged on the Data Aggregator device in CA Performance Center. An audit message is logged in the Data Aggregator installation directory/apache-karaf-2.3.0/shutdown.log file.

If the Data Repository node that failed is the primary node (through which all Data Aggregator queries were made), Data Aggregator automatically switches to the next available Data Repository node. An event is generated and logged on the Data Aggregator device.

Important! Certain administrative functions that are in progress during a high availability failover are interrupted and fail. One poll cycle is lost. These functions do not resume after Data Aggregator connects to another Data Repository node in the cluster environment. Administrative functions that you perform after the connection to another node is established work as designed.

If all Data Repository nodes in a cluster environment fail, Data Aggregator shuts down.
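The failover behavior described above (stay on the primary node while it is up, switch to the next available node when it fails, shut down when no node is left) can be modeled as follows. This is a minimal sketch under stated assumptions: the function name, the node names, and the interpretation of "next available" as the next node in cluster order with wrap-around are hypothetical and are not taken from the product.

```python
from typing import Mapping, Optional, Sequence


def choose_primary(
    nodes: Sequence[str],
    is_up: Mapping[str, bool],
    current: str,
) -> Optional[str]:
    """Pick the Data Repository node to use for queries.

    Returns the current primary if it is still up, otherwise the next
    node in cluster order that is up. Returns None when every node is
    down, which corresponds to Data Aggregator shutting down.
    Assumption: "next available" means next in list order, wrapping around.
    """
    if is_up.get(current, False):
        return current  # no failover needed
    start = nodes.index(current)
    for offset in range(1, len(nodes)):
        candidate = nodes[(start + offset) % len(nodes)]
        if is_up.get(candidate, False):
            return candidate  # failover target; an event would be logged
    return None  # all nodes failed


if __name__ == "__main__":
    cluster = ["node1", "node2", "node3"]  # hypothetical host names
    status = {"node1": False, "node2": False, "node3": True}
    print(choose_primary(cluster, status, current="node1"))
```

Returning `None` rather than raising an error keeps the "all nodes failed" case explicit, so the caller decides how to handle the shutdown.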

Loss of contact with Data Repository can result in a loss of data by Data Aggregator. Resolve any connectivity or Data Repository issues before you restart Data Aggregator. Data Aggregator shuts down automatically if it fails to connect to Data Repository on start-up. To minimize data loss, the Data Collector installations continue to collect and store data locally for a time until Data Aggregator is restarted.

To recover a node that has failed, select the “Restart Vertica on Host” option on the main menu of the admintools utility and follow the prompts. Data Aggregator will not establish a heartbeat on the failed node until you restart the Vertica process on that node and there is a successful network connection.