

Major Grid Crash Recovery, Controller Boot Volume Corrupted
Recovery after a major grid crash in which the controller VM and the primary server showed filesystem corruption.
Issue: Grid servers crashed, including the primary. After several reboots the grid was up and running, but the controller boot volume on the primary showed disk errors, and server one (srv1, the primary) also exhibited filesystem errors.
The following procedure was used to correct the disk errors on the primary and to move the controller to a secondary server.
- SSHed into srv1 of the grid
- Executed 'xm list' to see if the grid controller was running
- Executed 'xm console controller' to open the console of the grid controller
- From the console, saw filesystem errors for the grid controller’s boot volume
- Stopped the grid controller by executing '/usr/local/apl-srv/bin/ctlb_ctl2.sh stop'
- Executed 'sdinit cmd=read' to get the location of the streams for the grid controller’s boot volume
- Found that the only good stream for the grid controller’s boot volume was on srv2
- SSHed into srv2 of the grid
- Executed 'ps aux | grep hoop' to get a list of hoop devices
- Executed 'hosetup' on each hoop device to find out which one is associated with the controller’s boot volume stream
- Assembled an md device with that hoop device using 'mdadm --assemble'
- Executed 'fsck' on the assembled md device
- Answered 'y' to all questions
- Stopped the md device using 'mdadm --stop'
- Rebooted all servers
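The srv1 steps above can be sketched as a small POSIX shell function. This is a minimal sketch under stated assumptions, not a CA-supplied tool: the `run` wrapper and its `DRY_RUN` variable are hypothetical additions that print each command instead of executing it, and the interactive `xm console controller` check is left as a comment because it is done by eye. The commands themselves are the ones named in the procedure, and the sketch assumes you are root on srv1.

```shell
#!/bin/sh
# Hypothetical helper (not part of AppLogic): with DRY_RUN=1 it prints the
# command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# srv1 steps: confirm the controller is running, stop it, locate its streams.
stop_controller_on_srv1() {
    # The controller runs as a Xen domain; check that it appears in the list.
    run xm list || return 1
    # Interactive step, done by hand before stopping the controller:
    #   xm console controller    (look for boot volume filesystem errors)
    run /usr/local/apl-srv/bin/ctlb_ctl2.sh stop || return 1
    # Print volume stream locations; find the good copy of the boot volume.
    run sdinit cmd=read
}
```

Source the file and call `DRY_RUN=1 stop_controller_on_srv1` first to review the commands, then run it without `DRY_RUN` once the console has confirmed the boot volume errors.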
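The srv2 repair steps can likewise be sketched, assuming you have already used 'hosetup' to identify the hoop device that backs the good stream (its exact invocation is not reproduced here). The function name, the `run`/`DRY_RUN` wrapper and the md device choice are hypothetical. `mdadm --assemble --run` starts the array even though only one of its members is present, and `fsck -y` stands in for answering 'y' to every question; this assumes an ext2/ext3-family filesystem, whose checker (e2fsck) accepts `-y`.

```shell
#!/bin/sh
# Hypothetical helper (not part of AppLogic): with DRY_RUN=1 it prints the
# command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Repair the controller boot volume from its one good stream on srv2 (run as root).
#   $1  hoop device that hosetup showed backing the stream (placeholder)
#   $2  an md device not already in use, e.g. /dev/md99 (placeholder)
repair_boot_volume() {
    hoop_dev=${1:-}
    md_dev=${2:-}
    if [ -z "$hoop_dev" ] || [ -z "$md_dev" ]; then
        echo "usage: repair_boot_volume HOOP_DEV MD_DEV" >&2
        return 2
    fi
    # --run starts the array even though only one member is present.
    run mdadm --assemble --run "$md_dev" "$hoop_dev" || return 1
    # -y answers 'y' to every question (assumes an e2fsck-checked filesystem).
    run fsck -y "$md_dev"
    fsck_status=$?
    # Release the md device even if fsck reported a problem.
    run mdadm --stop "$md_dev" || return 1
    # fsck exit status 0 = no errors, 1 = errors corrected.
    [ "$fsck_status" -le 1 ]
}
```

For example, `DRY_RUN=1 repair_boot_volume HOOP_DEV /dev/md99` prints the three commands, where HOOP_DEV is the device found with 'hosetup'. The md device is stopped even when fsck fails, so the stream is not left assembled when the servers are rebooted.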
Copyright © 2013 CA Technologies.
All rights reserved.