How to Manually Repair Controller Volume

AppLogic automatically repairs the controller volumes when the controller boots. In some scenarios the automatic repair fails and user intervention is needed. This article describes how to manually repair controller volumes in the following versions:

AppLogic grid version 3.0 and 3.1
AppLogic grid version 3.5 (only when the controller volumes are stored on the nodes' local hard disks)

Background

The controller has 3 volumes: boot, meta and impex. By default, each volume has 2 mirrors stored on different nodes.

Starting with 3.0, the controller differs from 2.x in the following ways:

In 2.x, all controller volumes are single volumes without partitions. From 3.0, the boot volume is a partitioned volume, while meta and impex are still single volumes without partitions.
In 2.x, every controller volume mirror is named with a GUID appended to "v-". From 3.0, one mirror of each volume is named v-ctl-boot, v-ctl-meta or v-ctl-impex, and the naming of the other mirror is unchanged.

From 3.5, the controller volumes can be stored on SAN/NFS. If SAN/NFS is enabled when the grid is created, the controller volumes are stored on SAN/NFS rather than on node local hard disks by default, and each volume has only 1 mirror there. The repair process is different in that scenario; refer to the following document for details.

http://cawiki.ca.com/pages/viewpageattachments.action?pageId=42272906&sortBy=date&highlight=21168769-1+-Controller+Fs%28File+system%29+Has+Broken.docx&

Instructions

The instructions below are mainly for repairing the boot and meta volumes when file system corruption prevents the controller from starting up.

Log in to one of the physical nodes and execute "3tsrv sd get". It displays the server role and the name, location and state (synced or out of sync) of each controller volume mirror.
Back up the controller volume mirrors to another directory before going further. Like regular application volume mirrors, they are located under /var/applogic/volumes/meta and /var/applogic/volumes/vols on the host node.
Check whether the controller volume mirror is already attached to an mdX device on the primary server. If it is NOT attached to mdX, attach it first using the sub-steps below; otherwise, skip ahead to the fsck step.
1. Determine whether the controller volumes have been attached to mdX
  - Log in to the primary server and use "3tsrv sd get" and "3tsrv bd list" to display which hoop, nbd and md devices are attached to the controller volume mirrors.
  - As a cross check, inspect /var/applogic/boot/sys_vols_mounts on the primary server. Usually the boot volume is attached to md1, meta to md2 and impex to md3, but the pattern may differ between AppLogic versions.
2. Choose the qualified volume mirrors to repair
  - The mirror must be on a functional server
  - The volume mirror must have the flag "synced=1" in the "3tsrv sd get" output
3. Attach the controller mirror to mdX
  - If you want to manually fsck only one volume mirror:
    Connect to the node where the volume mirror is stored and execute "hosetup /dev/hoopY /var/applogic/volumes/vols/<controller volume mirror name>" to attach the mirror to an available hoopY. In the sample below, the controller boot volume mirror v-ctl-boot is attached to /dev/hoop100
    
    hosetup /dev/hoop100 /var/applogic/volumes/vols/v-ctl-boot
    
    Attach hoopY to an available mdX using "mdadm --assemble /dev/mdX --force --run /dev/hoopY". In the sample below, hoop100 is attached to md110
    
    mdadm --assemble /dev/md110 --force --run /dev/hoop100
    
    Execute "3tsrv bd list --all" to verify that hoopY and mdX are visible
    
    If both mirrors have the synced=1 flag but you want to repair one mirror only, leave synced=1 on the mirror you want to repair and set the other one to synced=0. The flag can be changed with "3tsrv sd get" followed by "3tsrv sd set".
  - If you want to manually fsck both volume mirrors:
    Let's assume the controller volume mirrors are on servers A and B.
    
    Connect to node A and execute "hosetup /dev/hoopY /var/applogic/volumes/vols/<controller volume mirror name>" to attach the volume mirror to an available hoopY. In the sample below, the controller boot volume mirror v-ctl-boot is attached to /dev/hoop100
    
    hosetup /dev/hoop100 /var/applogic/volumes/vols/v-ctl-boot
    
    Afterward, execute "nbd-server <available port> /dev/hoopY" to share hoopY. In the sample below, hoop100 is shared on port 1234
    
    nbd-server 1234 /dev/hoop100
    
    Repeat the same operation on node B to attach the other volume mirror to a hoop device and share it. In addition, on node B, connect to the nbd device shared by node A using "nbd-client 192.168.<grid id>.<node A id> <port of the nbd device shared by node A> /dev/nbdZ". In the sample below, the nbd device shared by node A is mapped to nbd150 on node B.
    
    nbd-client 192.168.<grid id>.<node A id> 1234 /dev/nbd150
    
    On node B, attach hoopY and nbdZ to an available mdX using "mdadm --assemble /dev/mdX --force --run /dev/nbdZ /dev/hoopY". In the sample below, hoop100 and nbd150 are attached to md110
    
    mdadm --assemble /dev/md110 --force --run /dev/hoop100 /dev/nbd150
    
    Execute "3tsrv bd list --all" to verify that the nbd and mdX devices are visible
Run fsck on the controller volume
The controller meta and impex volumes are single volumes without partitions, but the boot volume is partitioned. Therefore, the procedure for attaching them to a loop device differs.
1. For the boot volume:
  - Execute "file -sL" against mdX to find the start sector number. Multiplying it by 512 gives the partition offset in bytes. In the following sample, the start sector is 64, so the offset is 512*64 = 32768
    #file -sL /dev/md50
    
    /dev/md50: x86 boot sector; partition 1: ID=0x83, starthead 1, startsector 64, 3888106 sectors, extended partition table (last)\011, code offset 0x48
  - Execute "losetup -f", which returns the first unused loop device
  - Attach mdX to the unused loop device at the offset calculated from the start sector. For instance, if loopX is free and the boot volume is attached to mdY, execute "losetup -o 32768 /dev/loopX /dev/mdY"
  - Verify whether loopX is mountable: mount loopX on a directory and, if it mounts successfully, unmount it
  - Execute fsck against loopX to repair the file system
  - Execute "losetup -d /dev/loopX" to detach loopX
2. For the meta volume:
  - Execute "file -sL" against mdX to verify that it is not partitioned. The output should look like the sample below
    #file -sL /dev/md51
    
    /dev/md51: Linux rev 1.0 ext3 filesystem data (needs journal recovery) (large files)
  - Execute "losetup -f", which returns the first unused loop device
  - Attach mdX to the unused loop device. For instance, if loopX is free and the meta volume is attached to mdY, execute "losetup /dev/loopX /dev/mdY"
  - Verify whether loopX is mountable: mount loopX on a directory and, if it mounts successfully, unmount it
  - Execute fsck against loopX to repair the file system
  - Execute "losetup -d /dev/loopX" to detach loopX
After the repair finishes, execute "3tsrv set role=primary --recover" to recover the controller.
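
The offset arithmetic for the boot volume can be scripted. The following is a minimal sketch, assuming the "file -sL" output has the same shape as the boot volume sample above and that sectors are 512 bytes. partition_offset is a hypothetical helper name, not an AppLogic command.

```shell
#!/bin/sh
# Sketch only: derive the byte offset of the first partition from one line
# of "file -sL <device>" output. partition_offset is a hypothetical helper.
partition_offset() {
    # $1: the text printed by "file -sL /dev/mdX"
    sector=$(printf '%s\n' "$1" | awk '{
        for (i = 1; i < NF; i++)
            if ($i == "startsector") {
                sub(/,$/, "", $(i + 1))   # drop the trailing comma
                print $(i + 1)
                exit
            }
    }')
    # No "startsector" field: treat the volume as not partitioned.
    [ -n "$sector" ] || return 1
    echo $((sector * 512))
}

sample='/dev/md50: x86 boot sector; partition 1: ID=0x83, starthead 1, startsector 64, 3888106 sectors, extended partition table (last)\011, code offset 0x48'
partition_offset "$sample"   # prints 32768
```

The printed value is what would be passed to "losetup -o". For an unpartitioned volume such as meta, the function prints nothing and returns a non-zero status, which matches the case where no offset is needed.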

Appendix A: how to repair impex volume in 2.x and 3.x

The impex volume's file system is usually repaired from inside the controller:

Execute "mount" to display the mount point /vol/_impex and the device mounted on it. For instance, /dev/hda3 is mounted on /vol/_impex.
Unmount /vol/_impex.
Execute fsck against the device that was mounted on /vol/_impex (/dev/hda3 in this example).
After fsck has cleaned up the file system, execute "mount /dev/hda3 /vol/_impex" to mount the impex volume again.
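
The device lookup in the first step can be scripted. This is a minimal sketch, assuming "mount" prints lines of the form "<device> on <mount point> type <fstype> (<options>)". mounted_device is a hypothetical helper name, and the sample lines are illustrative rather than captured from a real controller.

```shell
#!/bin/sh
# Sketch only: print the device mounted on a given mount point.
# Reads "mount" output on stdin; mounted_device is a hypothetical helper.
mounted_device() {
    # $1: mount point to look up, for example /vol/_impex
    awk -v mp="$1" '$2 == "on" && $3 == mp { print $1; exit }'
}

# Illustrative sample of "mount" output (assumed, not from a real controller).
sample_mount='/dev/hda1 on / type ext3 (rw)
/dev/hda3 on /vol/_impex type ext3 (rw)'

printf '%s\n' "$sample_mount" | mounted_device /vol/_impex   # prints /dev/hda3
```

On a controller, the same helper would be fed live output, as in "mount | mounted_device /vol/_impex", and the resulting device name used for the unmount, fsck and mount steps above.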

Appendix B: how to repair boot volume in 2.9

In 2.x, the boot volume is a non-partitioned volume, so its repair procedure is similar to that of the meta volume in 3.x.