

How to Manually Repair Controller Volume

AppLogic automatically repairs the controller volumes at the controller bootup stage. In some scenarios the automatic repair fails and user intervention is needed. This article describes how to manually repair the controller volumes in the following versions.

Background

The controller has 3 volumes: boot, meta and impex. By default, each volume has 2 mirrors stored on different nodes.

From 3.0, the controller has the following differences:

From 3.5, the controller volumes can be stored on SAN/NFS. If SAN/NFS is enabled when the grid is created, the controller volumes are stored on SAN/NFS by default rather than on the nodes' local hard disks, and each volume has only 1 mirror on SAN/NFS. The controller volume repair process is different in that scenario; refer to the following document for details.

http://cawiki.ca.com/pages/viewpageattachments.action?pageId=42272906&sortBy=date&highlight=21168769-1+-Controller+Fs%28File+system%29+Has+Broken.docx&

Instructions

The instructions below are mainly for repairing the boot and meta volumes when their file systems are corrupted and the controller fails to start up as a result.

  1. Log in to one of the physical nodes and execute "3tsrv sd get". It displays the server role and the name, location and state (synced or out of sync) of each controller volume mirror.
  2. Save a copy of the controller volume mirrors to another directory before going further. Like regular application volume mirrors, they are located under /var/applogic/volumes/meta and /var/applogic/volumes/vols on the host node.
  3. Check whether the controller volume mirror is already attached to an mdX device on the primary server. If it is NOT attached to mdX, attach it first as described below; otherwise, go to step 4.
    1. Determine whether the controller volumes are attached to mdX
      Log in to the primary server and use "3tsrv sd get" and "3tsrv bd list" to display which hoop, nbd and md devices the controller volume mirrors are attached to. As a cross-check, you can inspect /var/applogic/boot/sys_vols_mounts on the primary server. Usually the boot volume is attached to md1, meta to md2 and impex to md3, but the pattern may differ between AppLogic versions.
      
    2. Choose the qualified volume mirrors to repair
      • The mirror must be on a functional server
      • The volume mirror must have the flag "synced=1" in the "3tsrv sd get" output
    3. Mount controller mirror to mdX
      • if you want to manually fsck only one volume mirror:

        Connect to the node where the volume mirror is stored and execute "hosetup /dev/hoopY /var/applogic/volumes/vols/<controller volume mirror name>" to attach the volume mirror to an available hoopY. In the sample below, the controller boot volume mirror v-ctl-boot is attached to /dev/hoop100:

        hosetup /dev/hoop100 /var/applogic/volumes/vols/v-ctl-boot

        Attach hoopY to an available mdX using "mdadm --assemble /dev/mdX --force --run /dev/hoopY". In the sample below, hoop100 is attached to md110:

        mdadm --assemble /dev/md110 --force --run /dev/hoop100

        Execute "3tsrv bd list --all" to verify that hoopY and mdX are visible.

        If both mirrors have the synced=1 flag but you want to repair only one of them, leave synced=1 on the mirror you want to repair and set the other one to synced=0. The flag can be changed with "3tsrv sd get" followed by "3tsrv sd set".

      • if you want to manually fsck both volume mirrors:

        Let's assume controller volume mirrors are on server A and B.

        Connect to node A and execute "hosetup /dev/hoopY /var/applogic/volumes/vols/<controller volume mirror name>" to attach the volume mirror to an available hoopY. In the sample below, the controller boot volume mirror v-ctl-boot is attached to /dev/hoop100:

        hosetup /dev/hoop100 /var/applogic/volumes/vols/v-ctl-boot

        Afterward, execute "nbd-server <available port> /dev/hoopY" to share hoopY. In the sample below, hoop100 is shared on port 1234:

        nbd-server 1234 /dev/hoop100

        Repeat the same operation on node B to attach the other volume mirror to a hoop device and share it. In addition, connect node B to the nbd device shared by node A using "nbd-client 192.168.<grid id>.<node A id> <port of the nbd device shared by node A> /dev/nbdZ". In the sample below, the nbd device shared by node A is mapped to nbd150 on node B:

        nbd-client 192.168.<grid id>.<node A id> 1234 /dev/nbd150

        On node B, attach hoopY and nbdZ to an available mdX using "mdadm --assemble /dev/mdX --force --run /dev/nbdZ /dev/hoopY". In the sample below, hoop100 and nbd150 are attached to md110:

        mdadm --assemble /dev/md110 --force --run /dev/hoop100 /dev/nbd150

        Execute "3tsrv bd list --all" to verify that nbdZ and mdX are visible.

  4. fsck controller volume

    The controller meta and impex volumes are single volumes without a partition table, but the boot volume is partitioned. Therefore, their mounting processes are different.

    1. In the Boot volume:
      • Execute "file -sL" against mdX to find the start sector number. That value multiplied by 512 is the partition offset in bytes. In the following sample, the start sector is 64, so the offset is 512*64 = 32768

        # file -sL /dev/md1

        /dev/md1: x86 boot sector; partition 1: ID=0x83, starthead 1, startsector 64, 3888106 sectors, extended partition table (last)\011, code offset 0x48

      • Execute "losetup -f"; it returns the first unused loop device
      • Attach mdY to the unused loop device with the offset obtained in the first bullet. For instance, if loopX is free and the boot volume is attached to mdY, execute "losetup -o 32768 /dev/loopX /dev/mdY"
      • Verify whether loopX is mountable: mount loopX on a directory and, if it mounts successfully, unmount it
      • Execute fsck against loopX to repair the file system.
      • Execute "losetup -d /dev/loopX" to detach loopX
    2. In the Meta volume
      • Execute "file -sL" against mdX to verify that it is not partitioned. The output should look like the sample below

        # file -sL /dev/md2

        /dev/md2: Linux rev 1.0 ext3 filesystem data (needs journal recovery) (large files)

      • Execute "losetup -f"; it returns the first unused loop device
      • Attach mdY to the unused loop device. For instance, if loopX is free and the meta volume is attached to mdY, execute "losetup /dev/loopX /dev/mdY"
      • Verify whether loopX is mountable: mount loopX on a directory and, if it mounts successfully, unmount it
      • Execute fsck against loopX to repair the file system.
      • Execute "losetup -d /dev/loopX" to detach loopX
  5. After the repair finishes, execute "3tsrv set role=primary --recover" to recover the controller again.
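
The offset arithmetic for the boot volume in step 4 (start sector multiplied by 512) is easy to get wrong by hand, so it may help to script it. The following is a minimal sketch, assuming the "file -sL" output contains a "startsector N" field as in the sample above. The function name partition_offset is hypothetical and is not part of AppLogic; the sample line is copied from this article rather than read from a live device.

```shell
#!/bin/sh
# Hypothetical helper (not an AppLogic tool): compute the byte offset of the
# first partition from one line of `file -sL /dev/mdX` output.
partition_offset() {
    # $1: a line of `file -sL` output that contains "startsector N"
    sector=$(printf '%s\n' "$1" | sed -n 's/.*startsector \([0-9][0-9]*\).*/\1/p')
    if [ -z "$sector" ]; then
        echo "no startsector found; the volume may not be partitioned" >&2
        return 1
    fi
    # Offset in bytes = start sector * 512 (sector size assumed by this article)
    echo $((sector * 512))
}

# Sample output line taken from this article
sample='/dev/md1: x86 boot sector; partition 1: ID=0x83, starthead 1, startsector 64, 3888106 sectors, extended partition table (last)\011, code offset 0x48'
partition_offset "$sample"   # prints 32768
```

The printed value is the number to pass to "losetup -o" when attaching the boot volume to a loop device. For an unpartitioned volume such as meta, the function prints a warning and returns a non-zero status, which matches the case where no offset is needed.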
Appendix A: how to repair impex volume in 2.x and 3.x

The impex file system is usually repaired from inside the controller.

  1. Execute "mount" to display the mount point /vol/_impex and the device mounted on it. For instance, /dev/hda3 is mounted on /vol/_impex
  2. Unmount /vol/_impex
  3. Execute fsck against the device that was mounted on /vol/_impex (/dev/hda3 in this example)
  4. After fsck has cleaned up the file system, execute "mount /dev/hda3 /vol/_impex" to mount the impex volume again.
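
Step 1 of this appendix, finding the device behind /vol/_impex, can be sketched as a small filter over the "mount" output. This is only an illustration, assuming "mount" prints lines in the usual Linux form "<device> on <mount point> type <fstype> (<options>)". The function name device_for_mountpoint is hypothetical, and the canned input line stands in for real controller output.

```shell
#!/bin/sh
# Hypothetical helper (not an AppLogic tool): read `mount`-style lines from
# stdin and print the device mounted on the mount point given as $1.
device_for_mountpoint() {
    awk -v mp="$1" '$2 == "on" && $3 == mp { print $1; exit }'
}

# Canned example line; on a controller you would pipe the real `mount` output in.
printf '%s\n' '/dev/hda3 on /vol/_impex type ext3 (rw)' | device_for_mountpoint /vol/_impex
```

On a real controller the same function could be fed with "mount | device_for_mountpoint /vol/_impex", and the resulting device name is the one to unmount, fsck and mount again in steps 2 to 4.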
Appendix B: how to repair boot volume in 2.9

In 2.x, the boot volume is a non-partitioned volume, so its repair procedure is similar to that of the meta volume in 3.x.