Applies to CA6000 and CA6300 appliances
Problem: On restart of the appliance, SCSI "sense" or "hang" errors similar to the following are displayed:
Feb 3 04:59:20 smkong3 kernel: sd 1:0:0:0: SCSI error: return code = 0x08000002
Feb 3 04:59:20 smkong3 kernel: sdb: Current: sense key: Hardware Error
Feb 3 04:59:20 smkong3 kernel: Add. Sense: Internal target failure
Feb 3 04:59:20 smkong3 kernel:
Feb 3 04:59:20 smkong3 kernel: Info fld=0x0
Feb 3 04:59:20 smkong3 kernel: end_request: I/O error, dev sdb, sector 34
Feb 3 04:59:20 smkong3 kernel: Device sdb1, XFS metadata write error block 0x0 in sdb1
Feb 3 05:00:21 smkong3 kernel: aacraid: Host adapter abort request (1,0,0,0)
Feb 3 05:00:21 smkong3 last message repeated 255 times
Feb 3 05:00:21 smkong3 kernel: aacraid: Host adapter reset request. SCSI hang ?
Feb 3 05:01:21 smkong3 kernel: aacraid: SCSI bus appears hung
Feb 3 05:01:41 smkong3 kernel: aacraid: Host adapter abort request (1,0,0,0)
These errors indicate that logical or physical data corruption has occurred and that the RAID controllers did not detect and correct it.
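As an aid to reading the first message, the following minimal sketch splits the "return code" value into its four bytes. It assumes the layout used by older Linux 2.6 kernels (driver byte, host byte, message byte, status byte, from most to least significant); the function name decode_scsi_result is hypothetical and is not part of any appliance tool.

```python
def decode_scsi_result(result):
    """Split a 32-bit SCSI result word into its four one-byte fields.

    Assumed layout (older Linux 2.6 kernels):
    driver byte << 24 | host byte << 16 | message byte << 8 | status byte
    """
    return {
        "driver": (result >> 24) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "status": result & 0xFF,
    }


# Value taken from the sample log above.
fields = decode_scsi_result(0x08000002)
for name, value in fields.items():
    print(f"{name}: 0x{value:02x}")
```

Under that assumed layout, a driver byte of 0x08 means that sense data is available and a status byte of 0x02 is the SCSI CHECK CONDITION status, which is consistent with the "sense key: Hardware Error" line that follows in the log.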
Resolution: SCSI errors typically indicate that bad sectors have accumulated on the physical drives and caused data corruption on the System or Data Array. Even if the array status is Optimal, attempt to repair the array so that the bad sectors are either repaired or no longer used.
When starting the appliance, watch the terminal display for SCSI “sense” or “hang” errors. During normal operation of the appliance, these errors may occur intermittently without apparent harm, or they may cause intermittent Linux kernel panics. However, if Linux system files have been corrupted, the appliance may be unable to start and may repeatedly hit the same kernel panic.
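If the messages scroll past too quickly on the terminal, the same check can be run against saved log text. The following is a minimal sketch, not a CA-supplied tool: the patterns are taken only from the sample messages in this article, and the category names and the functions classify and summarize are hypothetical.

```python
import re
from collections import Counter

# Patterns derived from the sample messages in this article (illustrative only).
PATTERNS = {
    "sense": re.compile(r"SCSI error: return code|sense key:|Add\. Sense:"),
    "io_error": re.compile(r"end_request: I/O error|XFS metadata write error"),
    "hang": re.compile(
        r"aacraid: (?:Host adapter (?:abort|reset) request|SCSI bus appears hung)"
    ),
}


def classify(line):
    """Return the first matching category name for a log line, or None."""
    for name, pattern in PATTERNS.items():
        if pattern.search(line):
            return name
    return None


def summarize(lines):
    """Count how many lines fall into each category."""
    counts = Counter()
    for line in lines:
        category = classify(line)
        if category is not None:
            counts[category] += 1
    return counts
```

For example, summarize can be passed an open file object for a saved copy of the system log. The location of that log on the appliance is not stated in this article and would need to be confirmed. A nonzero "hang" count alongside "sense" entries matches the pattern shown in the Problem description.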
Recovery can be attempted in two stages:
As part of the recovery process, be sure to use the Disk Utilities to verify disk media for every drive in the array, and then rebuild the array. Note that it can take over an hour to verify each drive.
If neither procedure eliminates the SCSI errors or hangs, it may not be possible to recover the appliance to an operational condition without reinstalling CentOS Linux and the CA Multi-Port Monitor software, or replacing the appliance. Contact CA Support for assistance.
Copyright © 2015 CA Technologies.
All rights reserved.