The following occurred as a result of a power outage.
The AppLogic environment was set up in a network HA configuration and was operating normally.
After a power outage, the system restarted without any apparent problems. However, one server was flagged because its IPMI status could not be obtained.
Rebooting the server did not clear the condition.
Rebuilding the server was thought to be an appropriate remedy since a re-discovery should rectify the IPMI status.
Indeed, the re-discovery did clear the IPMI error state.
Unfortunately, re-adding the server to the grid produced a different error.
The add operation completed the OS install and file system creation, then failed during network discovery.
These messages were produced:
2013-01-29 19:41:58 : Grid Grid8x (7872) is now running short of its optimal configuration.
2013-01-29 19:41:58 : Grid Grid8x (7872) - State information is now: Running, but needs attention.: The grid is running with fewer servers than target
2013-01-29 19:41:58 : Xen Server (9055) is now failed but remains partially running.
2013-01-29 19:41:58 : Xen Server (9055) is now determining which resource failed.
2013-01-29 19:41:45 : Grid Grid8x (7872) - State information is now: Adding Servers to Grid: Error running aldo command: ['addsrv', 'grid=Grid8x', 'servers=192.168.100.196:192.168.100.202:PowerAdmin__BFC:******', 'answer=yes'] -- returned connecting to the controller
using cached controller host address: 192.168.2.254
connected to 192.168.2.254
connected to controller for 'Grid8x' (192.168.2.254), id=2, srv1
connecting to the controller OK
getting existing network config
reading server list
connected to 192.168.2.1
reading configuration from 192.168.2.1
getting existing network config OK
testing the target servers
connected to 192.168.100.196
verifying connection to 192.168.100.196
testing OS and distro version on 192.168.100.196
checking network setup on 192.168.100.196
server check phase 1 completed
testing the target servers OK
detecting network layout
detecting network layout OK
Switches:
N Identifier name model
--------------------------------------------------
a ff:00:00:00:00:02 unknown
b ff:00:00:00:00:01 unknown
LANs:
N Role Switches
----------------------
l1 backbone b
l2 external a
Connections
| l1 | l2 |
| b | a |
192.168.100.196 | eth0* | eth2* |
error: 192.168.2.1: eth0 and eth1 are connected to a LAN with no identifiable switches
error: 192.168.2.1: eth2 and eth3 are connected to a LAN with no identifiable switches
error: 192.168.100.196: eth0 and eth1 are connected to a LAN with no identifiable switches
error: 192.168.100.196: eth2 and eth3 are connected to a LAN with no identifiable switches
cleanup
closing connection to 192.168.2.254
closing connection to 192.168.2.1
closing connection to 192.168.100.196
cleanup OK
aborting due to network connection errors
**** aldo addsrv: command FAILED
The server was checked for any changes (BIOS settings, etc.) and was found to be consistent with other, normally operating servers.
That left the network switches as the only remaining suspect.
An examination of the switch configuration showed that LLDP (Link Layer Discovery Protocol) had been turned off, apparently as an effect of the power outage.
LLDP was re-enabled on all of the switches, and the subsequent server addition ran successfully.
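When a server addition fails this way, it can help to pull the "no identifiable switches" errors out of the command output and group them by server before inspecting the switches. The following is a minimal sketch in Python, not part of AppLogic. It assumes the error lines keep exactly the wording shown in the log above. The function name `find_unidentified_switch_errors` and the `SAMPLE` text are hypothetical and used only for illustration.

```python
import re
from collections import defaultdict

# Matches the error lines exactly as they appear in the aldo addsrv output
# quoted in this article. The wording is an assumption based on that log only.
ERROR_RE = re.compile(
    r"^error: (?P<host>\S+): (?P<nic_a>\w+) and (?P<nic_b>\w+) "
    r"are connected to a LAN with no identifiable switches$"
)


def find_unidentified_switch_errors(log_text):
    """Return {host: [(nic_a, nic_b), ...]} for each matching error line."""
    hits = defaultdict(list)
    for line in log_text.splitlines():
        match = ERROR_RE.match(line.strip())
        if match:
            hits[match.group("host")].append(
                (match.group("nic_a"), match.group("nic_b"))
            )
    return dict(hits)


# Excerpt copied from the log above, used here as sample input.
SAMPLE = """\
detecting network layout OK
error: 192.168.2.1: eth0 and eth1 are connected to a LAN with no identifiable switches
error: 192.168.2.1: eth2 and eth3 are connected to a LAN with no identifiable switches
error: 192.168.100.196: eth0 and eth1 are connected to a LAN with no identifiable switches
error: 192.168.100.196: eth2 and eth3 are connected to a LAN with no identifiable switches
aborting due to network connection errors
"""

if __name__ == "__main__":
    for host, pairs in find_unidentified_switch_errors(SAMPLE).items():
        nics = ", ".join(f"{a}/{b}" for a, b in pairs)
        print(f"{host}: {nics} -> check that LLDP is enabled on the attached switches")
```

In this incident the error appeared for both the new server and an existing grid member (192.168.2.1), on every interface pair. That pattern points at the switches rather than at the rebuilt server.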
Copyright © 2013 CA Technologies.
All rights reserved.