

Troubleshooting on Graphic Console Login Failure

There is a bug in the way graphical consoles are set up. When the controller moves to another server node, graphical consoles already running on the old server node become inaccessible.

TECHNICAL BACKGROUND

Normally, a graphical console binds to 192.168.<gid>.<srv>:59xx. However, if the controller VM is running on the node, the graphical console binds to 192.168.<gid>.252 instead. Because both addresses are accessible over the backend, the graphical console works for a VM bound to either one. The problem occurs when the controller is moved: 192.168.<gid>.252 is unbound from the old node and moved to the new primary node. This leaves any running VM whose graphical console is bound to .252 inaccessible.

DIAGNOSE

To check the IP addresses and ports of the running graphical consoles, ssh to the server node and run the command 'ps -ef | grep qemu'.

For example, the output below shows a graphical console bound to the 192.168.<gid>.252 address. If the controller is not running on that server node, the graphical console is inaccessible.

vm.srv1.Webconnect01_LS_US.main.WIN03y_TG35 -videoram 4 -k en-us -vnc 192.168.3.252:0 
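
The check above can be automated. The following is a minimal sketch, not part of the product: it assumes the 'ps -ef | grep qemu' output has been captured as text, that each qemu command line carries a '-vnc <ip>:<display>' argument as in the examples in this topic, and that the controller address always ends in .252. The function name vnc_bindings is hypothetical. It relies on the standard VNC convention that display N listens on TCP port 5900 + N.

```python
import re

# Matches the "-vnc <ip>:<display>" argument on a qemu command line.
VNC_ARG = re.compile(r"-vnc\s+(\d+\.\d+\.\d+\.\d+):(\d+)")


def vnc_bindings(ps_output):
    """Return (ip, tcp_port, on_controller_ip) for each -vnc argument found."""
    bindings = []
    for line in ps_output.splitlines():
        match = VNC_ARG.search(line)
        if match is None:
            continue
        ip = match.group(1)
        display = int(match.group(2))
        # VNC display N listens on TCP port 5900 + N.
        port = 5900 + display
        # Assumption: the controller address ends in .252, as described in this topic.
        bindings.append((ip, port, ip.endswith(".252")))
    return bindings


# Sample lines taken from the examples in this topic.
sample = (
    "vm.srv1.Webconnect01_LS_US.main.WIN03y_TG35 -videoram 4 -k en-us -vnc 192.168.3.252:0\n"
    "vm.srv1.win12s_install.main.win12s -videoram 4 -k en-us -vnc 192.168.3.1:0\n"
)

for ip, port, on_controller_ip in vnc_bindings(sample):
    label = "controller address" if on_controller_ip else "node address"
    print(ip, port, label)
```

Any console reported on the controller address is a candidate for becoming inaccessible once the controller leaves that node.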

The first VNC server started on an address listens on port 5900, the second on port 5901, and so on. With two IP addresses in use, a VNC server can be bound to port 5900 on each of them. If another application is started on the primary node after the controller has been moved, a connection to the previously inaccessible graphical console might be mistakenly directed to the wrong graphical console.

For example, in Issue 21482703 a connection reached the wrong graphical console because another graphical console was bound to the same IP address and port.

vm.srv1.win12s_install.main.win12s -videoram 4 -k en-us -vnc 192.168.3.1:0 
tcp 0 0 192.168.3.1:5900 0.0.0.0:* LISTEN
tcp 0 0 192.168.3.252:5900 0.0.0.0:* LISTEN
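
Ports that are open on both the controller address and a node address can be spotted with a short script. This is only a sketch under stated assumptions: the listener lines are in the format shown above (similar to what 'netstat -ltn' prints; this topic does not name the command that produced them), graphical consoles use ports 5900 to 5999, and the controller address ends in .252. The function name shared_vnc_ports is hypothetical.

```python
def shared_vnc_ports(listen_output, controller_suffix=".252"):
    """Map each 59xx port that listens on both the controller address
    and at least one other address to the sorted list of those addresses."""
    listeners = {}
    for line in listen_output.splitlines():
        fields = line.split()
        # Expect: proto recv-q send-q local-address foreign-address state
        if len(fields) < 6 or fields[0] != "tcp" or fields[-1] != "LISTEN":
            continue
        ip, _, port_text = fields[3].rpartition(":")
        if not port_text.isdigit():
            continue
        port = int(port_text)
        # Assumption: graphical consoles use the 59xx port range.
        if 5900 <= port <= 5999:
            listeners.setdefault(port, set()).add(ip)

    shared = {}
    for port, ips in listeners.items():
        on_controller = {ip for ip in ips if ip.endswith(controller_suffix)}
        if on_controller and ips - on_controller:
            shared[port] = sorted(ips)
    return shared


# Listener lines taken from the example in this topic.
listen_sample = (
    "tcp 0 0 192.168.3.1:5900 0.0.0.0:* LISTEN\n"
    "tcp 0 0 192.168.3.252:5900 0.0.0.0:* LISTEN\n"
)

print(shared_vnc_ports(listen_sample))
```

A port reported here is one where a client expecting one graphical console could end up connected to another.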

WORKAROUND

The workaround is to restart the problematic component.

If the component cannot be restarted because it is in a production environment, another workaround is to move the controller back to the old primary node. Also check whether any graphical consoles have been bound on the new primary node.

REFERENCE

SCR 9171 has been opened for this.