

Network Performance Problem with Adding New Nodes to AppLogic 3.1 Grid with hf2668 (Bandwidth Oversubscription)

Issue:

The customer added three new servers (srv1, srv5 and srv6) to an AppLogic 3.1 grid that had hotfix hf2668 (Bandwidth Oversubscription) applied. He then restarted the production application onto the new servers, migrated the volumes off srv2, srv3 and srv4, rebooted those three servers one at a time so that each could run fsck, and finally rebooted the entire grid.

After the grid reboot, everything running on the new servers (srv1, srv5 and srv6) was extremely slow.

Troubleshooting:

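"3t grid info" showed the problem: the grid reported a mixed bandwidth oversubscription state. The per-node checks that follow show which servers differ.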

Bandwidth Oversubscription : mixed

[root@c9bfc3 ~]# for i in {1..5} ; do echo ; echo "srv$i" ; ssh 192.168.1.$i '3tsrv info | egrep "Reboot|Bandwidth"' ; echo ; done 
srv1
Reboot Required : no
Bandwidth Oversubscription : no
srv2
Reboot Required : no
Bandwidth Oversubscription : yes
srv3
Reboot Required : no
Bandwidth Oversubscription : yes
srv4
Reboot Required : no
Bandwidth Oversubscription : yes
srv5
Reboot Required : no
Bandwidth Oversubscription : no
[root@c9bfc3 ~]# for i in {1..5} ; do echo ; echo "srv$i" ; ssh 192.168.1.$i 'cat /usr/local/apl-srv/templates/vrmd.conf.tmpl | grep bw_enforce' ; echo ; done 
srv1
bw_enforce_mode = 2
srv2
bw_enforce_mode = 2
srv3
bw_enforce_mode = 2
srv4
bw_enforce_mode = 2
srv5
bw_enforce_mode = 2

The output shows a mismatch between the configured state and the running state.

In the second listing, bw_enforce_mode = 1 means oversubscription is on, and bw_enforce_mode = 2 means it is configured off. Every node is configured with bw_enforce_mode = 2 (off), yet srv2, srv3 and srv4 are still running with bandwidth oversubscription enabled and report "Reboot Required : no". This is a bug in hotfix hf2668: nodes 2, 3 and 4 should have been marked as requiring a reboot in order to apply the new configuration that disables bandwidth oversubscription.

The customer's problem occurred because bandwidth oversubscription is OFF on the new servers 1, 5 and 6 (the listings above cover only srv1 through srv5). As a result, bandwidth limits were enforced on the applications running there, and wait times spiked.
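The comparison described above can be sketched as a small shell helper. This is only an illustration: the function names expected_state and check_node are hypothetical. The 1 = on / 2 = off mapping is taken from this article, and the input line formats are assumed to match the listings shown earlier.

```shell
# Map a bw_enforce_mode value to the running state it should produce.
# Mapping as stated in this article: 1 = oversubscription on, 2 = off.
expected_state() {
  case "$1" in
    1) echo yes ;;
    2) echo no ;;
    *) echo unknown ;;
  esac
}

# check_node NAME INFO_LINE CONF_LINE
#   INFO_LINE: e.g. "Bandwidth Oversubscription : yes" (from "3tsrv info")
#   CONF_LINE: e.g. "bw_enforce_mode = 2" (from vrmd.conf.tmpl)
# Prints whether the running state agrees with the configured mode.
check_node() {
  name=$1
  # Take the value after the ":" or "=" separator and strip whitespace.
  running=$(printf '%s\n' "$2" | awk -F':' '{gsub(/[ \t]/, "", $2); print $2}')
  mode=$(printf '%s\n' "$3" | awk -F'=' '{gsub(/[ \t]/, "", $2); print $2}')
  expected=$(expected_state "$mode")
  if [ "$running" = "$expected" ]; then
    echo "$name: consistent (running=$running, bw_enforce_mode=$mode)"
  else
    echo "$name: MISMATCH (running=$running, bw_enforce_mode=$mode), reboot likely needed"
  fi
}
```

On a live grid, the two input lines for a node could be captured with the same commands used in the listings above, for example "ssh 192.168.1.2 '3tsrv info | grep Bandwidth'" and "ssh 192.168.1.2 'grep bw_enforce /usr/local/apl-srv/templates/vrmd.conf.tmpl'". With the values shown for srv2, the helper reports a mismatch, which is the condition that hf2668 failed to flag as "Reboot Required".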

Resolution:

The fix is to apply the bandwidth oversubscription hotfix hf2668 again so that it reconfigures the nodes. Afterwards, each node should show "bw_enforce_mode = 1" in "/usr/local/apl-srv/templates/vrmd.conf.tmpl" and must be rebooted for the new configuration to take effect. After the reboot, "3tsrv info" should report bandwidth oversubscription as enabled on all nodes.
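The final verification can be scripted along these lines. This is a minimal sketch, not part of the hotfix: the function name all_oversubscribed is hypothetical, and it assumes the "Bandwidth Oversubscription : yes|no" line format shown in the listings above.

```shell
# Read collected "3tsrv info" output on stdin.
# Succeeds (exit 0) only if at least one "Bandwidth Oversubscription"
# line is present and every such line reports "yes".
all_oversubscribed() {
  awk -F':' '
    /Bandwidth Oversubscription/ {
      gsub(/[ \t]/, "", $2)      # strip whitespace around the value
      seen++
      if ($2 != "yes") bad++
    }
    END { exit !(seen > 0 && bad == 0) }
  '
}
```

The input could be gathered with the same loop used during troubleshooting, with the range adjusted to cover every node in the grid, and piped into the function: "for i in {1..5} ; do ssh 192.168.1.$i '3tsrv info | grep Bandwidth' ; done | all_oversubscribed && echo OK". Any node that still reports "no" makes the check fail.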