Network HA + VLAN Tagging Guide

This is a quick step-by-step guide to enabling network HA on a grid with VLAN tagging.

In my opinion, the most important step in setting up network HA with VLAN tagging is getting VLAN tagging working correctly first. To set up VLAN tagging correctly, though, you first need a basic working grid.

Follow the steps below exactly and you will have a working grid with VLAN tagging and network HA on the first attempt.

First of all, have a look at the following image and be sure that your infrastructure will support this configuration. In addition to following the cabling image, you must have some knowledge of at least Cisco switch CLI commands, switch configurations, and you must understand what VLAN tagging is and how it works. Do not cable your system, though, until you have read through the steps in this doc.

  1. Configure a basic grid. Do not connect any additional network cards or switches at this time, and do not uplink the switches together either; even if they are already in the rack and the network cards are already set up, do not cable them. On the nodes, use eth0 for the backbone and eth2 for the external network. You can set up your IPMI/power network at this time as well. Just do not plug in the failover hardware (additional backbone/external switches and NICs) yet. Then go to your BFC and perform a basic grid installation without VLAN tagging. Once your basic grid is operational, move on to Step 2.
  2. Now go to your BFC and enable VLAN tagging for your grid. You will lose access to your controller over the external network. This doesn't matter, because all of the AppLogic service checks run on the backbone network, so you do not have to worry about random reboots while the external network is disabled. For the time being, set the default VLAN to 0 and add the available application VLAN ranges.
  3. Now that you have lost access to your grid controller, it's time to log into your switches and fix it. I assume that you have all relevant port information documented, including the BFC port, node ports, uplink port(s), and router ports. The following ports must be trunked, with a native VLAN enabled on a port-by-port basis. As an example, I am going to assume in the following table that your ports are configured in this manner.

The following table assumes that you are using Cisco switches and that you are using Spanning Tree Protocol. Your routers should be set up using VRRP for redundancy.

Switch port (sw1)        Device (sw1)       Switch port (sw2)        Device (sw2)
gigabitEthernet 1/0/1    BFC                gigabitEthernet 1/0/1    N/A
gigabitEthernet 1/0/2    Srv1 (eth1)        gigabitEthernet 1/0/2    Srv1 (eth3)
gigabitEthernet 1/0/3    Srv2 (eth1)        gigabitEthernet 1/0/3    Srv2 (eth3)
gigabitEthernet 1/0/4    Uplink to sw2      gigabitEthernet 1/0/4    Uplink to sw1
gigabitEthernet 1/0/5    Uplink2 to sw2     gigabitEthernet 1/0/5    Uplink2 to sw1
gigabitEthernet 1/0/6    Router1 fe0/0      gigabitEthernet 1/0/6    Router2 fe0/0
gigabitEthernet 1/0/7    Router1 fe0/1      gigabitEthernet 1/0/7    Router2 fe0/1

Global configuration output of 'show running-config' from a correctly configured switch. The VLAN IDs are examples, and only the relevant parts of the config are shown.

spanning-tree mode pvst
spanning-tree extend system-id

vlan 1
name Management

vlan 100
name Applogic Default

vlan 101-199

lldp run

interface gigabitEthernet 1/0/1
description BFC
switchport access vlan 100
switchport mode access
spanning-tree portfast

interface gigabitEthernet 1/0/2
description srv1
switchport trunk native vlan 100
switchport trunk allowed vlan 100,101-199
switchport mode trunk
switchport nonegotiate
spanning-tree portfast

interface gigabitEthernet 1/0/4

interface gigabitEthernet 1/0/5

interface gigabitEthernet 1/0/6
description RouterExtNetwork
switchport access vlan 100
switchport mode access
spanning-tree portfast

interface gigabitEthernet 1/0/7
description VlanRouting802.1qTrunk
switchport trunk allowed vlan 101-199
switchport mode trunk
spanning-tree portfast

With your default VLAN set to 0, your controller will be in the same network range as your nodes. Your nodes CANNOT be in a tagged VLAN because dom0 only supports native Ethernet connections. 802.1Q only tags VLAN IDs 1-4094, so a default VLAN of "0" here simply means "untagged frame": your switch will automatically put these untagged frames in the native VLAN assigned to the node ports.

What this means is that your controller packets will not be tagged, and your gateway and VDS appliances will not be tagged either. They will all default to the untagged hardware range unless the appliance is manually configured to be in a tagged range.
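The rule described above can be written down as a small lookup. This is only an illustrative sketch: the function name `frame_handling` and its return strings are made up for this guide and are not part of AppLogic or of any switch software.

```python
def frame_handling(vlan_id: int, native_vlan: int) -> str:
    """Describe how traffic for a configured VLAN ID leaves a node port.

    vlan_id     -- the VLAN configured for the appliance (0 = untagged)
    native_vlan -- the native VLAN set on the switch port (100 in this guide)
    """
    if vlan_id == 0:
        # VLAN 0 means "untagged": the switch puts the frame in the native VLAN.
        return f"untagged, carried in native VLAN {native_vlan}"
    if 1 <= vlan_id <= 4094:
        # IDs 1-4094 are the VLAN IDs that 802.1Q can tag.
        return f"tagged with 802.1Q VLAN ID {vlan_id}"
    raise ValueError(f"VLAN ID {vlan_id} is outside the range 0-4094")


print(frame_handling(0, native_vlan=100))    # controller, gateway, VDS by default
print(frame_handling(101, native_vlan=100))  # appliance placed in a tagged range
```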

If you want your controller, gateway and VDS appliances to be, by default, in a different VLAN, then you must specify a different default VLAN at your BFC, one other than 0, but you must also use a different layer 3 IP range. This can cause a problem with controller external access but it is easily fixed right after setting up the default VLAN to be tagged.

Let's say that your hardware range is going to be 172.16.255.0/24 in native VLAN 100, and you want the default VLAN to be VLAN 101 (tagged). You would assign a network range of 172.16.254.0/24 to your BFC, give your controller an IP address of 172.16.254.254, make your application range 172.16.254.2-253, and set your gateway to 172.16.254.1.
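If you want to double-check an addressing plan like this one before typing it into the BFC, the arithmetic can be sketched with Python's standard `ipaddress` module. The helper name `plan_default_vlan` and the convention it encodes (first host as gateway, last host as controller, everything in between for applications) are assumptions taken from the example above, not something the BFC enforces.

```python
import ipaddress


def plan_default_vlan(default_net: str, hardware_net: str) -> dict:
    """Derive gateway, controller and application range for the default VLAN."""
    net = ipaddress.ip_network(default_net)
    # The tagged default VLAN must use a different layer 3 range than the hardware.
    if net.overlaps(ipaddress.ip_network(hardware_net)):
        raise ValueError("default VLAN range must differ from the hardware range")
    hosts = list(net.hosts())
    if len(hosts) < 4:
        raise ValueError("network too small for gateway, controller and applications")
    return {
        "network": str(net),
        "gateway": str(hosts[0]),        # first usable address
        "controller": str(hosts[-1]),    # last usable address
        "application_range": (str(hosts[1]), str(hosts[-2])),
    }


plan = plan_default_vlan("172.16.254.0/24", "172.16.255.0/24")
for key, value in plan.items():
    print(f"{key}: {value}")
```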

Even though you assigned your controller an IP of 172.16.254.254, your BFC will still, by default, assign your controller a network and gateway from the hardware range. So if you log into your controller over the backbone at 192.168.1.254 (the controller is unavailable on the external network at this point) and look at the routing table, you are going to see something like this:

Ip: 172.16.254.254
Netmask: 255.255.255.0
Network: 172.16.255.0
Broadcast: 172.16.255.255

Well, that’s not going to work, is it? Your controller's IP has been assigned to the wrong subnet. To fix this, go to your BFC, then go to the advanced grid parameters and set these two configuration settings:

ext_network=172.16.254.0/24

ext_gateway=172.16.254.1
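To see why the original values cannot work and the corrected ones can, here is a minimal check using Python's standard `ipaddress` module. The function name `settings_consistent` is hypothetical; it only tests subnet membership and says nothing about how the BFC applies the parameters.

```python
import ipaddress


def settings_consistent(controller_ip: str, ext_network: str, ext_gateway: str) -> bool:
    """True if the controller IP and the gateway both sit inside ext_network."""
    net = ipaddress.ip_network(ext_network)
    return (ipaddress.ip_address(controller_ip) in net
            and ipaddress.ip_address(ext_gateway) in net)


controller = ipaddress.ip_address("172.16.254.254")

# Before the fix: the controller IP is not inside the hardware network it was given.
print(controller in ipaddress.ip_network("172.16.255.0/24"))

# After the fix: controller and gateway share the 172.16.254.0/24 network.
print(settings_consistent("172.16.254.254", "172.16.254.0/24", "172.16.254.1"))
```

The first check is false and the second is true, which is exactly the mismatch the two grid parameters correct.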

Now reboot your controller and it will be accessible again over the external network in its respective VLAN. But it will only be able to access the nodes over the external network if you have the routes configured in your gateway to allow 172.16.254.0/24 to communicate with 172.16.255.0/24.

Now that you have VLAN tagging set up and working, you can move on to network HA. Go back to the first page and look very closely at the diagram. Be sure that all of your nodes and switches are cabled in the manner shown in the diagram, and we will move on to the switch configuration requirements for HA.

I am not going to go over the backbone network, because the backbone network should be using non-blocking switches with STP disabled. It should also have ARP caching disabled. The backbone network needs to be basic enough to fulfill only those needs but it needs to be advanced enough to create an ethernet port aggregate connection for the uplink if you plan on using two cables as the diagram shows.

If your backbone switches are not intelligent enough to support LACP, then DO NOT USE TWO CABLES. This will cause an infinite network loop. Uplink the switches using only one Ethernet cable; even though the diagram shows two, it is OK to use only one. Backbone switches that lack LACP probably lack LLDP as well. That is OKAY: the AppLogic network detection scripts will still detect the topology correctly via loop discovery through the uplink. That is all that you need to know about the backbone.

External network HA seems to be the most complex switch configuration to implement, though it is actually very simple. Here are the requirements.

  1. Both external switch VLAN configurations MUST be identical to each other. Use the example configuration above and follow the VLAN requirements at the beginning of this doc for VLAN port configurations.
  2. LLDP or CDP must be enabled on both external switches. If you are using mixed switch vendors (manufacturers other than Cisco), you must disable CDP entirely and use only LLDP.
  3. If using Spanning Tree Protocol, all NODE ports must be in STP portfast mode (including the BFC port). This lets the ports enter a forwarding (non-blocking) state immediately. If this is not enabled, the network detection scripts will absolutely fail. Ignore the Cisco trunk warning when enabling STP portfast on trunk ports.
  4. If using two uplink cables, you must set up an LACP port channel group for the ports being used (link aggregation). Otherwise use only 1 cable; it works just as well. DO NOT ENABLE STP PORTFAST ON THE UPLINK PORTS. (LACP or EtherChannel configuration is beyond the scope of this document.)
  5. Your uplink port configuration MUST be in trunk mode with the native AppLogic VLAN enabled and the tagged VLANs allowed, just like the node ports. The only difference between the node ports and the uplink ports is that the uplink ports should not have portfast enabled.
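Before enabling HA, it can help to write the port settings of both external switches down and check them against the list above. The sketch below does that for requirements 1 and 3 through 5; the LLDP/CDP requirement is not modelled. The `Port` record, the role labels, and the function names are all invented for illustration, and the data has to be filled in by hand from your own switch configs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    role: str                 # "bfc", "node" or "uplink" (hypothetical labels)
    mode: str                 # "access" or "trunk"
    native_vlan: int          # native VLAN on a trunk, access VLAN otherwise
    allowed_vlans: frozenset  # tagged VLANs allowed on a trunk
    portfast: bool


def check_ports(ports: dict) -> list:
    """Return violations of requirements 3-5 for one switch."""
    problems = []
    for name, port in sorted(ports.items()):
        if port.role in ("node", "bfc") and not port.portfast:
            problems.append(f"{name}: {port.role} port needs STP portfast")
        if port.role in ("node", "uplink") and port.mode != "trunk":
            problems.append(f"{name}: {port.role} port must be a trunk")
        if port.role == "uplink" and port.portfast:
            problems.append(f"{name}: portfast must not be enabled on an uplink")
    return problems


def vlan_mismatches(sw1: dict, sw2: dict) -> list:
    """Requirement 1: ports whose VLAN settings differ between the two switches."""
    differing = []
    for name in sorted(sw1.keys() & sw2.keys()):
        a, b = sw1[name], sw2[name]
        if a.role == "bfc" or b.role == "bfc":
            continue  # the BFC is attached to sw1 only
        if (a.mode, a.native_vlan, a.allowed_vlans) != (b.mode, b.native_vlan, b.allowed_vlans):
            differing.append(name)
    return differing


TAGGED = frozenset(range(101, 200))  # example application VLANs 101-199
node = Port("node", "trunk", 100, TAGGED, portfast=True)
uplink = Port("uplink", "trunk", 100, TAGGED, portfast=False)
sw1 = {"1/0/2": node, "1/0/4": uplink}
# sw2 has portfast wrongly enabled on its uplink port.
sw2 = {"1/0/2": node, "1/0/4": Port("uplink", "trunk", 100, TAGGED, portfast=True)}

print(check_ports(sw1))
print(check_ports(sw2))
print(vlan_mismatches(sw1, sw2))
```

Only `sw2` is flagged here, for the portfast setting on its uplink; the VLAN settings themselves match on both switches.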

If your backbone is configured and your external switch configurations match the above 5 requirements, then move on to enabling network HA. This part is very simple:

  1. Connect all cables to all nodes and switches according to the diagram above, paying close attention to your uplink cables and the uplink requirements.
  2. Log in to your grid controller with maintainer access and type: 3t grid set ha_network=1

If you are only planning on setting up backbone HA and you skipped the external HA steps then run:

3t grid set ha_backbone=1

Or if you skipped backbone HA and you only want external HA and you followed all of the external HA requirements correctly run:

3t grid set ha_external=1

The command should complete with no messages. To verify that it completed successfully, run 3t grid info --verbose; all network HA statuses should be reported as good in the output.
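The choice among the three `3t grid set` commands above comes down to which halves of the network you prepared for HA. As a plain illustration (the function name `ha_command` is made up; only the command strings come from this guide):

```python
def ha_command(backbone_ready: bool, external_ready: bool) -> str:
    """Pick the grid setting that matches the HA work actually completed."""
    if backbone_ready and external_ready:
        return "3t grid set ha_network=1"    # both backbone and external HA
    if backbone_ready:
        return "3t grid set ha_backbone=1"   # backbone HA only
    if external_ready:
        return "3t grid set ha_external=1"   # external HA only
    raise ValueError("neither network has been prepared for HA")


print(ha_command(backbone_ready=True, external_ready=True))
print(ha_command(backbone_ready=False, external_ready=True))
```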

To test that it is going to continue to work after a grid reboot, reboot your entire grid either by running 3t grid reboot or by logging into your BFC and issuing a grid stop/start sequence.

If you want to see the boot process in action:

Ping the backbone distribution IP address that your BFC assigned to srv1 (192.168.0.1 or 192.168.100.1) until srv1 starts responding again. Next, ssh to srv1 over the BFC DHCP address and run tail -f /var/log/messages. Watching the logs, you will see the interfaces enter promiscuous mode and go up and down, and then you will see the bond created and brought up.
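If you would rather not watch a ping by hand, the waiting step can be scripted. The following is a rough sketch, not an AppLogic tool: `wait_until_up` and `ping_once` are hypothetical helpers, and `ping_once` assumes a Linux `ping` that accepts `-c` (count) and `-W` (reply timeout in seconds).

```python
import subprocess
import time
from typing import Callable


def ping_once(host: str) -> bool:
    """Send a single ping; True if the host answered (assumes Linux ping flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_until_up(probe: Callable[[], bool], interval: float = 1.0,
                  timeout: float = 600.0, sleep=time.sleep,
                  clock=time.monotonic) -> bool:
    """Call probe() repeatedly until it succeeds or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # 192.168.0.1 is one of the example backbone addresses for srv1 in this guide.
    if wait_until_up(lambda: ping_once("192.168.0.1")):
        print("srv1 is answering; ssh in and run: tail -f /var/log/messages")
    else:
        print("srv1 did not come back within the timeout")
```

The probe is passed in as a function so the same loop can be reused with any other reachability check.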

If you see errors from network detection, such as: “network detection failed, l1 switch moved to a different LAN” or any other failure message, open a ticket with support so that we can help you get it working.