This section describes the known issues and limitations at this time.
This means that an appliance can only talk to appliances connected to it (plus its own server and the grid controller). Nevertheless, protocols on new appliances should be properly specified to help ensure application design integrity and compatibility with future versions of CA AppLogic®.
The total available disk space reported by the grid info command is a raw estimate and does not take volume mirroring into account. The true available disk space is the reported amount divided by the number of mirrors (2 mirrors by default). For example, if 1000GB of disk space is reported and the grid is configured for 2 mirrors, the usable disk space is 500GB. Also, to successfully mirror volumes, there must be enough disk space on at least X servers, where X is the number of mirrors. CA AppLogic® will not fail to create a volume if one of its mirrors cannot be created; instead, it displays a warning that the volume could not be mirrored.
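As a minimal sketch of the arithmetic (the figures are illustrative; read the raw value from the grid info output):

    # Raw available disk space reported by "grid info" (illustrative figure)
    raw_avail_gb=1000
    # Number of mirrors configured for the grid (2 by default)
    mirrors=2
    # Usable disk space once mirroring is taken into account
    echo $(( raw_avail_gb / mirrors ))    # prints 500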
If an application is started and one of the grid's servers fails, the application start will fail if one or more of the application's appliances were scheduled to run on the failed server. If this situation occurs, simply restart the application.
To upload larger files to your volume, use the vol manage shell command; be sure to specify the external IP settings for this command to enable remote access from within the volume manager. For more information, see the reference for the vol manage command.
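A hypothetical invocation is sketched below; the external IP option names shown are illustrative only, so consult the vol manage reference for the exact syntax:

    # Open the volume manager for a volume with remote (external) access enabled.
    # "myapp:data" and the ip/netmask/gateway option names are placeholders;
    # see the vol manage command reference for the actual options.
    vol manage myapp:data ip=203.0.113.10 netmask=255.255.255.0 gateway=203.0.113.1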
CA AppLogic® supports booting OpenSolaris appliances from a ZFS-based boot volume. Note, however, that this has not been verified by CA and may not work. Solaris 10 does not support ZFS.
Currently, this is limited to single-device ZFS pools. To take full advantage of all ZFS capabilities in CA AppLogic®, users may assemble their own ZFS pools inside their appliances. If a ZFS pool is going to be used for mirroring, the CA AppLogic® volumes used in the pool should be created with CA AppLogic® mirroring disabled (using the mirrored=0 option when creating the volumes). Also, a ZFS pool created using the CA AppLogic® Solaris filer will not work in Solaris 10. See RefOsLimitations for all CA AppLogic® OS limitations.
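For example, a mirrored pool might be assembled roughly as follows (a sketch only: the vol create syntax is abbreviated, and the device names are placeholders for however the two unmirrored volumes appear inside the appliance):

    # On the grid: create two volumes with CA AppLogic mirroring disabled
    # (abbreviated syntax; see the vol create reference for the exact options)
    vol create myapp:zdata1 size=10G mirrored=0
    vol create myapp:zdata2 size=10G mirrored=0

    # Inside the Solaris appliance: build a ZFS mirror across the two devices
    # (c1d1 and c2d1 are placeholders for the actual device names)
    zpool create tank mirror c1d1 c2d1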
If you need larger storage, please use a different file system.
The new dhcp configuration mode does not support the property markup for appliance configuration. The APK documentation describes how to handle appliances that depend on the property markup when porting them from the volfix configuration mode to the dhcp configuration mode. See the Appliance Kit (APK) documentation for more information.
The validation flags mark appliances that do not have all of their mandatory properties, terminals, and volumes properly configured. To see the validation flags for an application, open the application in edit mode.
iso2class may be used to install a Solaris 10 appliance using the graphical console for the installation process. However, after the installation is complete and the appliance is restarted, the graphical console can still be used, but only in text mode (no access to the Solaris 10 desktop; strictly text-based access). This is due to a problem in the Solaris 10 GUI (not a CA AppLogic® bug).
Therefore, the graphical console cannot be used with these appliances. This is done on purpose to make the appliances as compact as possible. Using the new iso2class utility, users may create their own appliances with full desktop support.
This error occurs because CA AppLogic® sets the computer name of an appliance to its instance name. Therefore, if more than one appliance with the same instance name is running on a grid, Windows displays the duplicate name error on the graphical console. This error is simply a warning and does not affect the grid or its operation. However, if you need to use Windows as a domain controller, you must set a unique computer name for each appliance. You may use the wincfg utility to set the computer name in your appliance.
We have tested with Java version 6 update 7 on Internet Explorer, Firefox, Chrome, and Safari. If the latest version of Java is not used, the graphical console may not work correctly (it hangs while trying to load). Before reporting graphical console errors to CA, verify that you are using the latest Java version (if you need to upgrade Java in your browser, restart the browser afterwards for the graphical console to work correctly).
When a secondary server takes over as the new primary server and there are not enough resources available on that server to start the grid controller, CA AppLogic® restarts appliances running on the new primary server on other servers within the grid so that the grid controller can be started. Note that this may break appliance failover groups: if CA AppLogic® stops one of these appliances, it may not be able to restart it on another server because there may not be enough resources to satisfy the failover group.
All HVM-based appliances (Solaris 10, Windows, and so on) use more memory on the server than they are configured to use. Depending upon the amount of memory assigned to an HVM-based appliance, the appliance uses additional memory on the server on which it is running (this additional memory is required by the virtualization hypervisor running on the servers and is known as shadow memory). Therefore, even though a server might have enough available memory to cover what is assigned to the appliance, the appliance may not be able to run on that server because the additional shadow memory needed for HVM-based appliances is not available. The CA AppLogic® scheduler does take this extra shadow memory into account when scheduling appliances during application start.
When using a 10G backbone, the maximum throughput that can be achieved between appliances running on different servers is about 2Gbps (possibly due to a limitation within the hypervisor used by CA AppLogic®).
Any other browser may be used instead.
Shared interfaces should work with all other operating systems.
The following are the known issues in this release:
When using the HP Smart Array RAID controller without the write cache enabled, there is a 50% reduction in performance. This issue has been verified on an HP DL580 G7 server with a Smart Array P410i 256MB controller. These cards require a battery or capacitor to be installed to enable the write cache.
When using ServerEngines Corp. Emulex OneConnect 10Gb (be3, rev 01) NICs with CA AppLogic®, the NICs incorrectly bounce packets if the SR-IOV BIOS option is enabled. These bounced packets alter the bridge's forwarding cache, causing the bridge to drop packets instead of forwarding them to the correct destination. This causes instability in CA AppLogic®, which results in intermittent application start failures. Therefore, ensure that the SR-IOV BIOS setting is DISABLED for all Emulex 10G NICs on all servers within the grid.
When using a 10G backbone, the maximum throughput that can be achieved between appliances running on different servers is approximately 2.5 Gbps (you may observe different results depending upon the type of 10G hardware that is being used). CA is currently researching several network optimizations (such as enabling jumbo frames) that may be enabled in future CA AppLogic® releases in order to enhance 10G network performance.
Very rarely, an application fails to start due to a stuck volume mount on one of the servers. CA AppLogic® detects stuck volume mounts and reports them to the user on the grid's dashboard. If this problem occurs on your grid, notify CA Support. Alternatively, disabling or rebooting the server that has the stuck mounts resolves the issue.
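For example, assuming the stuck mount is on a server named srv3 (a placeholder) and that the srv disable command is available in your release:

    # Take the affected server out of service so the scheduler avoids it;
    # alternatively, reboot the server to clear the stuck mounts
    srv disable srv3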
If this situation occurs, rebooting the primary server will restore the grid to an operational state.
VDS: security vulnerability: initial user/password setup
If the CA AppLogic® GUI is accessed using Microsoft Internet Explorer 6 or 7, the GUI leaks memory as applications are opened for editing or when the web shell is opened (5-20MB of system memory is leaked per operation). It is recommended to close and re-open the browser every few hours to recover the leaked memory. Firefox, Chrome, or Safari may also be used instead of Internet Explorer.
The GUI no longer automatically logs the user out when there is heavy load on the grid controller. Instead, the user receives a message stating that there was a network error; the GUI remains fully functional. The network error message appears only under heavy controller load, such as starting 4 applications at the same time while also copying a large multi-GB volume. In large grids, try assigning up to a full CPU core and 1GB of RAM to the controller.
If a grid is rebooted using the grid reboot command, when the grid comes back up after the reboot, one or more of the system volumes may become degraded. CA AppLogic® automatically repairs these volumes at the highest priority.
When migrating a volume, verify that at least one of its streams is on an enabled server; otherwise, the migration command fails. The volume can be completely migrated off its original set of servers by migrating the volume twice, as shown below.
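A sketch of the two-pass migration (myapp:data is a placeholder; target options are omitted, so see the vol migrate reference for how to direct each pass):

    # Running the migration twice moves the volume completely off its
    # original set of servers (per the note above); target options omitted
    vol migrate myapp:data
    vol migrate myapp:data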
Some physical servers may take a long time to reboot, which can cause CA AppLogic®'s automated grid recovery to fail. As a result, applications may not all be restarted automatically after the grid recovers from a failure. This happens because the grid controller waits a maximum of 10 minutes for all servers to reboot and reconnect to the grid controller, which may not be enough time for all servers to reboot. The workaround is to manually restart applications after all servers have reconnected to the grid controller; execute "list srv" to verify that all servers are connected (they should all be in the UP state). In CA AppLogic® 2.1, with a server boot timeout of 10 minutes, this may occur primarily if a server fails to boot due to a hardware or BIOS malfunction.
When the operator reboots the grid, the grid flapping state is supposed to be reset and a message should be displayed on the dashboard stating that the operator rebooted the grid intentionally ("Grid has been restarted by operator on ..."). Occasionally, when the grid is rebooted, the flapping state is not reset and the dashboard message is not displayed. The only problem this may cause is that upon the next grid failure, applications may not be restarted automatically (depending on how many times the grid has failed when this bug occurs). To work around this problem, if no dashboard message is displayed after an intentional grid reboot, contact CA Support to have the grid flapping state reset on your grid.
The reason for the slightly reduced resources is allocation for service areas. For memory, it is likely due to Xen overhead related to the memory map table for a virtual machine. For disk, it is due to normal file system service areas (the same as on regular Linux servers).
In this case, the application is not opened for editing by any other user but the CA AppLogic® editor erroneously thinks somebody else has the application open for editing. If this occurs, simply override the application lock when prompted by the editor upon opening the application.
The main slowdown occurs when opening an application in the CA AppLogic® infrastructure editor.
If the client has the graphical console open and loses its connection to the Internet (client network card failure, client computer crash, Internet access unavailable, and so on), it takes 15 minutes before the graphical console can be re-opened.
The mouse is hard to use in Ubuntu when using the CA AppLogic® graphical console. This is due to a limitation of the Xen VNC support (mouse acceleration is not supported). Some users report that adjusting the mouse settings in Ubuntu resolves the issue. Also, in rare cases keystrokes are repeated several times when typing text (in such cases, simply delete the extra characters that are displayed).
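For example, mouse acceleration can be turned down inside the Ubuntu appliance with a standard X11 command (not CA-specific):

    # Set X11 mouse acceleration and threshold to 1 (effectively disabling
    # acceleration), which makes the VNC-based console mouse easier to use
    xset m 1 1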
This includes passwords when logging into an appliance. The text boot console should only be used for debugging purposes. The SSH console can be used instead for all other purposes.
If a user re-opens the text boot console for an appliance after it has already been opened, they must press the Enter key to see either the login prompt or the command prompt. This is because the boot console is waiting for user input (either login information or a command to be executed).
If a grid has an appliance that is part of a failover group running on a secondary server where the grid controller needs to be restarted, CA AppLogic® may stop that appliance which could break the failover group.
After upgrading a grid to the latest release, a dashboard message is posted stating that the grid failed due to a hardware issue. This message can be safely ignored and removed from the dashboard.
The appliance kit (APK) does not currently work with Ubuntu 9.10 or 10.x due to several incompatibilities with the newer OS. However, there are various posts on the CA AppLogic® forums that describe how to use some of the later OS distributions with CA AppLogic®.
If using a network HA configuration with CA AppLogic® and there is an external network failure, applications/appliances that use external interfaces may become inaccessible for up to 5 minutes. This appears to be caused by the external router caching MAC addresses. Waiting for the router to flush its ARP cache or sending an ARP response with arping from the application restores operation. This only affects the external network (the backbone network is not affected).
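For example, from a shell inside the affected appliance (the interface name and address are placeholders; arping is the standard Linux utility):

    # Send unsolicited (gratuitous) ARP replies so the external router
    # refreshes its cached MAC entry for this interface; replace eth0 and
    # 203.0.113.10 with the appliance's external interface and IP address
    arping -U -c 3 -I eth0 203.0.113.10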
Solaris 10 does not work on CA AppLogic® 3.x for both Xen and ESX servers.
OpenSolaris only works on Xen-based servers.
The recovery GUI only works on Xen-based servers.
Shared interfaces do not support appliance counters.
If a user power-cycles a grid, the system uptime is not reset. If the grid is rebooted, the system uptime should be reset.
If a user power-cycles a grid using the grid power_cycle command, the primary server may fail to reboot. This only occurs when the command is executed after a new grid install and the grid was never rebooted before the power cycle command was executed. Rebooting the grid at some point after a new grid install will avoid this issue.
If the NFS share size is changed while a grid is running, CA AppLogic® will not detect this until the grid is rebooted. This issue will be resolved in a future release.
When a grid that used a SAN is destroyed, CA AppLogic® deletes the contents of the grid’s folder on the SAN, but leaves behind the empty folder. This issue will be resolved in a future release.
Dell-based servers that use the H200 RAID cards cannot be used with CA AppLogic®. This issue will be resolved in a future release.
The workaround for this problem is to enable hardware RAID on the Dell server before using it for grid creation.
RedHat 5.3 based appliances cannot be installed using iso2class. This issue will be resolved in a future release.
Very rarely, an upgrade to 3.5 from either 3.0 or 3.1 may fail. In this particular failure case, the following messages are present in the grid's status log, which is accessed using the BFC (click the status of the grid to open the log):
    installing the controller image
    ioctl: LOOP_SET_FD: Device or resource busy
    installing new controller FAILED, aborting
If these messages are present in the log, rerun the upgrade; it should succeed.
Note: This issue is actually a bug in both CA AppLogic® 3.0 and 3.1, and is resolved in CA AppLogic® 3.5.
The rollback command does not work from 3.5 to 3.1 for an ESX-based grid. However, as a workaround, the downgrade command can be used (note that downgrade takes a bit longer than rollback). This issue will be resolved in a future release.
Ext3-snapshot based volumes do not work on ESX-based grids; they do work on Xen-based grids. If you are using an ESX-based grid and need an ext3-snapshot volume, you can add a Xen-based node to your grid and use that node to create and manage your ext3-snapshot volumes (when running the volume commands, disable all of the ESX servers so that the CA AppLogic® filer runs on the Xen-based node). This issue will be resolved in a future release.
An attempt to migrate a volume stream to the local SAN might fail on grids that are configured to use an external SAN. Instead of migrating the volume stream to the local SAN, CA AppLogic® incorrectly tries to migrate the stream to the external SAN. If you encounter this failure, use the store=local option with the vol migrate command. This issue will be resolved in a future release.
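For example (myapp:data is a placeholder volume name; other options are omitted for brevity):

    # Force the stream migration to the local SAN instead of the external SAN
    vol migrate myapp:data store=local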
When CA AppLogic® is upgraded from 3.0.30 to 3.5.x, the grid controller intermittently hangs and any 3tshell command executed returns a low memory condition error message.
To work around the issue, reboot the grid controller. This issue will be resolved in a future release.
On some Broadcom NICs, particularly the NetXtreme II BCM5709/5716, the link speed is reported as 100Mb/s or 10Mb/s by the NIC driver. As a result, CA AppLogic® installation fails.
To work around the issue, retry the installation. This issue will be resolved in a future release.
The OpenSSH version installed on the grid controller limits the number of simultaneous multiplexed SSH sessions to 10. As a result, if more than 10 asynchronous requests are executed, they are dropped by the API.
To work around the issue, issue no more than 10 simultaneous asynchronous requests to the API. This issue will be fixed in a future release.
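A minimal client-side sketch in bash (issue_request and the requests array are placeholders for however your client submits asynchronous API requests; the loop simply caps the number of concurrent background jobs at 10):

    # Keep at most 10 asynchronous requests in flight to stay within the
    # OpenSSH multiplexed-session limit on the grid controller
    for req in "${requests[@]}"; do
        while [ "$(jobs -rp | wc -l)" -ge 10 ]; do
            sleep 1    # wait for a slot to free up
        done
        issue_request "$req" &    # placeholder for one async API call
    done
    wait    # wait for the remaining requests to complete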
If you rename an assembly or component interface, the infrastructure editor does not load completely. This issue will be fixed in a future release.
On servers with these NICs, after a grid is created, the output of srv info srvX --extended shows the state of the NICs as active-down. This has been identified as a hardware-specific issue. To work around the issue, log into the respective switch, shut down the port for the NIC on srvX, and enable it again. The state should then be shown as up. This issue will be resolved in a future release.
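On a Cisco-style switch, for example, the port bounce looks like the following (the interface name is a placeholder, and the syntax varies by switch vendor):

    ! Bounce the switch port that the affected NIC on srvX is connected to
    configure terminal
    interface TenGigabitEthernet0/1
      shutdown
      no shutdown
    end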
It has been observed that on Dell PE R710 servers with Broadcom NetXtreme II 57711 (bnx2x) 10GbE NICs, the BFC fails to discover the servers, leading to a failed install. This is a hardware-specific issue and will be resolved in a future release.
The following are the key known problems with Windows appliances in this release. Also, see the Windows Appliance Installation Reference for additional procedures and notes.
When using the new Windows APK that ships with CA AppLogic® 3.5, Windows 2003 Server 64-bit Data Center edition may intermittently fail to start when used on a Xen-based grid. If this issue is encountered, restarting the appliance may work around the issue. This issue will be resolved in a future release.
The Windows filer can fail a volume resize operation if the source volume contains a corrupt directory entry or file. The main source of this problem is that some Microsoft software installations purposely contain invalid directory entries (we are not sure why; this has been observed when a user installed a version of Microsoft SQL Server in their appliance). Additionally, the source volume can become corrupt through normal wear and tear. This issue can be worked around by running a file system repair on the volume (vol fsrepair) before resizing the volume.
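For example (myapp:winvol is a placeholder, and the size option syntax is illustrative; see the command reference for the exact options):

    # Repair the NTFS file system first, then retry the resize
    vol fsrepair myapp:winvol
    vol resize myapp:winvol size=20G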
CA has observed the NTFS volume resize operation fail about 2 times out of 100. These failures occurred because the Windows filer failed to start correctly on the grid. If this issue is observed, repeating the resize operation should succeed. This issue should be resolved in this release; if it is still observed, notify CA Technical Support.
The Windows filer uses a Microsoft utility named diskpart to deal with the Windows NTFS volumes. Occasionally diskpart fails to obtain volume information or may fail to mount the volume. This is a very rare failure and may cause either vol create or vol resize to fail over NTFS volumes.
If an application contains a Windows appliance and one or more Windows appliances are added to the application, or terminals are added to or removed from the Windows appliances, some of the Windows appliances may detect duplicate IPs on their internal network during the first application start after the modification (this can only happen during that first start). This should not cause any operational failure of the application or require user intervention; the duplicate IP addresses are purely temporary. Worst case, some of the network communication involving the Windows appliances may be delayed for up to 30-60 seconds.
An attempt to stop a Windows application hung at 99%; the operation timed out after 15 minutes. The application contained 2 instances of a Windows 2003 Server DataCenter Edition appliance (WIN03DC). One of the Windows appliances stopped and the other one hung during "comp stop". This was only observed one time and could not be reproduced.
Occasionally, zeros are reported for the following disk I/O counters for Windows appliances (even though sustained I/O is being generated): total bytes written/read, number of volume writes/reads, and time spent in writes/reads. This is due to a bug in the Windows perfmon API; the zero values are what the Windows perfmon API reports.
Other than the filer MSI, localized Japanese Windows should work under CA AppLogic®.
A Windows appliance fails to start if the MagicISO virtual DVD-ROM device is installed. Virtual DVD-ROM devices are not currently supported in CA AppLogic® for Windows-based appliances.
Occasionally, it takes several minutes for Windows to detect new NICs inside an appliance. This occurs when the user adds or removes terminals on a Windows appliance singleton. The extra time it takes to detect the new NICs may cause appliance boot timeouts. To work around this, increase the boot timeout of your Windows appliance.
If a user has a Windows appliance on their grid and they migrate the appliance to another grid that has different hardware, the Windows appliance may require re-activation (Microsoft's Windows re-activation). The re-activation is triggered when a specific amount of hardware has changed (it is unknown to CA exactly what hardware changes trigger the re-activation). Note that re-activation may require access to the internet from within the Windows appliance. This particular problem was observed after resizing the Windows appliance boot volume and migrating the appliance to a different grid.
This issue only affects Windows 2008 Server 32/64-bit (Windows 2003 Server works OK). When accessing a Windows 2008 volume either through the filer or over SSH to an appliance, the user may not be able to access or modify files due to permission issues. To access or modify files using the command shell, log in to the Windows desktop through the graphical console and open a command shell there; that command shell can be used to access and modify files.
Windows 2003 Server times out during its first boot during installation. Be sure to follow the Windows build instructions to work around this issue.
When installing the Turbogate PV drivers on a Xen-based grid server, upon first start of the appliance the user must manually click through the hardware setup wizard to install the Turbogate PV drivers for all terminals configured in the appliance. Otherwise, the appliance fails to start.
When creating a new 32/64-bit Windows 2003 Server appliance, the appliance will only work on grid servers that use the same hypervisor as the server on which the appliance was initially created; otherwise, the appliance crashes during boot. For example, if the appliance is initially created on an ESX-based grid server, the appliance can only be used on ESX-based grid servers (attempting to use the appliance on a Xen-based grid server will not work; the appliance will crash during boot).
This is a known issue with Microsoft Windows 2003 Server; Microsoft provides a solution that resolves the issue for your Windows 2003 appliance.