An explanation of the "HA check failed" errors that we see at seemingly random times.
The error is raised by the following lines in the 3tha Perl script:
# acquire lock
$ret = acquire_lock ("check");
if ($ret)
{
    my $msg = "HA check failed - Other operation currently in progress.";
    #my $t = localtime;
So the script calls the function that acquires a lock on the grid, and that function returns a non-zero code.
Fundamentally, what that function does is very simple:
sub acquire_lock ($)
{
    my ($op) = @_;
    my @lines;
    my @ps;

    # check if another 3tha is running
    @ps = clihlp::run_cmd "pgrep $SCRIPT_NAME";
    system "rm -f $LOCK_ROOT/3tha.lock" if (scalar @ps == 2);

    # check if the lock file is there
    return 1 if (-e "$LOCK_ROOT/3tha.lock");
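The first two lines check whether another 3tha is indeed running (SCRIPT_NAME is set to 3tha at the beginning of the script). The function then tests the result of that command (the scalar @ps == 2 condition, which counts the elements returned by run_cmd and which the script treats as meaning that no other 3tha process was found) and, if so, deletes the lock file.
If the lock file has not been deleted (that is, a 3tha process is running), the function then checks whether a lock file already exists: 3tha.lock in LOCK_ROOT, which is /var/spool/applogic/locks/ha. If the file is there, the function returns 1 straight away, and that non-zero code is what produces the "HA check failed" message. If the file is not there, it goes about creating it.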
    # create the lock directory if not there
    system "mkdir -p $LOCK_ROOT" if (not -e $LOCK_ROOT);

    # create lock file
    push @lines, $op;
    my $ret = write_file ("$LOCK_ROOT/3tha.lock", \@lines);
    return 1 if (not $ret);
    return 0;
}
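Since the lock file is written with the name of the operation that took the lock, one way to investigate a failure is to look at the file itself. The following is only a sketch: describe_lock is a hypothetical helper that is not part of 3tha, and it assumes that write_file stores the operation name as the first line of the file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of 3tha): report what a lock file contains.
# Returns undef (in scalar context) if the file does not exist.
sub describe_lock {
    my ($path) = @_;
    return unless -e $path;

    open(my $fh, '<', $path) or die "cannot read $path: $!";
    my $op = <$fh>;              # assumption: first line is the operation name
    close($fh);
    $op = '' unless defined $op; # empty lock file
    chomp $op;

    my $mtime = (stat($path))[9];
    return { op => $op, age => time() - $mtime };
}

# LOCK_ROOT as given in this note
my $lock = '/var/spool/applogic/locks/ha/3tha.lock';
my $info = describe_lock($lock);
if ($info) {
    print "lock held by operation '$info->{op}', $info->{age} seconds old\n";
} else {
    print "no lock file at $lock\n";
}
```

A lock that is much older than a normal HA check takes would suggest that an earlier 3tha run is still going or is stuck.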
That is basically it.
From this it seems pretty clear that there is indeed another HA check in progress when the failing one is launched, that is, the 3tha script from a previous attempt has not really finished.
If it had finished, as you can see, the lock would have been removed and the command would have gone through.
That function is called at the beginning of each ha check.
Basically, the 3tha script first checks things such as whether the controller or nodes are flagged for reboot. If not, it goes on to check the apps, ctl and network HA, then it updates the state of the grid, and finally it releases the lock.
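The sequence of acquiring the lock, running the checks and releasing the lock can be reproduced in isolation. The sketch below is a simplified stand-in, not the real 3tha code: the pgrep test is left out, a temporary directory replaces LOCK_ROOT, and release_lock and ha_check are hypothetical names chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Temporary directory standing in for the real LOCK_ROOT (assumption).
my $LOCK_ROOT = tempdir(CLEANUP => 1);

# Simplified acquire_lock: no pgrep check, only the lock file test.
sub acquire_lock {
    my ($op) = @_;
    # lock file present: another operation is assumed to be in progress
    return 1 if -e "$LOCK_ROOT/3tha.lock";
    open(my $fh, '>', "$LOCK_ROOT/3tha.lock") or return 1;
    print $fh "$op\n";
    close($fh);
    return 0;
}

# Hypothetical counterpart that removes the lock file.
sub release_lock {
    unlink "$LOCK_ROOT/3tha.lock";
    return 0;
}

# Returns the message a caller would see for one HA check.
sub ha_check {
    return "HA check failed - Other operation currently in progress."
        if acquire_lock("check");
    # ... the app, controller and network checks would run here ...
    release_lock();
    return "HA check completed";
}

acquire_lock("check");     # simulate an earlier check that is still running
print ha_check(), "\n";    # fails: the lock file is present
release_lock();            # the earlier check finishes
print ha_check(), "\n";    # goes through: the lock file is gone
```

The second check fails only because the first one has not yet released the lock, which matches the behaviour described above.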
Either that, or something is running 3tha at a time of day when it should not be running.
As for the randomness: several different events can trigger the HA check, which is why the failures look random.
Copyright © 2012 CA.
All rights reserved.