AIXPowerFail

Strange rc.powerfail message under AIX

The following message sometimes appears on AIX machines. As you will quickly understand, it can be quite worrying for users:

rc.powerfail:2::WARNING!!! The system is now operating with a power problem.
This message will be walled every 12 hours. Remove this crontab entry after the
problem is resolved.

As it is sent through wall, that message is displayed to all users. Panic usually ensues.

Please note that this message does not always mean that the machine is currently experiencing a problem. It usually means the machine experienced a problem at some point in the past.

Here are a few steps you can take to eliminate this problem:

1. Check the error logs

Use the following command:

errpt -a | more

Then, search for "POWER" (all caps!) and you will usually see a message such as this one:

=================== MESSAGE =======================

LABEL:          EPOW_SUS_CHRP
IDENTIFIER:     BE0A03E5

Date/Time:       Thu May 10 15:37:46 DFT 2007
Sequence Number: 345
Machine Id:      005FADBA4C00
Node Id:         galactus
Class:           H
Type:            PERM
Resource Name:   sysplanar0
Resource Class:  planar
Resource Type:   sysplanar_rspc
Location:

Description
ENVIRONMENTAL PROBLEM

Probable Causes
Power Turned Off Without a Shutdown
POWER OR FAN COMPONENT

        Recommended Actions
        RUN SYSTEM DIAGNOSTICS.
        PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
POWER STATUS REGISTER
0000 0002
PROBLEM DATA
[...]

Diagnostic Analysis
Diagnostic Log sequence number: 2247
Resource tested:        sysplanar0
Menu Number:            651205
Description:

The system lost input power.

If the system has battery backup and input power is
not restored, then the system will lose all power in
a few minutes. Take the necessary precautions with
running applications for a system shut down due to
loss of power.

If the system does not have battery backup, then this
message is displayed after power has already been
restored and your system rebooted successfully.

Check the following for the cause of lost input power:
  1. Loose or disconnected power source connections.
  2. Loss of site power.
  3. For multiple enclosure systems, loose or
     disconnected power and/or signal connections
     between enclosures.

Supporting data:
        Ref. Code:      101100AC

=================== MESSAGE =======================

As can be seen above, this message usually means the power has been restored after a failure.

You can check this by comparing the modification date of the /var/spool/cron/crontabs/root file (for instance with ls -l) with the date of the last known power problem in the error log.
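
To make that comparison easier, the dates of the power events can be pulled out of the detailed error report. The following is only a minimal sketch: it assumes the power-related labels start with EPOW (as EPOW_SUS_CHRP does in the report above), and power_event_dates is a name made up for this example.

```shell
# Print the Date/Time of every power-related entry found in a detailed
# errpt report read from stdin.
# Assumption: power-related labels start with "EPOW" (e.g. EPOW_SUS_CHRP).
power_event_dates() {
    awk '
        # Remember whether the current entry is a power event.
        /^LABEL:/ { power = ($2 ~ /^EPOW/) }
        # Print the timestamp of power events only, without the field name.
        /^Date\/Time:/ && power { sub(/^Date\/Time:[ \t]*/, ""); print }
    '
}

# On the AIX machine:
#   errpt -a | power_event_dates
#   ls -l /var/spool/cron/crontabs/root
```

If the crontab file was last modified at about the same time as the most recent power event, the entry was most likely added during that event.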

2. Delete the cron configuration

To stop cron from displaying this message, edit root's crontab by entering the following command as root:

crontab -e

Then look for the entry that mentions powerfail (it contains the text of the message shown above) and delete it.
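
The same cleanup can be done without an editor. This is a minimal sketch, assuming the offending entry contains the string rc.powerfail (as the walled message suggests); strip_powerfail and the /tmp file names are made up for this example.

```shell
# Drop every crontab line that mentions rc.powerfail.
# Reads the crontab from stdin and writes the cleaned version to stdout.
strip_powerfail() {
    grep -v 'rc\.powerfail'
}

# On the AIX machine, as root:
#   crontab -l > /tmp/root.cron.bak                           # keep a backup
#   strip_powerfail < /tmp/root.cron.bak > /tmp/root.cron.new
#   crontab /tmp/root.cron.new                                # install the cleaned crontab
```

Compare the two files before installing the new crontab, so that only the powerfail entry is removed.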