I've seen this problem described before, but never with a definitive resolution.
I have a rack of 5 DL380 G7 servers (dual X5680 CPUs, dual 1k PSUs). On all 5 servers, one PSU is plugged into wall power and the other into a 5 kVA UPS. The reasoning is two-fold: if the UPS fails, wall power carries on; if wall power fails, the UPS keeps the servers running after the network and clients have gone down, which is more than long enough for the drives to flush their caches and for the OS (CentOS) to sync the disks.
Last week we had a site-wide power failure during the night that lasted two hours, so all 5 servers shut down. Power came back at 4am and 4 of the 5 rebooted. The fifth showed an amber power light, with the entire health panel and all other indicators dark. Pressing the power button did nothing, pressing and holding it for 30 seconds did nothing, and neither did pressing the power button via iLO 3. iLO 3 showed nothing unusual, except that at the time of the power failure the power supplies were reported as not redundant.
1) Pulled the power cords, waited 3 minutes, and plugged them back in: no change.
2) Swapped one PSU each with another running server: no change.
3) So we slid the server out on its rails and opened the top, and while we were removing the metal cover over the riser to get at SW6, the server came to life.
Yesterday we had another outage (the power company scheduled downtime and forgot to tell us), and exactly the same thing happened. Since it was a weekend, I had the office staff carry out steps 1 and 2, and while I was still explaining step 3 (they hadn't even slid the unit out), the server powered up.
Yes, all the BIOS versions are up to date (and identical).
I've always noticed that DL power supplies stay on and quite warm even when the servers are powered down, so I wondered whether the PSUs overheat in that state. Step 2 disproved that idea, and besides, while building power was out all 10 PSUs went stone cold, so all 10 experienced exactly the same conditions. Combined with the other similar complaints I've seen, this simply REEKS of a 'known issue' with these products, doesn't it?
I have no problem calling in HP Field Support to fix this, but all my experience tells me I can't be the first person to hit this, and I'm not a fan of paying anyone to try things until they finally get it right.
Does anyone have any musings or suggestions that don't focus on my bad grammar or shallow reasoning skills?