I woke up to a stalled server, checked the iLO log, and found:
Unrecoverable System Error (NMI) has occurred. System Firmware will log additional details in a separate IML entry if possible
Uncorrectable PCI Express Error (Slot 2, Bus 0, Device 1, Function 0, Error status 0x00014000)
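In case it helps narrow things down, here is a minimal sketch that decodes the status value. It assumes the "Error status" field is the raw PCI Express AER Uncorrectable Error Status register, which is an assumption and not something the log confirms. The bit names follow the PCIe specification; the function name `decode_status` is just illustrative.

```python
# Bit positions of the PCIe AER Uncorrectable Error Status register.
# ASSUMPTION: the iLO "Error status" value is this register's raw contents.
AER_UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
}


def decode_status(status):
    """Return the names of all known bits set in status, lowest bit first."""
    return [
        name
        for bit, name in sorted(AER_UNCORRECTABLE_BITS.items())
        if status & (1 << bit)
    ]


print(decode_status(0x00014000))
# → ['Completion Timeout', 'Unexpected Completion']
```

If that reading is right, the root port saw a completion timeout plus an unexpected completion, which would point at the device behind the port not answering in time rather than at the port itself.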
We're running VMware on the machine, so I jumped into vCenter to track down the device and found:
Intel Corporation Sandy Bridge IIO PCI Express Root Port 1a #1
Segment Number: 0
Bus Number: 0
Device Number: 1
Function Number: 0
Capabilities: Bridge Subsystem ID, MSI, PCI Express, Power management
PCI Device ID: 0x3c02
Device ID: PCI 0:0:1:0
Vendor ID: 0x8086
Subsystem ID: 0x0
Subsystem Vendor ID: 0x0
Secondary Bus Number: 7
The only thing we've changed from the stock server is the addition of a FusionIO 320GB PCIe SSD, about a year ago.
Per this thread: http://h30499.www3.hp.com/t5/ProLiant-Servers-ML-DL-SL/DL380p-Gen8-with-uncorrectabl-PCI-express-error/td-p/5995669#.VBrE8vldVRg
I checked our System ROM, and we're at 02/25/2012, four days after the suggested version.
Thoughts?
How can I physically identify "PCIe Root Port 1a #1" so I can see what's plugged into it that might have generated the error?
Thanks!!
Jeff