Hello HP community,
I already briefly discussed the issue in the VMware communities, see https://communities.vmware.com/thread/491027
But that did not really lead to a resolution.
I maintain a newly bought HP ProLiant DL380e Gen8 server, which was freshly installed in August using the HP-customized VMware vSphere 5.5 Update 1 installation ISO. After configuration the server ran fine.
Hardware data:
HP ProLiant DL380e Gen8 (bought brand new in August 2014), HP Smart Array B320i storage controller, HP H222 host bus adapter (only an HP Ultrium4 tape drive connected to it), HP Intel 4-port NIC 366i, 32 GB RAM, 2 quad-core Intel Xeon E5-2407
I'm aware that the B320i storage controller is not on the VMware HCL, which is why I used the HP-customized installation ISO.
After HP released a new SPP and a VMware 5.5 Update 2 ISO at the beginning of September, I first installed the SPP during a maintenance window, which applied several firmware updates. The iLO 4 firmware had already been updated to 2.0 some weeks before.
Afterwards I ran an upgrade installation to VMware 5.5 U2. Everything went through without issues, errors, or crashes.
The server ran fine for some days, and then the first VMware crash suddenly happened. The PSOD displayed was similar to the one in the attachment. Error message: PCPU 0: no heartbeat (2/2 IPIs received)
I rebooted the server through the iLO console, and during the following days the server crashed multiple times with a similar PSOD, always with PCPU 0: no heartbeat (2/2 IPIs received)
At the time of each crash the server/VMware was mostly idle (at night or very early in the morning).
I reviewed the BIOS settings and set them according to HP's recommendations for VMware, especially the power management settings.
But no configuration change or setting helped. VMware kept crashing randomly, sometimes after about half a day, sometimes after 2-3 days or about a week.
Two days ago I started deploying a new Windows VM. The initial VM configuration was successful: the VM was created on the datastore and appeared in the inventory. As soon as I powered on that VM, VMware crashed again with the no heartbeat PSOD.
This was reproducible after a reboot of the system. After the reboot the newly created VM had disappeared from the inventory but still existed physically on the datastore volume.
Since this happened during office hours and I was fed up with testing various BIOS settings and VMware configuration changes, I went back to VMware 5.5 Update 1 (build 1746018, HP customized) by using the SHIFT+r altbootbank option at boot.
The server has been running stably without issues since then (I know, only 2 days, but ...) and new VM deployment works fine with 5.5 U1.
I suspect a kernel <-> driver error is the cause of the PSODs. It might be the physical driver for the HP 366i 4-port NIC in conjunction with the virtual E1000 NICs within the VMs, or even the HP hpvsa driver for the B320i Smart Array controller.
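One way to narrow this down might be to compare the installed driver packages between the two builds. The following is only a minimal sketch in Python, assuming the output of `esxcli software vib list` has been saved to a text file on each build and that the first two whitespace-separated columns are the VIB name and version. The function names are my own and purely illustrative.

```python
def parse_vib_list(text):
    """Parse saved `esxcli software vib list` output into {name: version}.

    Assumption: the first two whitespace-separated columns of each data
    row are the VIB name and its version.
    """
    vibs = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the header row and the dashed separator row.
        if len(parts) < 2 or parts[0] == "Name" or set(parts[0]) == {"-"}:
            continue
        vibs[parts[0]] = parts[1]
    return vibs


def changed_vibs(before, after):
    """Return {name: (old_version, new_version)} for every VIB that differs.

    A VIB that exists on only one side is reported with None on the other.
    """
    names = set(before) | set(after)
    return {
        name: (before.get(name), after.get(name))
        for name in sorted(names)
        if before.get(name) != after.get(name)
    }
```

Feeding the U1 and U2 listings through `parse_vib_list` and then `changed_vibs` would show which driver packages actually changed with the upgrade, so the NIC and hpvsa entries could be checked first.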
Does anyone here have any ideas?
Thanks in advance for any help provided.
cykVM