Hello all!
I have a recurring issue with a Domain Controller that has been a PITA for some time now. The DC runs Windows Server 2012 R2 Standard on an HP ProLiant DL360p Gen8.
At random intervals the system becomes unreachable from other systems. When that happens, Active Directory starts sending out alarms that replication has been interrupted, and our NOC gets a notification that the server has gone down. When we log on to the DC through the out-of-band management network, we can see that the OS is still up, but nothing is being sent or received through the NIC.
When we check the System event log just before rebooting (i.e., while the system is unreachable), we see about 330 warning events with Event ID 16002, Source: AFD:
Closing a UDP socket with local port number [54920-55048] in process 1044 is taking longer than expected. The local port number may not be available until the close operation is completed. This happens typically due to misbehaving network drivers. Ensure latest updates are installed for Windows and any third-party networking software including NIC drivers, firewalls, or other security products.
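To see exactly which ports and process IDs the ~330 warnings reference, the messages can be tallied from an exported copy of the System log. This is a minimal sketch, assuming the log was exported as plain text (for example with Event Viewer's "Save All Events As") and that each 16002 message names a single port and PID, as in the message above. The script name `afd_summary.py` and the function `summarize_afd_events` are hypothetical.

```python
import re
import sys
from collections import Counter

# Matches the AFD 16002 message text; assumes each message names one port and one PID.
AFD_16002 = re.compile(
    r"Closing a UDP socket with local port number (\d+) in process (\d+)"
)


def summarize_afd_events(log_text: str) -> tuple[Counter, Counter]:
    """Count how often each local port and process ID appears in 16002 warnings."""
    ports: Counter = Counter()
    pids: Counter = Counter()
    for match in AFD_16002.finditer(log_text):
        ports[int(match.group(1))] += 1
        pids[int(match.group(2))] += 1
    return ports, pids


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python afd_summary.py exported_system_log.txt [encoding]
    encoding = sys.argv[2] if len(sys.argv) > 2 else "utf-8"
    with open(sys.argv[1], encoding=encoding, errors="replace") as f:
        ports, pids = summarize_afd_events(f.read())
    if ports:
        print(f"{sum(ports.values())} warnings, ports {min(ports)}-{max(ports)}")
        print("process IDs:", dict(pids))
    else:
        print("No AFD 16002 messages found")
```

If the export turns out to be UTF-16 (the regex will find nothing otherwise), pass `utf-16` as the second argument. Whether every warning names the same PID is worth confirming before chasing a single service.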
The port range listed in the event description isn't even a range that we use. I have a netstat monitor running, and none of those ports show up in its logs before, during, or after the event.
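A related check is whether process 1044 owns any UDP sockets at all in the monitor's snapshots, and whether any of the event ports ever appear. This sketch assumes the monitor stores raw `netstat -ano` output; the helpers `udp_sockets` and `cross_check` are hypothetical names. Keep in mind that netstat only captures sockets open at the moment of each sample, so short-lived UDP sockets can fall between snapshots. Since PIDs change on reboot, mapping PID 1044 to its service before rebooting (for example with `tasklist /svc /fi "PID eq 1044"`) may also help.

```python
from typing import Iterable


def udp_sockets(netstat_text: str) -> set[tuple[int, int]]:
    """Return (local_port, pid) pairs for the UDP rows of `netstat -ano` output."""
    sockets = set()
    for line in netstat_text.splitlines():
        parts = line.split()
        # UDP rows have no State column: "UDP  0.0.0.0:123  *:*  1044"
        if len(parts) >= 4 and parts[0] == "UDP":
            port = parts[1].rsplit(":", 1)[-1]  # also handles IPv6 like "[::]:123"
            if port.isdigit() and parts[-1].isdigit():
                sockets.add((int(port), int(parts[-1])))
    return sockets


def cross_check(netstat_text: str, event_ports: Iterable[int], pid: int):
    """Return (UDP ports owned by pid, event ports present in this snapshot)."""
    sockets = udp_sockets(netstat_text)
    owned_by_pid = sorted(port for port, owner in sockets if owner == pid)
    seen = sorted({port for port, _ in sockets} & set(event_ports))
    return owned_by_pid, seen
```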
I have also run DCDIAG and the Best Practices Analyzer (BPA) on the system. Neither turned up anything useful: all DCDIAG tests passed, and BPA reported nothing unusual and nothing that doesn't also appear on our other, unaffected systems.
The NIC was originally using the Microsoft-supplied driver; we installed the HP driver to see if that would help. It hasn't.
Also, because I don't know the cause of the issue, I can't reproduce the error on demand; I simply have to wait until it happens again.
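Since the failure can't be triggered on demand, one option is to timestamp the exact moment the NIC stops passing traffic, so logs can be lined up against it. This is a minimal sketch, assuming you already collect periodic samples of the cumulative bytes received and sent (for example from `netstat -e`); the function `first_stall` and the 60-second threshold are hypothetical choices. On a busy DC, both counters standing still for a full minute would be unusual.

```python
from typing import Iterable, Optional, Tuple

# (timestamp_seconds, bytes_received, bytes_sent), all cumulative counters
Sample = Tuple[float, int, int]


def first_stall(samples: Iterable[Sample], min_quiet_seconds: float = 60.0) -> Optional[float]:
    """Return the timestamp at which both counters stopped changing for at least
    min_quiet_seconds, or None if no such stall occurs."""
    stall_start: Optional[float] = None
    prev: Optional[Sample] = None
    for ts, rx, tx in samples:
        if prev is not None:
            prev_ts, prev_rx, prev_tx = prev
            if rx == prev_rx and tx == prev_tx:
                # Counters are frozen; remember when the quiet period began.
                if stall_start is None:
                    stall_start = prev_ts
                if ts - stall_start >= min_quiet_seconds:
                    return stall_start
            else:
                # Any traffic resets the quiet period.
                stall_start = None
        prev = (ts, rx, tx)
    return None
```

The returned timestamp marks the last sample before traffic stopped, which is a natural point from which to read back the event log and netstat snapshots.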
Has anyone else seen this issue, and if so, how did you remediate it?