Windows Server 2019 suddenly started hard hanging

  • Thread starter: Darxis

Hi,



I have a physical Windows Server 2019 machine that was upgraded from Windows Server 2012 R2 a few months ago, if I remember correctly. This physical server has been running for more than 5 years without problems.


This server is used primarily as a file server, and it runs one Hyper-V virtual machine that is idle 99% of the time, so resource usage is very low: I would say it almost never uses more than 20% CPU.


2-3 days ago it started hard hanging. The hangs occur randomly, every several hours. When a hang occurs, the screen is black, the mouse and keyboard do not work, and the server does not respond to ping requests. There is no BSOD. It has to be hard reset.


I don't think it is a hardware problem, because:

  • RAM -> I ran memtest86+ for 2 passes without errors
  • CPU -> I ran Prime95 for 5 minutes without errors
  • Power supply -> I have ruled it out because the PC is idle 99% of the time, and running the Prime95 CPU stress test did not trigger the hang
  • Disks -> No SMART errors, just a few counts on (C7) Ultra DMA CRC Error Count, probably related to power loss
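The C7 counter is most useful when it is tracked over time. A minimal sketch, assuming the attributes are read with smartmontools' `smartctl -A` (the post does not say which tool was used), that pulls the raw value of attribute 199 (0xC7, `UDMA_CRC_Error_Count`) out of that tool's text output, so a reading taken before a hang can be compared with one taken after:

```python
import re
from typing import Optional

# Assumed `smartctl -A` row layout for ATA drives:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def udma_crc_count(smartctl_output: str) -> Optional[int]:
    """Return the raw value of attribute 199 (0xC7), or None if it is not listed."""
    for line in smartctl_output.splitlines():
        m = ROW.match(line)
        if m and int(m.group(1)) == 0xC7:
            # Only the leading digits of RAW_VALUE are captured.
            return int(m.group(3))
    return None
```

If the value keeps rising between readings, the errors are still occurring rather than being left over from an earlier event.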


Things that I have changed recently:

  • upgraded from Windows Server 2012 R2
  • replaced the power supply 40 days ago


What I have tried/checked:

  • sfc /scannow -> no errors
  • Disk error checking on all disks (from the Windows GUI) -> no errors
  • Event Viewer -> only "The previous system shutdown at XXX on XXX was unexpected." and "The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly." -> all BugcheckCode etc. properties are zero
  • When a hang occurred, I disconnected all USB, video and network cables plugged into the motherboard and the NIC. Then I reconnected only the network cable; the server still did not respond to pings and remained hung.
  • Temperatures are OK: CPU 44 °C, all HDDs 35-42 °C
  • The PSU is a be quiet! 600 W unit, and the whole server draws 100 W on average (2 HDDs + 4 SSDs, AMD FX-8320, 32 GB RAM), so 600 W is well above requirements.
  • RAM usage does not exceed 16 GB of the 32 GB available; on a fresh start, after all services have started, it uses 8 GB
  • No new software installed; the last Windows Update was installed on 24.04.2020
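The second Event Viewer message is Kernel-Power event ID 41, whose event data carries the BugcheckCode and BugcheckParameter1-4 fields. A minimal sketch, assuming the events were exported as XML (for example with `wevtutil qe System /q:"*[System[(EventID=41)]]" /f:xml`) and that the export contains only `<Event>` elements with no XML declaration; it lists when each unclean reboot was logged and whether all bugcheck fields were zero, which makes it easier to line the hangs up against other logs:

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}
BUGCHECK_FIELDS = ("BugcheckCode", "BugcheckParameter1", "BugcheckParameter2",
                   "BugcheckParameter3", "BugcheckParameter4")


def _to_int(text: Optional[str]) -> int:
    # Missing fields are treated as zero; parameters are usually written as hex ("0x0").
    text = (text or "0").strip()
    return int(text, 16) if text.lower().startswith("0x") else int(text)


def summarize_kernel_power_41(xml_text: str) -> List[Tuple[Optional[str], bool]]:
    """Return (TimeCreated, all bugcheck fields zero) for each event 41 in the export."""
    # The export is a sequence of <Event> elements, so wrap it in a single root.
    root = ET.fromstring("<Events>" + xml_text + "</Events>")
    results = []
    for event in root.findall("e:Event", NS):
        if event.findtext("e:System/e:EventID", namespaces=NS) != "41":
            continue
        created = event.find("e:System/e:TimeCreated", NS)
        when = created.get("SystemTime") if created is not None else None
        data = {d.get("Name"): d.text for d in event.findall("e:EventData/e:Data", NS)}
        all_zero = all(_to_int(data.get(name)) == 0 for name in BUGCHECK_FIELDS)
        results.append((when, all_zero))
    return results
```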


One thing I have observed: once, while I was connected to the server through RDP, it hung. I immediately started pinging it with the -t parameter. It did not respond to pings, but I left it in this state, and after 1 minute the server recovered by itself, started responding to pings and worked normally. That was the only time it recovered on its own; in the other cases the machine stayed hung for several hours until I hard reset it (the hangs happened at night while I was sleeping, and I only noticed them in the morning).
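Because most hangs happened overnight, a timestamped record of when the server stops and starts answering pings would show exactly when each hang began and whether it ever recovered on its own, as it did in the RDP case. A minimal sketch, meant to run on another machine on the network; the host name is hypothetical, and the non-Windows branch assumes Linux iputils `ping` syntax:

```python
import platform
import subprocess
import time
from datetime import datetime
from typing import List, Optional, Tuple

Sample = Tuple[float, bool]  # (unix timestamp, host answered the ping)


def probe(host: str, timeout_ms: int = 1000) -> bool:
    """Send one ping and return True if a reply came back."""
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_ms), host]
    else:
        # Linux iputils syntax: -W is in seconds.
        cmd = ["ping", "-c", "1", "-W", str(max(1, timeout_ms // 1000)), host]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, errors="replace",
                                timeout=timeout_ms / 1000 + 5)
    except subprocess.TimeoutExpired:
        return False
    # Windows ping can exit 0 on "Destination host unreachable", so also require a TTL.
    return result.returncode == 0 and "TTL=" in result.stdout.upper()


def monitor(host: str, samples: List[Sample], interval_s: float = 5.0) -> None:
    """Ping until interrupted, recording every sample and printing state changes."""
    while True:
        ok = probe(host)
        if not samples or samples[-1][1] != ok:
            print(f"{datetime.now():%Y-%m-%d %H:%M:%S} {host} {'UP' if ok else 'DOWN'}",
                  flush=True)
        samples.append((time.time(), ok))
        time.sleep(interval_s)


def outages(samples: List[Sample]) -> List[Tuple[float, Optional[float]]]:
    """Collapse consecutive failed probes into (first failure, first success after) pairs.

    The end is None if the host was still unreachable at the last sample."""
    intervals: List[Tuple[float, Optional[float]]] = []
    start: Optional[float] = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            intervals.append((start, ts))
            start = None
    if start is not None:
        intervals.append((start, None))
    return intervals
```

Usage: create `samples = []`, call `monitor("fileserver", samples)`, stop it with Ctrl+C after a hang, then call `outages(samples)` to get the start and end of each unreachable period.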


I tried to generate a crash dump when a hang occurs, following the methods described in these articles:



I have configured the manual crash dump so that I can force one while the server is running normally, and it works: the server BSODs and writes a dump file, so it is configured properly. But when the machine was hung, this did not work: nothing happened, no BSOD, no crash dump. I just can't force it to create a crash dump while it is hung.
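One common way to set up a manual crash dump is the keyboard-initiated crash, enabled by the `CrashOnCtrlScroll` registry value. A minimal sketch, assuming that method (the post does not say which one was used), that reads the relevant values with Python's `winreg` module and reports the configured dump type (`CrashDumpEnabled`) and whether the keyboard crash is enabled for PS/2 (`i8042prt`) and USB (`kbdhid`) keyboards; outside Windows, or if a value is missing, the reader returns None and the value is reported as not set:

```python
from typing import Dict, Optional

try:
    import winreg  # only available on Windows
except ImportError:
    winreg = None

DUMP_TYPES = {0: "none", 1: "complete memory dump", 2: "kernel memory dump",
              3: "small memory dump", 7: "automatic memory dump"}


def describe(crash_dump_enabled: Optional[int], ctrl_scroll_ps2: Optional[int],
             ctrl_scroll_usb: Optional[int]) -> Dict[str, object]:
    """Interpret raw registry values; None means the value could not be read."""
    if crash_dump_enabled is None:
        dump_type = "not set"
    else:
        dump_type = DUMP_TYPES.get(crash_dump_enabled, "unknown")
    return {
        "dump_type": dump_type,
        "keyboard_crash_ps2": ctrl_scroll_ps2 == 1,
        "keyboard_crash_usb": ctrl_scroll_usb == 1,
    }


def read_dword(path: str, name: str) -> Optional[int]:
    """Read a value under HKLM, returning None if winreg, the key or the value is missing."""
    if winreg is None:
        return None
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
            value, _value_type = winreg.QueryValueEx(key, name)
            return value
    except OSError:
        return None


def current_config() -> Dict[str, object]:
    return describe(
        read_dword(r"SYSTEM\CurrentControlSet\Control\CrashControl", "CrashDumpEnabled"),
        read_dword(r"SYSTEM\CurrentControlSet\Services\i8042prt\Parameters",
                   "CrashOnCtrlScroll"),
        read_dword(r"SYSTEM\CurrentControlSet\Services\kbdhid\Parameters",
                   "CrashOnCtrlScroll"),
    )
```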


What can I do to troubleshoot this issue?
