I have a four-year-old white box server running Server 2012 R2 that was wonderfully behaved until last month. The server is running Hyper-V with a virtualized SBS 2011, a 2008 R2 guest running SQL Server for a billing program, a pfSense firewall providing a guest network with captive portal, a Linux PBX, and an Ubuntu webserver.
The first issues came when users started complaining the server was slow to respond in the middle of the day. I looked through the error logs and didn't see anything, but noticed the SBS 2011 guest was basically out of memory. I ordered another 64 GB of identical RAM.
After poking around a bit more, I decided to check that the RAID software was still configured to email me when it detected any errors. The software wouldn't load, so I uninstalled it, reinstalled a newer version, and rebooted the server. Immediately upon reboot I started getting hundreds of emails about unrecoverable RAID errors.
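For anyone hitting the same wall when the vendor GUI won't load: assuming the LSI/Broadcom controller is supported by Broadcom's StorCLI command-line utility (the controller index `/c0` here is an assumption; MegaCLI has equivalent commands on older stacks), array and drive health can be checked without the GUI at all:

```shell
:: List controllers first to confirm the index (assumed /c0 below)
storcli64 show

:: Overall controller 0 status
storcli64 /c0 show

:: Virtual drive (array) state - "Optl" means optimal, "Dgrd" degraded
storcli64 /c0 /vall show

:: Per-physical-drive details, including Media Error Count and
:: Predictive Failure Count, which would have flagged these disks earlier
storcli64 /c0 /eall /sall show all
```

Scheduling that last command and mailing the output is a crude but GUI-free substitute for the vendor's email alerting.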
I grabbed two spare drives out of a NAS and threw them in, but attached them to the Intel RSTe ports instead of the LSI/Broadcom/Avago RAID controller in case there was something wrong with it. I copied all of the VHDs over to the new array except one, which wouldn't copy due to a file copy error (when the error came up it would also trigger RAID errors, so this must have been where the bad blocks were). That VHD was restored from a backup, and everything was working again on the NAS drives.
I removed the bad hard drives, added the new memory which had arrived, and thought I was good to go. Since then I have been getting BSODs every 12-24 hours; strangely, only about 25% of them result in dump files being created. The first two usable dump files suggested iaStorA.sys was at fault due to some sort of memory error, so I updated the iaStorA.sys driver. At some point over the last five days (it has been a blur) I removed the rest of the new memory since it was still crashing, only to have it crash again, this time blaming ntoskrnl.exe.
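On the dumps that never appear: a kernel or complete dump needs a page file on the system drive big enough to hold it, and the dump type is controlled by a registry value under CrashControl. A quick sketch for checking and changing it (standard registry path; run from an elevated prompt):

```shell
:: Check the current crash dump setting
:: 0 = none, 1 = complete, 2 = kernel, 3 = small/minidump, 7 = automatic
reg query HKLM\SYSTEM\CurrentControlSet\Control\CrashControl /v CrashDumpEnabled

:: Switch to kernel dumps, which usually give WinDbg's !analyze -v
:: enough to name a driver rather than just ntoskrnl.exe
reg add HKLM\SYSTEM\CurrentControlSet\Control\CrashControl /v CrashDumpEnabled /t REG_DWORD /d 2 /f
```

If the page file is undersized or sits on a non-system drive, Windows can fail to write the dump even with this set, which would explain the 25% hit rate.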
Now I am back to the original configuration, except that the VHDX files are all on the new RAID 1 instead of the old, failed RAID 10. This server is in production but has IPMI, which allows me to work on it remotely. The past few nights I have been running sector-by-sector scans of all the drives on the LSI RAID controller in preparation for switching back to it. When Western Digital sends the drives back, I am planning to build two new RAID 1 arrays instead of one RAID 10, so I can do a clean install of the OS onto one and copy the VHDs onto the other.