Supermicro white box running 2012R2 crashes every 12-24 hours


No replies to this topic

#1 kmo12345

Posted 06 May 2018 - 05:08 PM

Hi there,

 

I have a four-year-old white box server running Server 2012 R2 that had been wonderfully behaved until last month. The server runs Hyper-V with a virtualized SBS 2011, a 2008 R2 guest running SQL Server for a billing program, a pfSense firewall providing a guest network with a captive portal, a Linux PBX, and an Ubuntu web server.

 

http://speccy.piriform.com/results/xLHiq5frOLpM1TX3kShoSpB

 

The first issues came when users started complaining the server was slow to respond in the middle of the day. I looked through the error logs and didn't see any problems, but noticed the SBS 2011 guest was basically out of memory, so I ordered another 64 GB of identical RAM.

 

After poking around a bit more I decided to check that the RAID software was still configured to email me when it detects any errors. The software wouldn't load, so I uninstalled it, reinstalled a newer version, and rebooted. Immediately on reboot I started getting hundreds of emails reporting unrecoverable RAID errors.

 

I grabbed two spare drives out of a NAS and threw them in, but attached them to the Intel RSTe ports instead of the LSI/Broadcom/Avago RAID controller in case there was something wrong with it. I copied all of the VHDs over to the new array except one, which wouldn't copy due to a file copy error (when the error came up it would also trigger RAID errors, so this must have been where the bad blocks were). That VHD was restored from a backup and everything was working again on the NAS drives.
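For anyone hitting the same wall: a normal Explorer copy aborts on the first read error, but a chunked copy that zero-fills unreadable regions (the same idea tools like ddrescue use) can salvage the rest of a VHD around the bad blocks. A minimal sketch, assuming the source is reachable as a regular file path; `salvage_copy` and `CHUNK` are illustrative names, not from any particular tool:

```python
import os

CHUNK = 1 << 20  # 1 MiB reads; smaller chunks isolate bad regions more precisely

def salvage_copy(src, dst, chunk=CHUNK):
    """Copy src to dst chunk by chunk, zero-filling chunks that fail to read.
    Returns a list of (offset, length) ranges that could not be read."""
    bad = []
    size = os.path.getsize(src)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        offset = 0
        while offset < size:
            n = min(chunk, size - offset)
            fin.seek(offset)  # re-seek each pass so a failed read doesn't derail us
            try:
                data = fin.read(n)
            except OSError:
                data = b"\x00" * n      # unreadable region: fill with zeros
                bad.append((offset, n))  # ...and record it for later inspection
            fout.write(data)
            offset += n
    return bad
```

A VHD salvaged this way still has holes where the zero-filled ranges land, so restoring the affected guest from backup (as was done here) is the right follow-up.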

 

I removed the bad hard drives, added the new memory which had arrived, and thought I was good to go. Since then I have been getting a BSOD every 12-24 hours; strangely, only about 25% of them result in dump files being created. The first two usable dump files suggested iaStorA.sys was at fault due to some sort of memory error, so I updated the iaStorA.sys driver. At some point over the last 5 days (it has been a blur) I removed the rest of the new memory, since it was still crashing, only to have it crash again, this time blaming ntoskrnl.exe.
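Since only some of the BSODs leave dumps, the crash rhythm can be easier to read from the dump files' own timestamps (on Windows these normally land in C:\Windows\Minidump). A small sketch that computes the hours between consecutive crashes; the function name and sample times below are illustrative, not from this server:

```python
from datetime import datetime

def crash_intervals(timestamps):
    """Hours between consecutive crashes, given minidump modification times."""
    ts = sorted(timestamps)
    return [round((b - a).total_seconds() / 3600, 1) for a, b in zip(ts, ts[1:])]

# Gather real timestamps with os.scandir() over the minidump folder, then look
# for a pattern: a fixed interval or a repeated hour of day hints at a
# scheduled task or nightly job; random spacing points at hardware.
stamps = [datetime(2018, 5, 1, 3, 0), datetime(2018, 5, 1, 19, 30),
          datetime(2018, 5, 2, 14, 0)]
print(crash_intervals(stamps))  # → [16.5, 18.5]
```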

 

Now I am back to the original configuration, except that the VHDX files are all on the new RAID 1 instead of the old failed RAID 10. This server is in production, but it has an IPMI interface that lets me connect to it remotely. The past few nights I have been doing sector-by-sector scans of all the drives on the LSI RAID controller in preparation for switching back to it. When Western Digital sends the drives back I plan to build two new RAID 1 arrays instead of one RAID 10, so I can do a clean install of the OS onto one and copy the VHDs onto the other.
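The controller's own scan is the authoritative check, but a read-only pass over the copied VHDX files themselves is a cheap extra confidence test before trusting the new arrays: if every byte reads back without an I/O error, the files at least landed intact. A sketch under the assumption the files are plain paths; `verify_readable` is an illustrative name:

```python
import os

def verify_readable(path, chunk=1 << 20):
    """Read the whole file in chunks; return the offsets that raised I/O errors."""
    bad_offsets = []
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        offset = 0
        while offset < size:
            f.seek(offset)
            try:
                f.read(min(chunk, size - offset))
            except OSError:
                bad_offsets.append(offset)  # note the unreadable region, keep going
            offset += chunk
    return bad_offsets  # empty list means every chunk read cleanly
```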



