
hardware or OS problem??!!!!


30 replies to this topic

#1 JuanBernardo

JuanBernardo

  • Members
  • 18 posts
  • OFFLINE
  •  
  • Local time:04:43 AM

Posted 13 February 2014 - 12:28 PM

Hi, I've been having this problem for a while and I'm trying to trace its origin. I'm a 3D compositing artist and use several related applications, among them Maya, Mari, Nuke, Hiero, Mudbox, etc. This problem only happens when I'm using Mari (by The Foundry), which is the most resource-intensive piece of software I run. I used to run Mari on my outdated system with no problems apart from the expected ones; I mean, I use a dual Xeon quad-core workstation which used to have 8 GB RAM and a Quadro FX 1700 GPU, and the system crashed when I tried to go beyond its capabilities. Then I upgraded my system: I installed a Quadro K4000 GPU, I now have 16 GB RAM, and I added a new 500 GB 10,000 rpm HD. I installed Windows 7 on that HD and CentOS 6 on the other fast disk (a 150 GB 10,000 rpm one), put the Windows disk at the top of the boot sequence, and modified the boot loader using EasyBCD to be able to choose the OS (I think maybe this is the problem, but I cannot be sure). Everything goes all right, not a single crash ever, except when I use Mari. Mari is intended for painting very high resolution textures; it uses the whole power of the GPU and writes project assets to a cache location you can select. I'm sure it's not a problem with Mari itself (as I said, I could run it better on my old system, and there are no reports similar to my problem). I'm able to run it for a while at full performance until suddenly, with no especially high resource use, it hangs: sometimes it goes straight to a blue screen, sometimes it freezes the system with no other remedy than physically shutting the system down. After either of these situations the system restarts but, before the OS loads, the following error shows:

 

0200: Failure Fixed Disk 0

 

and it fails to go on; then I must, again, physically shut down the system. After this I start the system again and everything goes OK. But it always happens, sooner or later, whatever I'm doing with Mari. According to my BIOS, Disk 0 is the 150 GB HD where Linux is installed, a disk that's not even "visible" to Windows (it's formatted with the ext4 file system). I tried everything; of course I tried different locations for Mari's cache, but it happens anyway. Now that this cache is on my C: drive (the 500 GB HD), Mari sometimes fails to write data to it, which is logical if the disk is somehow crashing (but I repeat, and I'm not sure how to read the error above: in my BIOS disk 0 is the one where CentOS resides, and my Windows C: drive is disk 3). I should add that I had and still have other disks for data; one of them is connected through a PCIe-SATA card, and all the others directly to the motherboard SATA ports.

No one has given a solution yet, but someone from the support staff where I bought the computer (it's a 6-year-old motherboard, but it should go OK, I think) suggested it might be a problem related to the boot sector in the Windows partition and that I should probably try to reinstall the OSs and use the old known method of choosing the OS using GRUB....

I'll try that if necessary, but I'd like to be sure since it implies spending at least a week installing stuff...

The GPU passes every test, memory tests are OK, event logs tell me nothing special... and everything goes like a charm until I'm doing what I said, which, by the way, is one of the main uses of my workstation.

Well, I hope I was clear enough and I'd appreciate any suggestions. Thanks in advance.





#2 rotor123

rotor123

  • Moderator
  • 8,093 posts
  • OFFLINE
  •  
  • Gender:Male
  • Location:New Jersey
  • Local time:11:43 PM

Posted 13 February 2014 - 02:03 PM

I'm guessing that the hard drives are Western Digital brand? If they were 10K SCSI they would be a different size usually.

 

I would suggest running the Western Digital "Data Lifeguard" Diagnostics (DLG) on both drives if they are Western Digital drives. That error usually means a drive problem.

 

Hopefully the DLG will confirm or clear the drives.

DLG for Windows is here, and the same page has a link to DLG for DOS too: http://support.wdc.com/product/download.asp?groupid=612&sid=3

 

Depending on what your boot order is in the BIOS, and if Drive 0 does not have the OS boot loader on it, you could try unhooking it and see what happens. Windows may not show the drive as a drive letter; however, it does detect it. And since Windows does not react well to hardware disappearing, that could be the source of your problem.

 

Good Luck

Roger


Fortune Cookie says: Fortune not Found: Abort, Retry, Ignore?

Sent from my All-In-One Desktop. Perfect for Internet, Not for heavy usage or gaming however.

How Does a computer get Infected? http://www.bleepingcomputer.com/forums/t/2520/how-did-i-get-infected/
Forum Rules,    The BC Welcome Guide

167 @ June 2015


#3 JuanBernardo

JuanBernardo
  • Topic Starter

  • Members
  • 18 posts
  • OFFLINE
  •  
  • Local time:04:43 AM

Posted 14 February 2014 - 08:25 AM

Hi Roger,
First of all, thanks a lot for your answer! Yes, they're WD drives; I forgot to mention the brand in the hurry of the moment. I've run the DLG quick test and both OS drives passed it. I'll now try unhooking drive 0 (where Linux resides) and see what happens. I know Windows can "see" the drive (actually I see every partition in Disk Management, of course); I meant that I thought Windows "wouldn't care" about that drive since it can't access its file system, but I know that's a silly assumption, Windows being as "special" as it is (I'd use Linux for everything, but that's not possible for a freelance artist like me, who needs a lot of different pieces of specific software...). Well, I'll try that now, and if nothing changes I'll run the extended tests; I'll let you know after that. Thanks again for your time. Cheers



#4 JuanBernardo

JuanBernardo
  • Topic Starter

  • Members
  • 18 posts
  • OFFLINE
  •  
  • Local time:04:43 AM

Posted 14 February 2014 - 08:48 AM

Hi again! Well, still the same...

I unhooked the 150 GB WD drive, the one labeled as disk 0 in the BIOS and the one where Linux is installed. But, unfortunately, same problem. One thing I noticed is that after improperly shutting down the system, as always happened, the same error appeared:

 

0200: Failure Fixed Disk 0

 

but now, of course, disk 0 is another one (actually one of my data drives), so I guess it's not a problem with a specific disk's hardware, a fact confirmed after I ran the extended test on the 500 GB WD drive (the Windows 7 system drive) and it showed no problem: PASSED. As I said before, with Mari's cache location set on that drive, I now have the added error, shown by Mari, saying that it couldn't write data to the file system. But I know it's not a problem related to the cache location, because it used to be somewhere else and I had the same problem (no Mari error about writing data, but the system froze all the same), so I understand that something happens and the system crashes to the point that the cache location becomes inaccessible. So I guess I have 3 [main] possibilities:

1- completely remove everything related to Mari, install and configure it again and try, but I'm pretty sure this won't help

2- reinstall the OSs, which I doubt will help, and it'll take a good while to get everything working again

3- forget about Mari until I find out, which is the worst possible scenario, Mari being one of my most used applications

(4-) win the lottery and pay for everything, but you know....

 

 

Well, thanks again for your time, any suggestions appreciated. Bye for now



#5 rotor123

rotor123

  • Moderator
  • 8,093 posts
  • OFFLINE
  •  
  • Gender:Male
  • Location:New Jersey
  • Local time:11:43 PM

Posted 14 February 2014 - 02:33 PM

Ah, you have more than the two drives in the system, then? Are they all data drives? If so, can you unhook them and test again? From the error message I have presumed that it is being displayed on the screen by the BIOS and the system does not boot. If all drives except the boot drive are unhooked and the error still appears, that would indicate that it is the boot drive, the motherboard, or the data or power cables to the drive.

Is there a setting in the BIOS related to a delay time for the drives? Some drives can take longer to spin up. The crash still sounds like a drive being used by that application dropping out. As an example, I have a Seagate 4 TB in a USB dock that kept dropping out. It turned out to be the dock itself. The drive works fine in an external case.

Can you move the boot drive to the controller the Linux drive was using and set that in the BIOS as a further test?

It might also be time to run Speccy. Be sure to post the link and not the Speccy report you see, as what you see has the Windows product key in it, whereas the link (Publish Snapshot) does not.

 

  • Go to Piriform's website, and click the big Download button.
  • Next, click Download from Piriform.com (the FileHippo link requires an extra click). Or if you want to use a portable version of Speccy (which doesn't require installation), click the builds page link and download the portable version.
  • You will now be asked where you want to save the file. The best place to put it is the Desktop, as it will be easy to find later.
  • After the file finishes downloading, you are ready to run Speccy. If you downloaded the installer, simply double-click on it and follow the prompts until installation is complete. If you downloaded the portable version, you will need to unzip it before use. Right-click the ZIP file and click Extract all. Click Next. Open up the extracted folder and double-click on Speccy.
  • Once inside Speccy, it will look similar to this (with your computer's specifications, of course):
    [screenshot]
  • Now, at the top, click File > Publish Snapshot
  • You will see the following prompt:
    [screenshot]
  • Click Yes > then Copy to Clipboard
    [screenshot]
  • Now, once you are back in the forum topic you are posting in, click the [image] button. Right-click in the empty space of the Reply box and click Paste. Then, click Add Reply below the Reply box.
  • Congrats! You have just posted your specs!

 

Good Luck

Roger




#6 JuanBernardo

JuanBernardo
  • Topic Starter

  • Members
  • 18 posts
  • OFFLINE
  •  
  • Local time:04:43 AM

Posted 15 February 2014 - 07:35 AM

Hi Roger,

Thanks again for your time! I haven't had time yet for further testing; I'll let you know soon about it. For now, let me clarify some points.

My workstation has a Tyan motherboard I bought 6 years ago, so I'm also thinking it might be a problem related to a firmware or BIOS update, but of course I'm not sure because I've never had this problem before.

I used to run Mari on that workstation with no unpredictable problems; I mean, it crashed when I tried to go beyond my system's capabilities, e.g. painting 4K 16-bit textures with a lot of layers, and then Mari showed an error reporting a GPU or RAM crash. My outdated system was:

 

2x Xeon Quadcore ES5420@2.5GHz processors, 8 GB RAM, 150 GB WD drive for the Windows system and several data drives, Quadro FX 1700, Wacom Bamboo, Windows 7 Ultimate x64

Then I upgraded this same workstation to:

2x Xeon Quadcore ES5420@2.5GHz processors (the same, of course), 16 GB RAM, 150 GB WD drive for the CentOS system, 500 GB WD drive for the Windows system and several data drives (I had to connect one of them through a PCIe/SATA card because my motherboard had no more free SATA ports, but everything went OK), Quadro K4000, Wacom Intuos Pro, Windows 7 Ultimate x64. I keep using Mari on Windows; I put the Windows system drive at the top of the boot order and used EasyBCD to write to the boot sector so I could dual boot.

 

Everything ALWAYS goes OK, except when I run Mari. It starts OK, I paint and work for, let's say, 10 minutes, and then, whatever I'm doing, whether or not other applications are running, the system freezes. It's as if the C: drive (the 500 GB WD one, the Windows system drive) stops responding; even if I try to open Task Manager, Windows says that taskmgr.exe is not a valid Win32 application. (When I set Mari's cache location to another disk there was no further error from Mari; now that I've set that location on C:, Mari pops up an error, before the system freezes completely, saying that it could not write data to the file system, but the problem is the same either way.) After this, I have no option other than to shut the system off with the power button; then the OS fails to load, with the BIOS showing the "0200" error, and I have to use the power button again to shut the system off and start over. After that, I can start the system normally. The drives show no problem except in this situation and, as you see, they seem to be OK. Whatever else I do (anything but Mari), I have no problem: I can use any application, render "heavy" projects, leave the workstation rendering for a week, write data to any drive, do regular backups to and from different drives, etc.

As I said, when I unhooked the Linux drive the problem was the same, with the same error ("0200: Failure Fixed Disk 0", disk 0 in that case being another drive, one of the data drives). I can unhook everything, but I'm quite sure it won't change anything; I'll try anyway.

The only option relating to drive speed in the BIOS is "32bit transfer mode", which is recommended to be enabled, but the behavior is the same whether it is enabled or disabled.

The only thing I think might be related is that the Windows boot loader was modified using EasyBCD to allow dual boot, instead of the common practice of putting the Linux drive first and using GRUB, but I'm just guessing. Given the age of the motherboard, I'm thinking it might be a problem related to firmware or BIOS (I used to have a 150 GB system drive and now I have a 500 GB one...); again, just guessing.

Well, I'll keep trying before formatting the system drives, and I'll let you know. Bye for now.



#7 ranchhand1

ranchhand1

  • Members
  • 76 posts
  • OFFLINE
  •  
  • Local time:09:43 PM

Posted 15 February 2014 - 09:46 AM

Run Speccy and check your operating temperatures, not just for the CPU but for the hard drives as well. I also recommend Open Hardware Monitor (free), which will give you a temperature readout for each hard drive for verification. The advantage is that Open Hardware Monitor shows real-time temperatures, so you can leave it open while you are working and see whether there are any temperature spikes at certain times or with certain software.



#8 JuanBernardo

JuanBernardo
  • Topic Starter

  • Members
  • 18 posts
  • OFFLINE
  •  
  • Local time:04:43 AM

Posted 16 February 2014 - 02:03 PM

http://speccy.piriform.com/results/ZSKMoEfTsHiHU5uwOZZa7pu

 

That's Speccy, just what I described earlier...



#9 JuanBernardo

JuanBernardo
  • Topic Starter

  • Members
  • 18 posts
  • OFFLINE
  •  
  • Local time:04:43 AM

Posted 16 February 2014 - 02:50 PM

Hi, thanks to all of you! Well, I ran Speccy as I showed, everything OK; then I uninstalled Mari and cleaned the registry, then I ran Open Hardware Monitor (this one and Speccy are great tools, thanks), and everything was OK. I started working with Mari, and this time, after a while painting with no problem, it went straight to a blue screen, as sometimes happens (no freezing beforehand). This was all as I expected: no temperature rise, no RAM flushing... everything OK until the crash. As I told you, everything works perfectly except in this one situation, and I use lots of computing power all the time with no issue. Well, tomorrow I'll do further testing and I'll start thinking about reinstalling the OSs. Bye for now, thanks again, guys



#10 JuanBernardo

JuanBernardo
  • Topic Starter

  • Members
  • 18 posts
  • OFFLINE
  •  
  • Local time:04:43 AM

Posted 16 February 2014 - 02:51 PM


(of course, I re-installed Mari again...)



#11 OldPhil

OldPhil

    Doppleganger


  • Members
  • 4,084 posts
  • OFFLINE
  •  
  • Gender:Male
  • Location:Long Island New York
  • Local time:11:43 PM

Posted 16 February 2014 - 05:44 PM

Curious what is the wattage of the PSU?


Honesty & Integrity Above All!


#12 JuanBernardo

JuanBernardo
  • Topic Starter

  • Members
  • 18 posts
  • OFFLINE
  •  
  • Local time:04:43 AM

Posted 17 February 2014 - 05:29 AM

Curious what is the wattage of the PSU?

That's an interesting question! It's supposed to be a Tacens Radix VI 850W. I say "supposed to be" because (I hadn't thought about it till now) the original PSU burned out a year ago, before the system upgrade, and it was replaced with this one, but the replacement was done by the technical staff of a hardware store that wasn't the one that sold me the workstation, people not specialized in high-end workstations like mine. It's supposed to go OK, but maybe it's not enough after the system upgrade... or maybe the original motherboard is not able to handle so much power... I'll take a look, I'll call my workstation's vendor again, and I'll try some stress tests before going on.



#13 JuanBernardo

JuanBernardo
  • Topic Starter

  • Members
  • 18 posts
  • OFFLINE
  •  
  • Local time:04:43 AM

Posted 17 February 2014 - 06:50 AM

Hi, I think I'm on the way to finding the source of the problem. I had run some "mild" benchmark tests of my GPU using FurMark before (I was afraid to stress my GPU more intensely), but now I tried FurMark's 1080 preset and reproduced the issue: the test was going all right for 5-10 minutes (I left the test running while I went to wash the dishes, so I'm not sure about the exact time...) and then the system froze the same way! Its behavior on shutting off and restarting was the same too: first the BIOS failed to load the OS, showing the 0200 error; then, after another power-button shutdown and restart, everything went back to "normal". So the problem is related to the GPU: maybe the GPU itself, maybe the power feeding it, maybe... whatever, but the relation looks clear to me. I'm calling my system's vendor right now.



#14 OldPhil

OldPhil

    Doppleganger


  • Members
  • 4,084 posts
  • OFFLINE
  •  
  • Gender:Male
  • Location:Long Island New York
  • Local time:11:43 PM

Posted 17 February 2014 - 07:36 AM

Do you have, or can you borrow, a lesser GPU to see if it makes a difference? Even with 850 W, your GPU may suck the life out of the PSU when it works hard.




#15 Anshad Edavana

Anshad Edavana

  • Members
  • 2,805 posts
  • OFFLINE
  •  
  • Gender:Male
  • Location:India
  • Local time:09:13 AM

Posted 17 February 2014 - 08:25 AM

Hi

 

The SMART data collected by "Speccy" shows that one of your drives is failing. The failing drive seems to be the 500 GB one, WDC WD5000HHTZ-04N21V0.

 

C5 Current Pending Sector Count: 200 (200) Data 0000000002

 

A non-zero value of "Current Pending Sector" (the raw "Data" field, here 2) means there are bad sectors on the drive surface which are waiting to be remapped (the drive is dying). If it were a home PC, I would recommend running "WD Diagnostics", which will automatically remap the bad sectors and repair the drive. But since your machine is a workstation, I don't recommend doing that. Replacing the drive with a new one is the best thing to do.
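
For anyone who wants to check a Speccy report for this themselves, here is a minimal Python sketch that parses a SMART line like the one quoted above and flags pending sectors. It assumes the line layout is "<hex id> <name>: <current> (<worst>) Data <raw hex>", which is inferred only from the single line in this post; the function names `parse_smart_line` and `has_pending_sectors` are made up for illustration and are not part of Speccy or any WD tool.

```python
import re

# Assumed layout, inferred from the one line quoted in this thread:
#   "<hex id> <name>: <current> (<worst>) Data <raw hex>"
SMART_LINE = re.compile(
    r"^(?P<id>[0-9A-Fa-f]{2})\s+(?P<name>.+?):\s+"
    r"(?P<current>\d+)\s+\((?P<worst>\d+)\)\s+Data\s+(?P<raw>[0-9A-Fa-f]+)$"
)


def parse_smart_line(line):
    """Split one SMART attribute line into its fields (hypothetical helper)."""
    match = SMART_LINE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized SMART line: %r" % line)
    return {
        "id": int(match.group("id"), 16),        # attribute ID, e.g. 0xC5
        "name": match.group("name"),
        "current": int(match.group("current")),  # normalized value
        "worst": int(match.group("worst")),      # worst normalized value
        "raw": int(match.group("raw"), 16),      # raw counter, assumed hex
    }


def has_pending_sectors(attr):
    """True if this is attribute C5 and its raw counter is non-zero."""
    return attr["id"] == 0xC5 and attr["raw"] > 0


if __name__ == "__main__":
    attr = parse_smart_line(
        "C5 Current Pending Sector Count: 200 (200) Data 0000000002"
    )
    print(attr["name"], "raw =", attr["raw"],
          "pending sectors:", has_pending_sectors(attr))
```

Note that the check looks at the raw "Data" field and not the normalized 200 (200) values, which can still look healthy while sectors are pending.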

 

 

http://kb.acronis.com/content/9133

 

 

 

This is a critical parameter. Degradation of this parameter may indicate imminent drive failure. Urgent data backup and hardware replacement is recommended.




