
Down the Rabbit Hole of GPU error Diagnosis


5 replies to this topic

#1 pikkles

pikkles

  • Members
  • 4 posts
  • OFFLINE
  •  

Posted 26 June 2017 - 08:10 PM

Hello,

 

I have been using an old Radeon HD6750 for a while now with few problems [System: Athlon II X4 640; mobo: MCP61M-M3 (v7)]. I generally play a few games like League of Legends and some RTSs. I just about managed to get Alien: Isolation to work with it, which was a pleasant surprise!

 

Recently a friend donated his Nvidia GeForce GTX 760 to me. To my knowledge he never had any issues with it, other than mentioning some heat 'issues' and the need for good cooling. After installing the card I noticed its power connector needed 8 pins rather than the 6 my previous card used. I found an adapter on Amazon and voila! It worked. The system booted up nicely, all drivers auto-installed and everything seemed stable (operating system, internet, videos, music etc.; multiple applications). Happy days! :)  However, as soon as I started any game my system would crash. It started with 'no input signal' notifications on the screen and ended with some kind of pseudo-crash, i.e. the CPU was still working but the screen went black.

 

My first step was trawling the forums, where solution (1) was removing the old drivers and installing the new ones. I tried this several times but was still getting the crashes. (2) I tried 4-5 GPU benchmark testers, which, like the games, crashed my GPU almost instantly. (3) From the way the GPU was crashing I thought it was the PSU (it seemed like power tripping, as the card was not running for long enough for there to be a temperature issue).

 

Eventually I spent 2 hours online with an Nvidia customer assistant, remotely removing the drivers and installing them again. After trying and failing to benchmark with one of their programs, he flatly stated that the card was dead. I thanked him for his patience and assistance but did not feel satisfied with the response. I bought a multimeter to test the voltages of the 750 W PSU and everything checked out. SpeedFan and GPU-Z were not showing any unusual temperatures, but I've gone ahead and bought some cooling gel for the GPU. I have also looked into 'bottlenecking' between motherboards and GPUs, seeing as my motherboard is ancient and the card relatively new by comparison, so that's still a gray area for me at this stage of my investigative diagnostics.

 

Summary:

 

 

1. Ran through a complete uninstall of all graphics drivers and cards with an Nvidia specialist online, then a re-install
of all the latest drivers etc., but the card fails on starting games (instant crash).
2. Ran FurMark, the Valley GPU tester and, on the Nvidia agent's recommendation, Video Card Stability Test. These programs instantly
crash the card.
3. Just checked the PSU with a multimeter and it's still providing the correct voltages.
4. Going to run BurnInTest and see if there are any motherboard updates/issues.
---------------------------------------------------------------------------------------------
5. Going to run Prime95 and Unigine (expecting a crash as with the previous GPU stress testers).
6. Ran BurnInTest with no problems. However, the OCCT GPU stress test crashes after 10 seconds.
7. Heaven benchmark crashed 2 seconds after starting.
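
For anyone repeating the multimeter check in step 3, here is a minimal sketch of what "correct voltages" can mean. It assumes the usual ATX tolerance of ±5% on the positive rails; the readings in it are hypothetical placeholders, not the values measured in this thread.

```python
# ATX positive rails are commonly specified at +/-5% of nominal.
NOMINAL_RAILS = {"+12V": 12.0, "+5V": 5.0, "+3.3V": 3.3}
TOLERANCE = 0.05


def rail_in_spec(rail, measured):
    """Return True if a measured voltage is within tolerance of its nominal value."""
    nominal = NOMINAL_RAILS[rail]
    return abs(measured - nominal) <= nominal * TOLERANCE


# Hypothetical multimeter readings (assumed for illustration only).
readings = {"+12V": 12.08, "+5V": 5.03, "+3.3V": 3.31}
results = {rail: rail_in_spec(rail, volts) for rail, volts in readings.items()}
print(results)
```

A multimeter shows a steady average, so a pass here says nothing about ripple or brief sags when the GPU load spikes. It can rule out a grossly wrong rail, but not an unstable one.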
 
 
 
I have also tried underclocking with MSI Afterburner, but no joy, except for a few extra seconds before the benchmark testers crash.
 
Interestingly, the Valley benchmark worked with my old Radeon card while FurMark crashed it; with my new Nvidia card both crash immediately.
 
After running these diagnostics I'm looking at 3 possible issues/avenues:
 
(1) Someone mentioned the 'no signal' issue might simply be resolved by using an HDMI adapter. I'm not sure I understand how that could crash the card unless there was some kind of bottleneck feedback thing... but I'll give it a go.
 
(2) try cooling (although I don't think this is the issue)
 
(3) Try looking at the errors I'm getting from the Windows 10 Reliability Monitor logs and Event Viewer:
 
Errors:  
 
  • The system has rebooted without cleanly shutting down first. This error is caused because the system stopped responding and the hardware watchdog triggered a system reset.
  • + System
        - Provider
          [ Name] EventLog
        - EventID 6008
          [ Qualifiers] 32768
          Level 2
          Task 0
          Keywords 0x80000000000000
        - TimeCreated
          [ SystemTime] 2017-06-26T10:25:01.411172600Z
          EventRecordID 24077
          Channel System
          Computer DESKTOP-QB26JBB
          Security
    - EventData
          10:29:08
          26/06/2017
          7211
          E107060001001A000A001D0008004301E107060001001A0009001D0008004301600900003C000000010000006009000001000000B00400000100000000000000

    Binary data:

    In Words

    0000: 000607E1 001A0001 001D000A 01430008
    0010: 000607E1 001A0001 001D0009 01430008
    0020: 00000960 0000003C 00000001 00000960
    0030: 00000001 000004B0 00000001 00000000

    In Bytes

    0000: E1 07 06 00 01 00 1A 00 á.......
    0008: 0A 00 1D 00 08 00 43 01 ......C.
    0010: E1 07 06 00 01 00 1A 00 á.......
    0018: 09 00 1D 00 08 00 43 01 ......C.
    0020: 60 09 00 00 3C 00 00 00 `...<...
    0028: 01 00 00 00 60 09 00 00 ....`...
    0030: 01 00 00 00 B0 04 00 00 ....°...
    0038: 01 00 00 00 00 00 00 00 ........
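
The first 32 bytes of the binary data above look like two Windows SYSTEMTIME structures (eight little-endian 16-bit fields each: year, month, day of week, day, hour, minute, second, milliseconds). That layout is an assumption on my part, but a quick sketch decodes it to the same 10:29:08 on 26/06/2017 that the event reports, with the second copy one hour earlier (presumably UTC):

```python
import struct
from datetime import datetime

# First 32 bytes of the Event ID 6008 binary data, copied from the event above.
raw = bytes.fromhex(
    "E107060001001A000A001D0008004301"
    "E107060001001A0009001D0008004301"
)


def decode_systemtime(buf):
    """Decode 16 bytes laid out like a SYSTEMTIME (assumed layout)."""
    year, month, _day_of_week, day, hour, minute, second, millis = struct.unpack("<8H", buf)
    return datetime(year, month, day, hour, minute, second, millis * 1000)


first = decode_systemtime(raw[0:16])
second = decode_systemtime(raw[16:32])
print(first.isoformat(), second.isoformat())
```

So the blob only restates when the unexpected shutdown happened; it does not say why.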

The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.

 

 

 

+ System
    - Provider
      [ Name] Microsoft-Windows-Kernel-Power
      [ Guid] {331C3B3A-2005-44C2-AC5E-77220C37D6B4}
      EventID 41
      Version 4
      Level 1
      Task 63
      Opcode 0
      Keywords 0x8000400000000002
    - TimeCreated
      [ SystemTime] 2017-06-26T07:29:00.918479500Z
      EventRecordID 24027
      Correlation
    - Execution
      [ ProcessID] 4
      [ ThreadID] 8
      Channel System
      Computer DESKTOP-QB26JBB
    - Security
      [ UserID] S-1-5-18
- EventData
    BugcheckCode 0
    BugcheckParameter1 0x0
    BugcheckParameter2 0x0
    BugcheckParameter3 0x0
    BugcheckParameter4 0x0
    SleepInProgress 0
    PowerButtonTimestamp 0
    BootAppStatus 0
    Checkpoint 0
    ConnectedStandbyInProgress false
    SystemSleepTransitionsToOn 0
    CsEntryScenarioInstanceId 0
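
A rough sketch of how the Kernel-Power 41 EventData above can be read. The three-way split follows the usual guidance for this event (a non-zero BugcheckCode means a stop error was recorded, a non-zero PowerButtonTimestamp means the power button was held, both zero means a hard hang or power loss). The `classify` helper and its labels are my own, not part of any Windows tool.

```python
# EventData fields from the Kernel-Power event above, as name/value pairs.
event_data = (
    "BugcheckCode 0 BugcheckParameter1 0x0 BugcheckParameter2 0x0 "
    "BugcheckParameter3 0x0 BugcheckParameter4 0x0 SleepInProgress 0 "
    "PowerButtonTimestamp 0 BootAppStatus 0 Checkpoint 0 "
    "ConnectedStandbyInProgress false SystemSleepTransitionsToOn 0 "
    "CsEntryScenarioInstanceId 0"
)

# Pair every name with the value that follows it.
tokens = event_data.split()
fields = dict(zip(tokens[0::2], tokens[1::2]))


def classify(fields):
    """Hypothetical helper: map the two key fields to a likely scenario."""
    bugcheck = int(fields["BugcheckCode"], 0)  # base 0 accepts "0" and "0x0"
    button = int(fields["PowerButtonTimestamp"], 0)
    if bugcheck != 0:
        return "stop error recorded"
    if button != 0:
        return "power button held"
    return "hard hang or power loss"


print(classify(fields))
```

For this event both fields are zero, so Windows never got the chance to record a stop error. That fits a power problem or a card that locks up the system, but it cannot tell the two apart.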

 

 

-------------------------------------------------------------------------------------

 

 

Any help would be appreciated: :bounce: 

 

Additional information:

 

 

PassMark BurnInTest Log file  -  http://www.passmark.com
========================================================
Date: Fri Jun 23 23:17:14 2017
BurnInTest V8.1 Pro 1013 (64-bit)
Customer: pikkles
Technician: Paul Haegeman
Machine type: AMD Athalon 2
Notes: Athalon 2/Geforce GTX 760
System summary:
Windows 10 Professional Edition build 14393 (64-bit),
1 x AMD Athlon™ II X4 640 Processor [3018.7 MHz],
4.0GB RAM,
NVIDIA GeForce GTX 760,
224GB SSD,
CD/DVDRW/BD, CD-RW/DVDRW, CD/DVDRW/BD,
General:
System Name: DESKTOP-QB26JBB
Motherboard Manufacturer: ECS
Motherboard Name: MCP61M-M3
Motherboard Version: 7.0
BIOS Manufacturer: American Megatrends Inc.
BIOS Version: 080015
BIOS Release Date: 11/03/2010
BIOS Serial Number:                      
CPU:
CPU manufacturer: AuthenticAMD
CPU Type: AMD Athlon™ II X4 640 Processor
Codename: Propus
CPUID: Family 10, Model 5, Stepping 3, Revision BL-C3
Socket: AM3 (938-pin)
Lithography: 45nm
Physical CPU's: 1
Cores per CPU: 4
Hyperthreading: Not capable
CPU features: MMX 3DNow! SSE SSE2 SSE3 SSE4a DEP PAE AMD64 XOP FMA3
Clock frequencies:
-  Measured CPU speed: 3018.7 MHz
-  Multiplier: x15.0
-  Reference Clock: 201.2 MHz
-  HT Link 201.2 MHz
Cache per CPU package:
-  L1 Instruction Cache: 4 x 64 KB
-  L1 data cache: 4 x 64 KB
-  L2 cache: 4 x 512 KB
-  L3 cache: Not applicable
Memory
Total Physical Memory: 4095MB
Available Physical Memory: 573MB
Memory devices:
   Slot 1:
   - 2GB DDR3 SDRAM PC3-10600
   -  , serial#: 87, wk/yr: 19/2011
   - 1.5V, Clk: 666.7MHz, Timings 9-9-9-24 (@ Max. freq.)
   Slot 2:
   - 2GB DDR3 SDRAM PC3-10600
   -  , serial#: 87, wk/yr: 19/2011
   - 1.5V, Clk: 666.7MHz, Timings 9-9-9-24 (@ Max. freq.)
   Slot 3:
   - Not populated
   Slot 4:
   - Not populated
Virtual memory: C:\pagefile.sys (allocated base size 4608MB)
Memory SPD:
DIMM#0
Memory type: DDR3 SDRAM
SPD revision: 1.0
Manufacturing date: Year: 2011, Week: 19
Serial number: 00000057
Clock speed: 666.7 MHz
Memory size: 2048 MB
Number of banks: 8
Row address bits: 15
Column address bits: 10
Bus width: 64 bits
Device width: 8 bits
Number of ranks: 1
ECC: No
Module voltage: 1.5V
Minimum clock cycle time (tCK): 1.500 ns
Supported CAS latencies: 5 6 7 8 9 10
Minimum CAS latency time (tAA): 13.125 ns
Minimum RAS to CAS delay time (tRCD): 13.125 ns
Minimum row precharge time (tRP): 13.125 ns
Minimum active to precharge time (tRAS): 36.000 ns
Supported timing at highest clock speed: 9-9-9-24
Minimum Row Active to Row Active Delay (tRRD): 6.000 ns
Minimum Active to Auto-Refresh Delay (tRC): 49.125 ns
Minimum Recovery Delay (tRFC): 160.000 ns
Minimum Write Recovery time (tWR): 15.000 ns
Minimum Write to Read CMD Delay (tWTR): 7.500 ns
Minimum Read to Pre-charge CMD Delay (tRTP): 7.500 ns
Minimum Four Activate Window Delay (tFAW): 30.000 ns
Operating temperature range: 0-95C
Supports Auto Self-Refresh: No
Supports Partial Array Self-Refresh: No
Thermal Sensor present: No
Supports On-Die Thermal Sensor readout: No
Non-standard SDRAM type: Standard Monolithic
Module type: UDIMM
Module Height: 29 - 30 mm
Module Thickness: Front: 1 - 2 mm, Back: 1 - 2 mm
Module Width: 133.5 mm
Reference raw card used: Raw Card B Rev. 0
DIMM#1
Memory type: DDR3 SDRAM
SPD revision: 1.0
Manufacturing date: Year: 2011, Week: 19
Serial number: 00000057
Clock speed: 666.7 MHz
Memory size: 2048 MB
Number of banks: 8
Row address bits: 15
Column address bits: 10
Bus width: 64 bits
Device width: 8 bits
Number of ranks: 1
ECC: No
Module voltage: 1.5V
Minimum clock cycle time (tCK): 1.500 ns
Supported CAS latencies: 5 6 7 8 9 10
Minimum CAS latency time (tAA): 13.125 ns
Minimum RAS to CAS delay time (tRCD): 13.125 ns
Minimum row precharge time (tRP): 13.125 ns
Minimum active to precharge time (tRAS): 36.000 ns
Supported timing at highest clock speed: 9-9-9-24
Minimum Row Active to Row Active Delay (tRRD): 6.000 ns
Minimum Active to Auto-Refresh Delay (tRC): 49.125 ns
Minimum Recovery Delay (tRFC): 160.000 ns
Minimum Write Recovery time (tWR): 15.000 ns
Minimum Write to Read CMD Delay (tWTR): 7.500 ns
Minimum Read to Pre-charge CMD Delay (tRTP): 7.500 ns
Minimum Four Activate Window Delay (tFAW): 30.000 ns
Operating temperature range: 0-95C
Supports Auto Self-Refresh: No
Supports Partial Array Self-Refresh: No
Thermal Sensor present: No
Supports On-Die Thermal Sensor readout: No
Non-standard SDRAM type: Standard Monolithic
Module type: UDIMM
Module Height: 29 - 30 mm
Module Thickness: Front: 1 - 2 mm, Back: 1 - 2 mm
Module Width: 133.5 mm
Reference raw card used: Raw Card B Rev. 0

Graphics
NVIDIA GeForce GTX 760
   Chip Type: GeForce GTX 760
   DAC Type: Integrated RAMDAC
   Memory: 2048MB
   BIOS: Version80.4.ea.0.a
   Driver provider: NVIDIA
   Driver version: 22.21.13.8253
   Driver date: 6-7-2017
   Monitor 1: 1280x960x32 75Hz (Primary monitor)
Disk volumes
A:  Floppy
C:  Local Drive, \\?\Volume{9d914295-0000-0000-0000-100000000000}\, New Volume, NTFS, (223.57GB total, 46.00GB free)
D:  Optical drive, \\?\Volume{40d22012-bda3-11e6-a7ce-806e6f6e6963}\
E:  Optical drive, \\?\Volume{d23ae11e-5547-11e7-a83c-0025224ece58}\
F:  Optical drive, \\?\Volume{d23ae136-5547-11e7-a83c-0025224ece58}\, Space Hulk Deathwing, UDF
Disk drives
Disk drive: Model: DREVO X1 SSD SCSI Disk Device Serial: TA16B1100201 (Disk: 0, Size: 223.57GB, Volumes: C)
Optical drives
E: DiscSoft Virtual (CD/DVDRW/BD)
D: HL-DT-ST DVDRAM GSA-H12N (CD-RW/DVDRW)
F: DiscSoft Virtual (CD/DVDRW/BD)
Network
NVIDIA nForce Networking Controller (Speed: 100Mb/s) (MAC: 10:78:D2:1F:F4:17) (IPv4: 192.168.0.3) (IPv6: fe80::d75:b07c:efc4:d639)
Ports
Communications Port: COM1 - RS232 Serial Port (max Baud rate: 115200)
Parallel port: LPT1
Mouse Port: PS/2  connector
Keyboard Port: PS/2  connector
USB
Standard OpenHCD USB Host Controller
   - MOSART Semi. 2.4G Keyboard Mouse
   - C-Media Electronics Inc. USB PnP Sound Device
Standard Enhanced PCI to USB Host Controller
   - Ralink 802.11 n WLAN (SN: 1.0)

**************
RESULT SUMMARY
**************
Test Start time: Fri Jun 23 22:56:25 2017
Test Stop time: Fri Jun 23 23:11:29 2017
Test Duration: 000h 15m 04s
Temperature CPU 0 (Min/Current/Max): 50.0C / 52.0C / 70.0C
Temperature HDD 0 (DREVO X1 SSD SCSI Disk Device) (Min/Current/Max): 28.0C / 41.0C / 41.0C
Temperature GPU 0 GeForce GTX 760 (Min/Current/Max): 37.0C / 37.0C / 41.0C
Test Name                   Cycles   Operations      Result Errors   Last Error
                      CPU   59       415 Billion     PASS   0        No errors
             Memory (RAM)   31       49.267 Billion  PASS   0        No errors
              2D Graphics   0        3740            PASS   0        No errors
              Temperature   -        -               PASS   0        No errors
                    Sound   10       17.155 Million  PASS   0        No errors
           Video Playback   67       868             PASS   0        No errors
                Disk (E:)   17       81.818 Billion  PASS   0        No errors
                Network 1   3        142320          PASS   0        No errors
TEST RUN PASSED
******************
DETAILED EVENT LOG
******************
LOG NOTE:  2017-06-23 22:56:25, Status, PassMark BurnInTest V8.1 Pro 1013 (64-bit)
LOG NOTE:  2017-06-23 22:56:27, Status, Main Tests started
LOG NOTE:  2017-06-23 22:56:27, CPU, AES test(s) selected, but CPU feature not detected
LOG NOTE:  2017-06-23 22:56:28, Video, Video test starting: Duty cycle = 50. Test on primary monitor (0). (x,y) = (310,0). WxH = 300x200.
LOG NOTE:  2017-06-23 22:57:16, 2D Graphics, Surface lost: Restoring.
LOG NOTE:  2017-06-23 23:11:35, Status, Test run stopped

 

 

 

 

 

Attached File  CPU-Zdiagnostic.txt   61.32KB   0 downloads

 

Attached File  GPU-Z Sensor Log.txt   288.18KB   0 downloads

 

Attached File  P95.txt   45.92KB   1 downloads

 

 




 


#2 pikkles

pikkles
  • Topic Starter

  • Members
  • 4 posts
  • OFFLINE
  •  

Posted 27 June 2017 - 05:54 AM

Just spoke to the Nvidia agent again, who suggested I try these settings:

 

• Power management mode – prefer maximum performance
• Threaded optimization – off
• Triple buffering – off
• Vertical sync – off
• Maximum pre rendered frames – 3

 

 

This was actually more stable: benchmarking lasted up to a minute this time. However, it still crashed.



#3 MDD1963

MDD1963

  • Members
  • 699 posts
  • OFFLINE
  •  
  • Local time:12:39 PM

Posted 29 June 2017 - 02:17 AM

Alas, your GPU is not truly 'known good', only thought to have last had 'no known issues'...correct? Perhaps someone else can test it in their rig to save you countless hours of troubleshooting what might only be a failing GPU?

 

Is your PSU of sufficient wattage? (Try a known good one, from someone you've heard of, of 500 watts or more)

 

Also, driver remnants left over from going from Nvidia to AMD and/or back can cause lingering issues; one surefire way to wipe these out is a full format/reinstall with the newest chipset and GPU driver packages...


Asus Z270A Prime/7700K/32 GB DDR4-3200/GTX1060


#4 jwoods301

jwoods301

  • Members
  • 1,489 posts
  • OFFLINE
  •  
  • Gender:Male
  • Local time:07:39 PM

Posted 29 June 2017 - 02:27 AM

You might take a look at John Carrona's Hardware Stripdown Troubleshooting site...

 

http://www.carrona.org/strpdown.html



#5 pikkles

pikkles
  • Topic Starter

  • Members
  • 4 posts
  • OFFLINE
  •  

Posted 30 June 2017 - 05:22 PM

Developments:

 

(1) I've applied ceramic cooling gel to both the GPU and CPU, and cleaned all fans and heatsinks.

 

(2) Ran DDU to uninstall ALL drivers and fresh install the correct ones.

 

----------------------------------------------------------------------------------------------------------------

 

I'm really only left with 3 options then:

 

(1) Check to see if the card works in another machine (which I'm positive it does as my buddy wouldn't have given me this)

 

(2) Buy a new PSU despite already checking voltages manually and see if there are stability issues with the power supply.

 

(3) Explore whether my ECS (Elitegroup) MCP61M-M3 mobo (an AM3 socket) is incompatible in some way (although I've been told it shouldn't be) - a 'bottleneck' somewhere?

 


Still, any stress testing is like a system kill switch...

 

My hunch is a POWER shortfall or instability. I will sort that out before trying another system, then get back to you. :)


Edited by pikkles, 30 June 2017 - 05:23 PM.


#6 pikkles

pikkles
  • Topic Starter

  • Members
  • 4 posts
  • OFFLINE
  •  

Posted 06 July 2017 - 06:26 PM

OK.... PROBLEM SOLVED!

 

 

It was a failing GPU... although all the voltages were correct when I checked with the multimeter, I did not factor in stability...





