ASUS EAH4870 ATI/AMD Radeon Graphics Card Problems - Update and Fix

***UPDATE 11/29/2010*** Sadly, these errors have recently been resurfacing for me, even with the fixes I have tried, and I'm not sure why.  I sincerely hope the TDRDelay fix works for you, but it seems - for me - this is not the magic bullet I was looking for.  If you have any ideas to share, please leave them in the comments - I'll try anything at this point! ***

I blogged earlier about a BIOS editing method that helped fix some of the woes I was experiencing with my newly purchased ASUS ATI Radeon HD4870 1GB.  (NewEgg link if you're curious about the exact model)

This, however, doesn't really fix the problem.  At all.  I thought it did, until I started experiencing the crashes again.  However, I think I've found a fix that actually works, and it is definitely NOT what I expected.

The Problem with the ASUS EAH4870 or Radeon HD 4000 Series In General

For some reason, this particular graphics card becomes unstable when switching fan speeds or core clock speed.  The card's power management, configured in the BIOS (ATI dubs it PowerPlay), is designed to underclock the GPU when in 2D mode and clock up to standard speed in 3D.  This would be great, except that it becomes as failure-prone as an intoxicated Jersey Shore cast member at a spelling bee.  This is a common issue - just Google for it...

In addition, this problem is not limited to the HD 4870 or even the 4000 series - the 5000 series reportedly experiences it as well, and who knows how many others.

The Crash

So, you're playing a game (it only does this when under load) and all of a sudden the game freezes.  Damn. Then the screen goes blank for a minute, but while it's blank, you can still hear the audio from the game.  Then - sometimes - it dumps you back in the game.  Other times, you cannot recover from the error and have to close the game window.  This is particularly annoying because it seems totally random, and the card is otherwise fine.

What Doesn't Really Work

So, to circumvent this, I tried various methods to stabilize the clock and fan speed, including but not limited to:

-  ASUS SmartDoctor (kinda cheesy but it does work)

-  ATI Tray Tools (an awesome tool)

-  RivaTuner

-  Manually editing the BIOS with RBE and flashing in DOS mode with ATIFlash (also works well, but fraught with peril)

Most of these would do the job, but not always.  The video card REALLY WANTS to switch modes, and you basically have to forcefully pimp-slap it to get it to stay at a manual fan speed and clock speed.  For example, try using RivaTuner to adjust the clock and fan speeds and play a game.  Make sure you have monitoring and logging on (GPU-Z is a great program to use for this.) When you experience the crash, look at the logs - I bet it either changed fan speed or clock speed, even though you told it not to with the software, and that correlates directly with the crash.
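If you'd rather not eyeball the log, here is a minimal Python sketch that scans a GPU-Z style sensor log for the moments the clock or fan speed changed, so you can line them up with the time of the crash.  The column names ("GPU Core Clock [MHz]", "Fan Speed (%) [%]"), the "Date" column, and the sample rows are all assumptions made up for illustration - check the header line of your own log and adjust.

```python
import csv
import io

# Made-up sample in the comma-separated layout GPU-Z uses for sensor logs.
# The column names and values here are assumptions, not real readings.
SAMPLE_LOG = """Date , GPU Core Clock [MHz] , Fan Speed (%) [%] ,
2010-11-29 20:15:01 , 750.0 , 40 ,
2010-11-29 20:15:02 , 750.0 , 40 ,
2010-11-29 20:15:03 , 500.0 , 40 ,
2010-11-29 20:15:04 , 500.0 , 55 ,
"""


def find_speed_changes(log_text, columns):
    """Return (timestamp, column, old, new) for every change in the watched columns."""
    reader = csv.DictReader(io.StringIO(log_text), skipinitialspace=True)
    changes = []
    previous = {}
    for row in reader:
        # Headers and values are padded with spaces, so strip both.
        clean = {
            key.strip(): value.strip()
            for key, value in row.items()
            if key is not None and isinstance(value, str)
        }
        for column in columns:
            value = clean.get(column, "")
            if value == "":
                continue  # sensor missing on this row
            if column in previous and previous[column] != value:
                changes.append((clean.get("Date", ""), column, previous[column], value))
            previous[column] = value
    return changes


if __name__ == "__main__":
    watched = ["GPU Core Clock [MHz]", "Fan Speed (%) [%]"]
    for when, column, old, new in find_speed_changes(SAMPLE_LOG, watched):
        print(f"{when}: {column} went from {old} to {new}")
```

To use it on a real log, read the file into a string and pass that in instead of SAMPLE_LOG.  If a change shows up a second or two before the crash, that's the correlation I'm talking about.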

What Has Worked (So Far)

If you really dug into your issue before landing here, you might have checked the Windows Event Viewer and seen this message:

"Display driver atikmdag stopped responding and has successfully recovered."

This is called a TDR error, or "Timeout Detection and Recovery" error.  This is a feature (and sometimes an anti-feature) of Windows Vista and Windows 7.  What happens is that your ATI card freaks out and goes postal when the BIOS attempts to change clock speeds or fan speed (and sometimes even if it doesn't.)  Windows decides the GPU has hung, then attempts to automatically reset the driver and restore the display, but this is an unreliable process, and thus the game crashes.

The solution is simple.  I concluded that my graphics card was working fine; indeed, when it worked, it was blazing fast and stable.  It only crashed when these speed changes were happening in the background.  So, after some more spelunking, I discovered that you can actually disable TDR in the registry so it makes no attempt to recover when the GPU does this, or otherwise flips out for a moment.

Open regedit and navigate to this location: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers

When you get there, right-click on it and add a DWORD entry named "TDRDelay."  The entry doesn't exist by default (yes, you have to actually create one manually), and for me the new entry defaulted to a value of zero, which is what you want.  If it's not zero, change it to zero - otherwise you will actually be increasing the delay, meaning that when it crashes you have to sit there for even longer while it recovers, causing your urge to go Office Space on your rig to rise sharply.  One caveat: Microsoft's documentation describes TdrDelay as the timeout in seconds (default 2) and lists a separate value, TdrLevel, where 0 turns detection off entirely, so if zeroing TDRDelay alone doesn't do it for you, that's the other knob to look at.
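If you'd rather not click around in regedit, the same change can be expressed as a .reg file.  Below is a minimal Python sketch that builds one for the key above; the file name tdrdelay.reg and the helper name build_reg_file are just placeholders I picked, and I'm assuming regedit will import a plain-text file with the version 5.00 header.  Back up your registry first, and expect an admin prompt when you import it.

```python
# Registry key from the post; the value name matches the entry described above.
KEY_PATH = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def build_reg_file(value_name="TDRDelay", value=0):
    """Return the text of a .reg file that sets one DWORD under KEY_PATH."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{KEY_PATH}]\r\n"
        # .reg files write DWORDs as eight hex digits.
        f"\"{value_name}\"=dword:{value:08x}\r\n"
    )


if __name__ == "__main__":
    # Hypothetical output file name; double-click it in Explorer to import.
    with open("tdrdelay.reg", "w", newline="") as handle:
        handle.write(build_reg_file())
    print(build_reg_file())
```

Calling build_reg_file(value=10) instead would write a ten-second delay (dword:0000000a), which is handy if you want to experiment with a longer timeout rather than zero.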

I have only tested this for a few hours, but even the crashiest games (I'm looking at you, Red Faction Guerrilla) have been smooth as silk without one annoying crash.

If your problem is the same as mine, I'm sincerely hoping you find this blog and that this helps you.  I have been spending weeks testing, searching, diagnosing, and losing hair over this issue and I think this has finally solved it.  I would love to bring you the same joy I have experienced actually playing my games without fear of this annoying crash.

I'll be sure to post an update if the crashes crop up again, which is always possible - but for now, I'm going to enjoy my system.

On a side note, I have read there are tons of other things you can try to solve GPU issues without resorting to voodoo, including adjusting RAM timings, increasing your FSB speed, and others.