Thursday, 18 October 2012

Computer Damage Chronicles

Nothing funny today, nothing about the games I play, just a bit of a status report on that bucket of screws I depend on to pay my bills.

PC seems stable for now. It has been running without crashes since Sunday, but knowing my luck, the next blackout is waiting around the next corner, now that I'm writing about how it all works again. I hope it's all fixed now, I hope the problems won't come back and I'm seriously starting to doubt my computer troubleshooting abilities. It's almost like some stupid classic crime show - I can fix the PC problems of all friends, family members and neighbours around me, yet the only PC I cannot fix is my own. What a fucking cliché! It's like Monk solving hundreds of murder cases and never figuring out the death of his wife or the date doctor hooking up fat people, whilst being unable to establish a healthy relationship. I'm hoping this is my minute 182 or my season 8, final episode, my happy ending, because I'm tired of replacing parts and buying new shit all the time.

The problem didn't even seem all that complicated: I'd run a game, any game and after a seemingly random amount of time, ranging from 30 seconds to five whole hours, my screens would go black, my sound would get stuck in a loop and the whole machine would lock up, requiring a hard reboot. Half of the time, it wouldn't even boot up anymore. Black screen, all fans spinning out like jet engines and the only way to get around this would be by re-seating the gfx card, unplugging the PSU, doing all kinds of random voodoo shit until the damn thing booted up again. Sometimes it would boot up after I removed some of my RAM, making me think I had found the problem. So I played around with reduced memory for a day, two days, everything seemed fine and the crash came back. Tried Memtest to be absolutely sure, ran it for 20 hours, no problems.

So it wasn't my fucking memory. And I thought, hey, maybe it's an audio problem, what with the sound loop and all and after some research, it turned out there's a lot of people who get crashes with Nvidia's HD audio driver, which automatically installs itself every fucking time you boot Windows, no matter how many times you remove it. And yes, one of the audio controllers shared an IRQ with the gfx card, so I disabled the bad guy, hoping it would fix my problem. No joy. Mind you, that was after I had removed my sound card and tried disabling onboard audio in the BIOS settings, messing around with absolutely no sound at all. Nothing worked.

What else could it be? Temperatures? MSI Afterburner showed absolutely no problems on the GPU front, but Core Temp complained about my quadcore CPU reaching 50-60 degrees under medium workload. Nothing alarming, but far from ideal. So I went and bought some thermal paste, removed the massive cooling fan, dropped one of the screws in my PSU, removed my PSU, removed all screws from the PSU to open the damn thing and remove the screw from the fan, wiped the old thermal paste off my CPU, applied the new stuff after going through a dozen videos and forum threads on the perfect way to apply thermal paste... seriously, Google that shit. People have flame-wars of religious proportions on how to apply thermal paste! So yeah, I've done that, put the machine back together, ran a game aaaaand it blacked out again.

So I checked the manufacturer's website for my gfx card. It's a factory overclocked one, which comes with one major flaw, right out of the box: The default voltage setting is too low and the card just dies after a minute or two. It's a known and confirmed problem, I figured it out the day I got it, fixed it with MSI Afterburner and the thing had been running fine for over a year. But if a manufacturer misses such a critical problem, then maybe there was even more crap going on with the card. And lo and behold: 20+ forum threads about how the actual card uses cheap aluminium heatsinks as opposed to the copper ones you see on the website, the packshots, yada, yada, yada. Card gets too hot, there's no proper thermal paste on the GPU, you get the idea. So I went and took the damn thing apart and yes, no copper, only aluminium and the "thermal paste" was a crusty old booger. So I freshened that up a bit, put it all back together and crashed again. Some more research revealed that certain cards actually crash because the VRAM gets too hot, not the actual GPU, so I applied thermal paste to that as well and it didn't fix anything. Back to square one.

Of course, I've also upgraded my BIOS, chipset drivers, tried gfx drivers from a year ago to the latest beta ones, but to no avail. I won't go into how the Nvidia updater sucks and downloads the wrong shit for certain mainboards, causing even more problems. Let's just say I had to spend half a day undoing the damage done by the updaters and fixing utilities provided on my board's official support website.

By this time, I was getting more and more convinced that my gfx card was the problem. I ruled out the memory, all drivers were up to date, it wasn't an IRQ conflict and the crash seemed fairly obvious: It would happen during games. On some rare occasions, playing video files would result in the same thing. And then the PC would refuse to boot up again, the fans on the card would go nuts, reconnecting the card would help boot it up again. At that point I was completely ignoring the fact that removing random shit like a stick of memory would sometimes fix it as well, god knows why.

I've also tried to rule out a faulty PSU by disconnecting absolutely everything I didn't need. No dvd drives, no fancy USB shit, just one hard drive, gfx card, onboard sound, memory, rodent, keyboard, that's it. Unplugged all the cooling fans in the case, removed absolutely everything that would consume power and wasn't absolutely necessary. It didn't fix anything. And I refused to believe there was anything wrong with the PSU, because it was a brand name 500w PSU, my gfx card only required 450w and that combination had been running fine for over a year, so why would it blow up on me now? Wear and tear can cause exactly that kind of problem with a PSU, but at that time, I was absolutely sure it was my gfx card.

So what's the easiest way to rule out a damaged GPU? Replace it. I removed my Nvidia card and borrowed Claire's HD6850. And my machine wouldn't boot up with it, no matter what. The Nvidia chipset on my board probably didn't help things and I couldn't fit her card in the primary PCI-E slot, so one of the two SLI-slots would have to do. And those weren't too keen on doing anything with Claire's ATI hardware. It just didn't boot up at all. Plan B: Put my gfx card in Claire's PC, see if she crashes. And of course we couldn't pull that off, either: One of her PSU's 6pin PCI-E power connectors was damaged. Damn thing just wouldn't stay connected to my gfx card, so we couldn't even power up the damn thing. And since my mainboard doesn't have any onboard gfx, we had to buy a replacement card.

So we checked our finances, tried to come up with some clever way to afford at least a mid-range current gen GPU, but there was just no way we could afford something like that at that time. A friend helped me out with his old GTX280. Awesome piece of hardware back in its day and still not the worst choice today, if you can live without tessellation and other fancy shit, which 95% of today's games don't even support, anyway. And what do you know - I put that thing in there, it booted up straight away, no problems, no crashes, no nothing. For a whole day. So I did something incredibly stupid. You might wanna take notes, because this is gonna be important later on. :P
I ran MSI Afterburner to check on the card's clock settings. Yes, the card was fucking nice, but it was still a downgrade from my old hardware and I decided it couldn't hurt to do some research on safe settings in order to overclock the card at least a little bit. You know, get some of that old power back. And the FurMark tests were fine, I got a small performance boost out of it, I ran a few more games aaaaand it blacked out. Exact same crash again. Black screen, sound loop, lawnmower when I tried to reboot.

I've restored the card back to factory settings the next time I managed to boot up. The overclocking I had done was minute at best, there was no way in hell I had damaged the hardware, I did a lot of research beforehand, but I didn't wanna push my luck, so I went back to default settings. And it blacked out again. And again! What's worse, booting would now fail half of the time, thanks to a bluescreen: C000021a, initialization of a knowndll failed, 0xc0000221, blah, blah... damaged hardware. Did I melt my friend's gfx card after all? I put my old card back in there, same error message, so at least it wasn't that. Whew.

I was running out of ideas and out of options. Try a Google search for black screen crashes. Hundreds of forum posts and no useful answer. 90% of the time these threads will just end without a proper solution and the rest of them end with any random piece of broken hardware. PSU, GPU, memory, the whole damn board. If I had saved any money, I would have blown it all on a brand new machine at this point.

Instead, I tried to reconstruct my working and gaming environment using Claire's old laptop. I'm a guy. I hate change. I want stuff to stay the way I know it. So I connected the laptop to the large LCD screen, plugged in my gaming mouse and a keyboard, replicated my desktop and customization settings the best I could and started gaming and working at (almost) 720p and the modest frame rates an old 230m budget gfx chip, which had been outdated three generations ago, could muster. Not a happy experience. And a couple weeks later, the laptop would start overheating and shutting down without warning. Of course. No laptop is designed for that kind of torture, not for prolonged amounts of time.

That's when I decided to try something we should have done ages ago: Put my old PSU into Claire's PC to see whether or not it crashes. If my PSU was really failing, Claire's machine would black out as well. In my desperation, I've also done something that didn't seem possible before: I put Claire's PSU into my machine. Remember? One of her power connectors wouldn't fit in my gfx card. I forced it. At this point I no longer gave a shit about damaging my hardware, I just slammed it in there until it fit. Oh the potential for analogies!

And Claire's PC worked. With my PSU. To be fair, her HD6850 only requires one modest 6pin PCI-E connector, so the entire system is a lot less demanding and much more energy-efficient than mine. But it worked, it was stable, her machine, my PSU. And my computer with her PSU? No fucking clue. Remember those blue screens I had? The defective hardware was my hard drive. Maybe it had been failing all along and that's what caused the crashes. Maybe I was being too careless and damaged it whilst rummaging around my machine's intestines over and over again. Whatever the cause, the HDD was dead, would no longer boot up Windows, crashed on me, finito, end of.

So I've ordered a new 7200rpm Barracuda. Used some awesome little utility called 'Parted Magic' to rescue as much of the data as I could possibly grab off the failed drive. Started a new installation of Windows 7. Took me a day to get everything back in working order. You know, all my image and video editing software I need for work, my FTP settings, text editing software, KMPlayer, the works. Started a game and my gfx card crashed. Not a big deal, I just forgot to install the latest version of MSI Afterburner to crank up the voltage a tiny bit. That known and confirmed problem, that tiny little bug one can fix with a click or two. And the machine blacked out again.

Brand new Windows. New hard drive. Different PSU. Memory had already been checked three dozen times. And the PC would crash with either gfx card on the old Windows installation, so that wasn't the problem. But fuck it, I put in my friend's old GTX280 for the heck of it. Nothing else to do, right?
I didn't wanna risk overclocking again, so I just ran the card, no Afterburner, nothing else. And it all worked. For a day. Two days. Three days. No crashes whatsoever. Put in my own gfx card again. Crashed. Motherfucker!

The problem was MSI Afterburner all along. At some point, with some update or during some stupid moment where I fiddled around with the fucking settings, I must have enabled low level hardware access. A quick Google search revealed that this feature could cause the exact same crash I was experiencing. So I turned it off and the crashes stopped.

Three months. I've been having this problem for three fucking months. Replaced the gfx card, checked the memory, applied fresh thermal paste in every possible spot, switched PSUs, installed a new hard drive and a new OS. Okay, the HDD was necessary, as the old one had definitely died. Maybe it died from all the crashes and hard resets, maybe I physically broke it whilst taking the whole machine apart time and again. Fuck if I know. Unticking one god damn checkbox in one stupid little program fixed my problem.

But there has been one small incident, which made me feel at least a little better about the whole mess. We put the old GTX280 in Claire's PC. The one that runs on my old PSU now. It died after five minutes. The PSU was failing after all. It still runs her modest little ATI card just fine, but we should replace the PSU as soon as we have a little extra money, just to be safe. So it wasn't Afterburner alone. I'm not a complete idiot after all. But pretty damn close to it.

So. A failing power supply unit. And one tiny setting in some little overclocking and monitoring utility. What do you know. But my machine had been screaming for a fresh, clean installation of Windows for a long time now. And the new HDD is pretty damn sweet. So it wasn't all bad. I just hope that this is really gonna be the end of it. Time to catch up on all the gaming I had to miss out on!

-Cat

1 comment:

  1. Oh-your-god.
    What a mess! I haven't had a problem like this for at least 8 years now, but I can sooo relate to your pain. Sometimes, a PC can be a real bitch. Fingers crossed that I don't have to encounter something like that ever again.
