Rickard Andersson

The Internet is amazing

Much of what I do at work is problem solving. When something doesn’t work the way it should, I dive in and try to figure out what the problem is, what’s causing it and, ultimately, what to do about it. I love this part of my job. Whether it’s digging through code, browsing logs or troubleshooting application errors, I’m a happy camper.

Some time ago, we noticed that virtually all of the many hundreds of machines on campus, regardless of hardware configuration, were crashing intermittently. We were never able to reproduce the problem, but it was happening at a rate of maybe one or two machines a day. We started out by trying all the standard fixes, such as updating drivers and the BIOS, and even completely reinstalled some of the machines. Nothing seemed to help. The machines blue-screened with a variety of stop codes (0x7E, 0x50, 0x0A, etc.). We were stumped.

At this point, I was getting pretty frustrated at not being able to solve the problem. In a desperate attempt to find out what was causing it, I installed WinDbg (part of Debugging Tools for Windows) and loaded up minidumps from a handful of machines. Using the !analyze command, you can get WinDbg to parse a memory dump and output what it thinks is the likely culprit behind the crash. I was hoping the different memory dumps would point to some common driver or executable, but some of them blamed fastfat.sys, others pointed a finger at ntkrpamp.exe and some put the blame on “memory_corruption”. I was getting nowhere and I needed help.

I searched around for a good discussion forum to ask for help and ended up in the troubleshooting section of the Sysinternals forums. More or less immediately, I got a response from someone called Scott. He told me to enable full memory dumps on a couple of machines and to turn on Driver Verifier for any non-Microsoft drivers. At the time, I had never even heard of Driver Verifier, but apparently it’s been included in Windows since Windows 2000. Here’s what Wikipedia has to say about it:

Driver Verifier is a tool included in Microsoft Windows that replaces the default operating system subroutines with ones that are specifically developed to catch device driver bugs. [1] Once enabled, it monitors and stresses drivers to detect illegal function calls or actions that may be causing system corruption. It acts within the kernel mode and can target specific device drivers for continual checking or make driver verifier functionality multithreaded, so that several device drivers can be monitored at the same time. [1] It can simulate certain conditions such as low memory, I/O verification, pool tracking, IRQL checking, deadlock detection, DMA checks, IRP logging etc.

So I enabled it, and after a couple of days I had a number of full memory dumps created while Driver Verifier was running. I loaded them up in WinDbg, but I was none the wiser. I needed more help, so I took the liberty of sending Scott a private message asking whether he would be willing to take a quick look at the dumps for me. Scott replied that he actually enjoyed groveling through memory dumps and that he in fact taught a week-long crash dump analysis lab! Talk about finding the right man for the job. So I sent the dumps to Scott, who got back to me shortly thereafter with a theory.

At the time of the crash, it appeared that a ZIP file was being flushed out to a removable FAT drive, but the in-memory structures for the file had already been torn down, causing the crash. Scott was able to trace the memory address of the prematurely torn-down structure to an “SRTSP structure”. SRTSP.sys is a Symantec Antivirus filter driver. That made sense: Symantec does indeed check files before they are saved to removable drives. Scott also pointed out that the driver in question was about a year old and suggested we try upgrading it to the latest version. We did, and two weeks later we have yet to experience a single crash.

The moral of the story, I guess, is that I should have known better than to run a version of the Symantec Endpoint Protection client that was almost a year old, but the cool thing about the story is Scott. I was stuck and asked for help in an online discussion forum. To my rescue came a complete stranger who not only put time and effort into helping me, but also turned out to be extremely competent at what he did. Amazing!

Thank you very much Scott!

Introducing Affirmative

As I mentioned a few posts back, I’m keen on getting into Mac development. At first, I was just intrigued by the concept of Grand Central Dispatch, but now I’m into it more generally. I needed a “Hello World” application to get me started, so I figured I would write an application that verifies SFV files. There are other apps for the Mac that do this, but what the hell, choice is good, right?
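For anyone who hasn’t run into one: an SFV file is a plain-text list of filenames, each followed by the file’s expected CRC32 checksum as eight hex digits, with lines starting with “;” treated as comments. Affirmative’s own code isn’t shown here, so what follows is only a minimal Python sketch of the verification step, assuming that simple format. The names crc32_of_file, parse_sfv and verify_sfv are hypothetical.

```python
import os
import zlib


def crc32_of_file(path: str, chunk_size: int = 1 << 20) -> int:
    """Compute a file's CRC32 incrementally, reading it in fixed-size chunks."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            # zlib.crc32 accepts the running value, so chunked hashing
            # yields the same result as hashing the whole file at once.
            crc = zlib.crc32(chunk, crc)
    return crc


def parse_sfv(text: str) -> list[tuple[str, int]]:
    """Parse SFV text into (filename, expected CRC32) pairs.

    Assumed format: lines starting with ';' are comments, and the checksum
    is the last whitespace-separated token, so filenames may contain spaces.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            continue  # skip malformed lines with no checksum
        name, checksum = parts
        entries.append((name, int(checksum, 16)))
    return entries


def verify_sfv(sfv_path: str) -> dict[str, bool]:
    """Return {filename: matches}, resolving names relative to the .sfv file."""
    base = os.path.dirname(os.path.abspath(sfv_path))
    with open(sfv_path, encoding="utf-8", errors="replace") as f:
        entries = parse_sfv(f.read())
    results = {}
    for name, expected in entries:
        path = os.path.join(base, name)
        # A missing file counts as a failed check.
        results[name] = os.path.isfile(path) and crc32_of_file(path) == expected
    return results
```

Reading in 1 MB chunks keeps memory use flat no matter how large the files are, which matters when an SFV covers multi-gigabyte archives.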

So, without further ado, I present to you Affirmative!

When I started the project, I figured CRC32 calculations would be a perfect fit for Grand Central Dispatch. I toyed with the idea of using GCD, but it turns out that calculating CRC32 checksums of files is I/O bound, not CPU bound. That is, the bottleneck is the hard drive, not the CPU. Spawning multiple threads with GCD to calculate the checksums actually worsened performance, as all the threads were waiting on the hard drive instead of crunching numbers. With this information in hand, I decided to stay away from GCD for this project and go with a fairly conventional two-thread design instead.
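The two-thread idea can be outlined roughly like this: a single background thread reads and hashes files one after another, so the drive only ever serves one sequential reader, while the main thread just receives results. This is a Python sketch of the general pattern, not Affirmative’s actual implementation, and verify_in_background and on_result are hypothetical names.

```python
import queue
import threading
import zlib
from typing import Callable, Iterable

_DONE = object()  # sentinel marking the end of the worker's output


def _crc32_of_file(path: str, chunk_size: int = 1 << 20) -> int:
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc


def verify_in_background(
    entries: Iterable[tuple[str, int]],
    on_result: Callable[[str, bool], None],
) -> None:
    """Hash files sequentially on one worker thread and hand each result
    back to the calling thread through a queue."""
    results = queue.Queue()

    def worker() -> None:
        # One sequential reader: the disk, not the CPU, is the bottleneck,
        # so extra hashing threads would only compete for the same drive.
        try:
            for path, expected in entries:
                try:
                    ok = _crc32_of_file(path) == expected
                except OSError:
                    ok = False  # missing or unreadable file counts as a failure
                results.put((path, ok))
        finally:
            results.put(_DONE)  # always signal completion, even on errors

    threading.Thread(target=worker, daemon=True).start()

    # The calling thread stands in for the main (UI) thread. A GUI app would
    # drain the queue from its event loop; here it simply blocks on it.
    while (item := results.get()) is not _DONE:
        path, ok = item
        on_result(path, ok)
```

Keeping the hashing on one worker thread leaves the main thread free to handle the interface, without paying for the disk contention that multiple GCD workers caused.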

Stay tuned for more apps. I actually have a pretty good idea of what I’m gonna do next.

Goodbye hard drive!

I ordered an SSD today, an Intel X25 80GB (2nd generation). They’ve been sold out everywhere for ages now, but I think I might have gotten a hold of one. I can’t wait to give it a spin (even though it doesn’t actually spin). I have a fresh Snow Leopard install on my current regular 7200rpm drive that I’m going to copy over to the SSD so I can compare. Heck, I might even record a short video if the difference is really as noticeable as people are saying.

I will of course have to keep one or two regular drives for storage, but OS X and all my apps are going on the SSD.

Inside the Meltdown

Just wanted to recommend an excellent PBS Frontline documentary on the housing bubble and mortgage debacle, and on how the US economy went bad so fast.

This is just one of many great documentaries available for viewing at pbs.org.

Snow Leopard review

I just plowed through John Siracusa’s review of Snow Leopard over on Ars Technica. All 26 pages of it. If you only read one review of Snow Leopard, and you like your reviews in-depth, this is the one. I particularly enjoyed the parts about Grand Central Dispatch (GCD) and OpenCL. The page on GCD alone made me want to get back to OS X development. Maybe it’s time to fire up Xcode again and give it a go.

Now all that remains is a great idea for an application. I’m all ears!