Sunday, October 14, 2007

Static Code Analysis with Visual Studio 2005

Not too long ago I received a jarring reminder that "char" in Visual C++ is signed by default. The problem was related to the ctype functions, such as isalnum() and isspace(), all of which take an integer. When you sign-extend an eight-bit multibyte character (MBCS), you end up with a negative number. The ctype functions then do a table lookup on a negative number and your program crashes. This wouldn't be so interesting, except that this bug existed in our production code for six years before the software crashed for a customer in Europe. If the customer hadn't been willing to give me sample data to reproduce the problem, I'd still be scratching my head.

Apparently I'm not the only person who has run into this issue. Visual C++ 2005 includes a command line option named "/analyze" for finding this problem and many others. I had tried /analyze during the Whidbey Beta two years ago but didn't get useful results. Now the results were much more helpful.

The "/analyze" options enables the static code analysis feature in the C++ compiler. This feature is based on prefast, a technology that came out of Microsoft Research in 2001. Prefast knows about hundreds of common programming problems. I ran it on our codebase, which after eight years is quite mature and already compiled under warning level 4 with minimal problems.

What I learned was quite interesting. Prefast knows about the ctype problem I mentioned earlier and pointed out several lines I missed when I tried to fix the problem. All of those lines were crashes waiting to happen.

Prefast found several places where I expected one return type but was getting another. For example, at one point I checked for an HRESULT but was actually getting a bool, which meant that the sense of my error check was inverted.

Another error that prefast found was where I was calling sizeof on a variable that was defined as pointer instead of as an array, which meant that the length being handed to strncmp was wildly wrong. The code still worked, but that was a happy accident.

Prefast also had numerous warnings about the Boost C++ Library. It should be possible to suppress those warnings, but I haven't done so yet.

Prefast supports annotating your code with the Standard Annotation Language (SAL) to better describe the static behavior of the code. If you've looked at recent versions of the Windows SDK, there are numerous annotations such as __out_ecount(). These annotations provide additional information to prefast that allows for better analysis. You can learn more about these annotations at blogs.msdn.com.

Prefast also pointed out two problems with my code that I would never have found, even with a close code inspection. I'm sure you've seen stricmp, the case-insensitive version of strcmp. I was using stricmp to check for keywords. Turns out that this is a bad idea. Prefast gives warning C6400, which explains that some languages interpret combinations of letters as a single letter, which changes the behavior of stricmp. The correct solution is to use the Windows API call CompareString(), which should be set to LOCALE_INVARIANT or LANG_ENGLISH, depending on the version of Windows.

Prefast also warned me that using _alloca in a loop could cause a stack overflow. Normally I'd consider this obvious, but in this case the warning was being given about the W2A Unicode to MBCS conversion macro, which uses _alloca. I looked at the source code to W2A (in atlconv.h) and the problem seems to be handled properly, but it was a worthwhile exercise.

I'm adding prefast to my bag of recommended tricks. For more insight into the use of this tool, I invite you to read Scalable Defect Detection from the Center for Software Excellence at Microsoft.

2 comments:

  1. static code analysis benefits are:
    1.can get warned of anti-patterns in the code.
    2. information about the source code.
    3. helpful in call-graphs, and class diagrams.
    Very well written

    ReplyDelete
  2. The problems with data flow and control flow analysis were specific to our implementation of the Command pattern. There is some indirection (intentional) in our implementation which confused the analyzer.

    ReplyDelete