Rant of the Week


	`About \| Buy Stuff \| News \| Products \| Rants \| Search \| Security`

Home » Resources » Rants

Anatomy of a Snippet

Week of February 17, 2004

It took but two days for someone to find a vulnerability in the Microsoft code.

It's not yet to the point where it can be exploited in the wild, but it's very close. And the beauty of it is that anyone can have fun not by sending spam or propagating worms, but by simply putting up an innocent web page and waiting for IE users to drop by.

The error in the code that produces this vulnerability is typical of Microsoft. Windows is a world - and the layman can never have known this - where data types are being 'reformatted' all the time on the fly. As this 'tutorial' proceeds, the message will become clear.

The snippet is also, despite its brevity, a typical example of Redmond coding. The devil, as they say, is in the details. Read on.

The Snippet

It's the following snippet that was used to create the exploit. Numbers have been added to the code lines to make it easier for the reader to follow along.

 1  // Before we read the bits, seek to the correct location in the file
 2  while (_bmfh.bfOffBits > (unsigned)cbRead)
 3  {
 4      BYTE abDummy[1024];
 5      int cbSkip;
 6
 7      cbSkip = _bmfh.bfOffBits - cbRead;
 8
 9      if (cbSkip > 1024)
10          cbSkip = 1024;
11
12      if (!Read(abDummy, cbSkip))
13          goto Cleanup;
14
15      cbRead += cbSkip;
16  }

The first thing we have to do is note what this snippet is trying to do. It's from the file:

win2k/private/inet/mshtml/src/site/download/imgbmp.cxx

As such, it's part of the source to the ubiquitous MSHTML.DLL ('Microsoft HTML') library which runs both Internet Explorer and a lot of other things out there.

Further, the source file is descriptively named 'imgbmp.cxx'. The 'cxx' extension means it's a C++ file, and the name itself - 'imgbmp' - gives the game away: it's dealing with Microsoft BMP images for Internet Explorer and the Internet in general.

'Why worry about BMP images at all?' the astute Internet denizen might ask. 'After all, BMP is not a recognised image format for use on the web!'

Why indeed! Suffice it to say this is just the type of dumb shenanigans Microsoft are always up to. Internet Explorer has been capable, from the beginning, of rendering BMP bitmap images - no other browser out there can or would dream of touching them. The net result is a lot of clueless 'AOL users' who make web pages no one else can see. Lovely - and typical Microsoft.

The task of this snippet is to get the images for rendering and pass them on to the client - most often Internet Explorer. The way this is done is abysmal.

Line 1 is very confusing: there is no 'seeking' going on in the snippet whatsoever. What the author is talking about is skipping through a file to a particular location, and not reading from the beginning.

But there's no 'skipping' going on - the function which reads in the actual bitmap data is already aware on its own of the 'sequential' offset pointer, and doesn't need any help in this regard.

Line 2 shows what's up: every Microsoft BMP image has a 'header' with information about the image itself. The variable _bmfh.bfOffBits refers to a field inside the BMP header (_bmfh) called bfOffBits that gives the size of the image in bytes.

Instructive here is that bfOffBits is an unsigned value. (And this is coincidentally where the exploit comes from.) To fully appreciate what's at stake here, it's necessary to understand how computers work and think. But first, let's look at the official Microsoft documentation for BMP header information.

And make a note that 'biSizeImage' is declared as a 'DWORD' - we'll get back to that later.

BITMAPINFOHEADER
(found at http://msdn.microsoft.com/library/en-us/gdi/bitmaps_1rw2.asp>

The BITMAPINFOHEADER structure contains information about the dimensions and color format of a DIB.

typedef struct tagBITMAPINFOHEADER{
  DWORD  biSize;
  LONG   biWidth;
  LONG   biHeight;
  WORD   biPlanes;
  WORD   biBitCount;
  DWORD  biCompression;
  DWORD  biSizeImage;
  LONG   biXPelsPerMeter;
  LONG   biYPelsPerMeter;
  DWORD  biClrUsed;
  DWORD  biClrImportant;
} BITMAPINFOHEADER, *PBITMAPINFOHEADER;

Binary Representation

Back on track:

Computers 'think' in binary, as everyone has heard by now. On Microsoft's native Win32 platform, they also use 32-bit integers. A bit - a binary digit - can be either 0 or 1. That's it. The gamut of 32-bit values goes from zero:

00000000000000000000000000000000

To somewhere in excess of four billion - 4,294,967,295:

11111111111111111111111111111111

But that gives us access only to positive numbers. If the computer needs to think in terms of negative numbers as well, the leftmost binary digit becomes the 'sign': 1 for negative, 0 for positive.

Thus the following:

10000000000000000000000000000000

As an unsigned number is huge, but as a signed number is a negative value.

Back to the Snippet

The entire snippet of code is enclosed in a 'while loop': at each iteration, a condition is tested, and if that condition is true, then the code within the curly braces (lines 3 and 16) gets executed. (In other words, the code reads 'while this condition is true, keep on processing'.)

The 'while loop' and its condition are on line 2:

while (_bmfh.bfOffBits > (unsigned)cbRead)

Note that this snippet does not represent a complete function: certain variables, such as cbRead, are obviously declared elsewhere. And (knock on wood) we're relying on cbRead being set to zero before our loop begins.

cbRead is 'Microsoft jargon' for 'count of bytes read'. It's a variable name, and incidentally contravenes Microsoft's own rules for naming variables, but let's not get hung up there yet. cbRead is going to keep tabs on just how much of the bitmap we've read in.

The test - the condition - is whether the entire bitmap has yet been read into memory or not. If it has not, we keep going.

But instructive is that the Microsoft brainiac writing this snippet has evidently declared cbRead as a signed integer, meaning he has to force the 'data type' in the test to 'unsigned' to see correctly how things stand:

(unsigned)cbRead

Already we smell the stench of nonsense. It's a lot easier to write:

int cbRead = 0;

Than it is to write:

unsigned int cbRead = 0;

But fact of the matter is Microsoft have their own data type, the DWORD, which is most often used in this context, and as Microsoft's own documentation for their own formats seems smart enough to understand this (see above), it's a wonder if the IE brainiacs could not. It would be very surprising to find that bfOffBits itself was not actually defined as a DWORD.

All the programmer had to write was (if not the 'unsigned' variant above):

DWORD cbRead = 0;

As all Microsoft disk and Registry I/O transfer counts are measured in DWORDs, why not be consistent here as well? After all, they're the same thing. But the Microsoft platforms are notorious for their botching of data types, and we can't linger too long over crimes we're already more than aware of.

Blowing the Stack

On line 4 our Albert Einstein derivative starts declaring his local 'stack' variables.

This is typically where things 'blow' if an exploit is possible.

The 'stack' - set to a default 1 MB on Win32 programs - is a memory area where things can be put aside and fetched later on.

If we have a function in our code called foo that has to call another function called bar, then the pertinent current data for foo is 'pushed' onto the stack, as it's called, prior to giving control to bar, and when bar is finished, control goes back to foo, which can then 'pop' the data it saved on the stack, and so on.

The trick of course is that this stack data also controls program flow, and has addresses the program needs to know where to continue execution. Stack variables are temporary variables, and they are stored on the 'far side' of the stack; if any of the local code should overrun these variables, the code can corrupt the crucial stack data waiting to be used, and the exploit is thereby a fait accompli.

The first stack variable is abDummy. It's a buffer area of 1,024 bytes in size.

BYTE abDummy[1024];

The next stack variable is cbSkip (line 5).

Using a lugubrious programming technique only allowed in the most brainless institutions (and it does give the semblance of writing more code than one actually has), our Max von Planck candidate waits until line 7 to initialise cbSkip:

cbSkip = _bmfh.bfOffBits - cbRead;

When he should have initialised it on line 5 in one fell swoop:

int cbSkip = _bmfh.bfOffBits - cbRead;

Of course, he might have had to write more actual substantial code to avoid the prying eyes of a lines of code counting BALLMER, but that's not our concern.

Then again, as we shall see shortly, this 'int' thing is coming back to haunt, and the line should actually read:

DWORD cbSkip = _bmfh.bfOffBits - cbRead;

But that's a tin of worms that can wait until later.

We now have two stack variables, and we've tested our condition, and presumably there's more to read of this bitmap file, so off we go. And the next thing we do is see how much of the file remains to be read (line 9):

if (cbSkip > 1024)

What does this mean? It's asking if the count of bytes that still have to be read is greater than 1,024. That much a child can see. But why is it asking that?

It's referring to the declaration of abDummy on line 4. Now the author of this code gem knows that he has a buffer of exactly 1,024 bytes for reading in pieces of the bitmap, but who else knows it? Fact of the matter is that this is the first sign (in this snippet at least) that things are a bit screwy in Redmond. Yes, the programmer can get away with it in this particular instance, but the 'programming style' indicates that 'brains' are not in high priority in the Pacific Northwest.

The programmer should have had the ingrained discipline to write:

if (cbSkip > sizeof(abDummy))

'sizeof' is a compiler construct that computes - at compile time, so there's no overhead whatsoever at runtime - the size of whatever you toss at it. So in this case, sizeof(abDummy) is very truly 1,024.

But although the two constructs equate, there's a very big difference in quality.

If this snippet grows, and references to 1,024 get farther away from his local declarations, he won't get in trouble. In fact, use of the literal 1,024 also violates one of Brian W Kernighan's golden rules, namely to never use a 'magic number' in code. What the programmer again should have done is the following:

#define BUFSIZE    (1024) // no magic numbers please

    while (_bmfh.bfOffBits > (unsigned)cbRead)
    {
        BYTE abDummy[BUFSIZE];

(And note that line 9 does not and should not change - it's still directly tied to the declaration of abDummy.)

Our Nobel Prize candidate now goes on line 10 to set the maximum value of cbSkip, and again this is wrong.

What he should have written is:

cbSkip = sizeof(abDummy);

And now we begin to see what he's after: he doesn't want to read, so to speak, 'beyond the end of file', or more likely in this case, beyond the end of the memory buffer where the bitmap has been loaded (which would cause another crash natch).

So now he has - 'supposedly' - a valid count in cbSkip and can proceed to get more of the file (lines 12 and 13).

if (!Read(abDummy, cbSkip))
    goto Cleanup;

'Read' is obviously a function beyond the scope of our snippet and found elsewhere; what is instructive is that if it returns a zero value ('!'), the code is to 'jump' to the label 'Cleanup', also found elsewhere - in effect breaking our 'while loop'.

What's also instructive is the primitive nature of this Read function: it can only tell us if the read was successful or not; it cannot tell us exactly how many bytes were read. Perhaps this is overkill, but it would require perhaps only half a dozen bytes in the final executable to do it right.

All official Microsoft APIs for I/O always return both a 'boolean' yes/no true/false value, as well as a count of bytes transferred. In the case of RegQueryValueEx below, the return value of type 'LONG' (another Microsoft bastardisation), actually gives you an error code if something's gone wrong, and 0 if it's OK - which amounts to the same thing. Both functions report both success or failure and the number of bytes actually transferred - as DWORDs of course, and not signed integers.

('LPDWORD' means 'long pointer to DWORD': the caller supplies the address to a DWORD of its own, and if the callee returns a 'success' value, this DWORD is set to the actual number of bytes transferred.)

ReadFile
(found at http://msdn.microsoft.com/library/en-us/fileio/base/readfile.asp)

The ReadFile function reads data from a file, starting at the position indicated by the file pointer. This function is designed for both synchronous and asynchronous operation.

BOOL ReadFile(
  HANDLE hFile,
  LPVOID lpBuffer,
  DWORD nNumberOfBytesToRead,
  LPDWORD lpNumberOfBytesRead,
  LPOVERLAPPED lpOverlapped
);

RegQueryValueEx
(found at http://msdn.microsoft.com/library/en-us/sysinfo/base/regqueryvalueex.asp)

The RegQueryValueEx function retrieves the type and data for a specified value name associated with an open registry key.

LONG RegQueryValueEx(
  HKEY hKey,
  LPCTSTR lpValueName,
  LPDWORD lpReserved,
  LPDWORD lpType,
  LPBYTE lpData,
  LPDWORD lpcbData
);

Finally, at line 15, our bedeviled programmer 'ups' the value of cbRead - the count of bytes read - by the value in cbSkip - without knowing, as stated before, if the function Read could read it all (all cbSkip bytes) or not.

The potential for snags, bugs, crashes, vulnerabilities, exploits in this stupid little snippet of code is overwhelming.

Snippet Cleaned

There is a correct way to do everything. Not that Microsoft will ever know this, or care - but there is.

The snippet should have been written as follows.

#define BUFSIZE    (1024)

    BYTE b[BUFSIZE]; DWORD cbRead, cbSkip, dw;

    for (cbRead = 0; cbRead < _bmfh.bfOffBits; cbRead += dw) {
        if ((cbSkip = _bmfh.bfOffBits - cbRead) > sizeof(b))
            cbSkip = sizeof(b);
        if (!Read(b, cbSkip, &dw) || dw != cbSkip)
            goto Cleanup;
    }

The Exploit

The exploit was possible because of the way signed integers were used instead of unsigned integers (or DWORDs). By creating an embedded bitmap with a bfOffBits value the snippet interpreted as a negative number, the exploit broke the code in the call to Read, overran the buffer, and corrupted the stack.