How and why they work.
We've all seen the advisories: So-and-so app has a buffer overrun which can be exploited to make your computer dance the jig. How do exploits work? Where do these buffer overruns come from? What is a buffer overrun anyway? What can be done to stop them?
For starters, most exploits assume a given hardware platform. In the World of Windows, this is most often the x86 architecture, which makes things relatively easy.
To understand how exploits work, you need to understand a bit about how processors - and the x86 in particular - work.
Going back to the beginning of time (and the 4004 and 8008), Intel processors look vaguely like this:
|(E)AX||General Purpose (Accumulator)|
|(E)BX||General Purpose (Base Address)|
|(E)CX||General Purpose (Counter)|
|(E)DX||General Purpose (Data)|
|(E)SP||Stack Pointer|
|(E)IP||Instruction Pointer|
What we're really interested in here are the last two: the stack and instruction pointers. The instruction pointer always points to the next instruction the processor is to carry out; the stack pointer points to the top of the stack (duh). Let's take a minute and look at what a stack is.
The stack is a cute (human) invention which allows us to modularise our code. It makes it possible for us to organise our code into functions which can call one another.
Say we have a function called foo which we need to call now and again. Every time we call it there are a number of magical things that happen:
- We've got local variables - not to confuse things, but these normally live on the stack too - or rather on the far side of it. Their values have to be preserved somewhere, because we'll need them again when we return from the call.
- We've got lots of stuff in our processor registers - foo is going to want to use the processor registers itself, so we have to save our registers so we can restore them when the call returns.
- And lots of other small odds 'n' ends.
The stack is like a pile of trays at a cafeteria - take one tray off the pile (the stack) and the stack 'moves up'. On the x86 it's constructed a bit 'backwards' like that - it grows toward lower addresses. When somebody calls us, the call uses a bit of the stack, and when we return, that portion of the stack is given back so the caller can continue as before. The stack is also used to pass arguments to the callee - as in the example below.
Clear as mud? Ok, let's look at a typically dumb snippet of code with a buffer overrun built in.
void foo(HANDLE h, LPSTR lpstr)
{
    while (*lpstr) {
        // do something dorky
        lpstr++;
    }
}
The name of the function represents its symbolic address until the code is compiled and linked, whereafter it becomes an address relative to the program's load address and loses its 'name'. The function takes two arguments, a HANDLE (which we may presume identifies a file or a window or something) and a pointer to a (character) string (LPSTR lpstr). So far so good.
The trivial code runs in a loop until the character lpstr points to is zero (0), whereafter the loop breaks and foo returns.
But there's an assumption here: Namely that the character buffer lpstr points to is duly terminated where it should be with a zero (0) byte. What happens if it is not? Care to guess? Right: You get a buffer overrun.
But where is this lpstr located? Or rather, where is the character string lpstr points to located? The bad fortune comes if the data lpstr points to can overrun a buffer on the stack. If it can, then the stack can be corrupted, and the values the caller is going to restore to the processor registers - the saved return address among them - can be changed to afford an exploit of the programming error.
The character buffer lpstr points to might be declared as something dumb like this:

char buf[80];
// filled in from somewhere - with no check for a terminating zero
foo(h, buf);

If buf is not properly zero-terminated, the pointer sent to foo becomes a 'booby trap'.
A typical move at this point would be to locate an instruction in the memory of the process which pops the top of the stack into the instruction pointer. If this can be accomplished, and if the stack can be overrun with the 'right' values, then the exploit can transfer control to an arbitrary piece of code inserted as part of the character string pointed to by lpstr.
Software development is an extremely complex process. It's no wonder that one normally needs an engineering degree just to start out in the business. There are thousands upon thousands of things to keep in mind at any one time, and being able to dismiss malformed input as user failure is most often a practical necessity. Seeing that the code works is good enough; watching out for deliberate corruption of proprietary file formats is normally something that doesn't merit that much consideration.
This is all 'cool' as long as the application domain is not a sensitive one. When it comes to sensitive applications, the developers must make sure their code is also impervious to attack - for that is the nature of reality today on the net.
The MS RTF overrun is a case in point: Redmond programmers decided arbitrarily that a buffer of 35 characters was enough to hold a single RTF token, as no RTF token was longer. They didn't even think of checking the bounds of their buffer. But if a malicious user actually goes to disk and corrupts an RTF file, their rich edit DLL blows sky-high. As long as the authors could assume no one would attempt to trip them up, they felt safe.
More and more the question becomes: Can we ever lean back and afford the luxury of feeling safe?
And more and more the answer is a spooky 'no'.