Fatal Flaws I
Week of September 15, 2004
There's a lot of collective breath-holding going on right now, and it isn't just amongst Windows users. The Internet as a whole is holding its breath. Everyone is focused on XPSP2 whether they use it or not, and the reason is that Windows users have caused so much damage to the Internet they love.
While people everywhere keep their gaze skyward for the first sign of the other shoe that everyone assumes will inevitably drop, it can be propitious to look a bit at the 'architecture' of Windows and get a clue that shutting out a few well-known exploits is not the end of the adventure. And one of the most fatal mistakes in that Grand Mistake known as MS Windows is the Registry.
The Registry was once heralded as a Windows technological breakthrough. Today software engineers and system administrators know better. It was thought to be a breakthrough because it circumvented issues with the previous system - the INI file system - while (of course) introducing far more lethal issues of its own.
The INI file system can be seen as an immature - and typically 'Microsoft' - version of the XML we use today - but INI files are severely limited where XML is definitely not.
Whilst data of literally any kind can be stored in XML - including binary data, once encoded as text - INI file data can only be stored as character strings. Can you express every piece of data your program has as a character string? Not without a lot of extra work.
And the workarounds hacked out by Microsoft programmers to get beyond this limitation are legendary (if not infamous). One of the most amusing of all time is the storage of file search filters.
These filters generally consist of several strings tied together: a description such as 'Text files' followed by a wildcard pattern such as '*.txt', repeated for each file type. The trick - for the Microsoft Open and Save As dialogs - is that these strings must be separated by NULL (zero value) bytes, and the string as a whole must be ended by two NULL bytes in a row.
Programs like to save settings like these so that users can make changes and have those changes take effect the next time they run the program.
But character strings are terminated by zero value bytes: if you're reading a character string, you'll only get back what was found up to the first NULL byte.
How then to store successive zero-terminated strings in an INI file? Simple - as Microsoft saw it: transform every zero byte into the vertical bar '|' and, when you read the strings back from the INI file, reverse the transformation.
Needless to say, this is basically a lot of sound and fury signifying nothing.
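The round trip can be sketched in Python. This is a minimal illustration that assumes the dialog-style filter layout described above. The helper names `filter_to_ini` and `ini_to_filter` are hypothetical and not part of any Windows API.

```python
def filter_to_ini(raw: str) -> str:
    """Turn a double-NUL-terminated filter into one '|'-separated INI string."""
    if not raw.endswith("\0\0"):
        raise ValueError("filter must end with two NUL characters")
    return raw.replace("\0", "|")


def ini_to_filter(stored: str) -> str:
    """Reverse the substitution when the value is read back."""
    return stored.replace("|", "\0")


# Description/pattern pairs, each NUL-terminated, plus one extra NUL at the end.
raw = "Text files\0*.txt\0All files\0*.*\0\0"

# A plain C-string reader stops at the first NUL and sees only this much:
print(raw.split("\0", 1)[0])      # Text files

stored = filter_to_ini(raw)
print(stored)                     # Text files|*.txt|All files|*.*||
assert ini_to_filter(stored) == raw
```

The scheme only works because '|' never appears in a description or pattern. A description containing a literal '|' would come back split in two.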
Saving integer values was even more fun: Microsoft, in their boundless generosity, offered an API to read an integer value from an INI file (essentially a string lookup followed by a simple call to atoi, as any Unix programmer knows). What they found too difficult to offer was an API that worked the other way around: save an integer as a character string (which of course meant that every software house in the world had to write its own).
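Python's standard configparser module reproduces the same asymmetry, which makes it a convenient stand-in. It is not the Windows API. `getint()` parses a stored string into an integer, while `set()` accepts only strings, so the caller has to supply the integer-to-string step. The helper `write_profile_int` below is hypothetical: the kind of routine every program ended up writing.

```python
import configparser
import io


def write_profile_int(cp, section, key, value):
    """Hypothetical helper: the integer-to-string step programs had to supply."""
    if not cp.has_section(section):
        cp.add_section(section)
    cp.set(section, key, str(value))   # set() itself accepts only strings


cp = configparser.ConfigParser()
write_profile_int(cp, "Window", "Width", 640)

try:
    cp.set("Window", "Height", 480)    # an int, not a string: rejected
except TypeError as exc:
    print("rejected:", exc)

buf = io.StringIO()
cp.write(buf)                          # serialise as INI text

reread = configparser.ConfigParser()
reread.read_string(buf.getvalue())
print(reread.getint("Window", "Width"))   # 640
```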
Given these limitations, it was no wonder people welcomed the Registry. Here you could store data in any format you wanted - they even had a special format (REG_MULTI_SZ) for multiple zero-terminated strings with two zero bytes on the end.
But the Registry introduced several new issues of its own. It is possible that Microsoft did not realise, even that late in the game, that binary data could be stored in text form as with XML - after all, we're talking about Microsoft programmers and designers here - so a binary store reachable only through APIs was all they thought they could have.
What a disastrous mistake.
Following are some of the worst features of the Windows Registry. They affect every Windows computer on every kitchen table in the world - yes, even yours.
The 'classes root' key (HKEY_CLASSES_ROOT) dates back to the days of Windows 3, when the Registry - known then as the 'Registration Database' - was an 'editable' entity where changes could be either saved to disk or discarded at the end of an editing session.
[Today, as all Windows users know, all changes are saved to disk immediately, resulting in untold damage as innocent mistakes are made.]
The classes root key was part of a greater architectural design which would emerge first with David Cutler's NT and later with Windows 95. The name itself refers to where the so-called 'classes' have their 'root key'. The key is not a top-level key itself, but merely an alias (shortcut) into another key with a lengthier path.
What's important here is to note the top level of this path: it pertains to the 'local machine'. In Windows Registry thinking, every setting on the box is either user-specific or 'machine-specific'. Machine-specific settings of course pertain to all users on the same machine.
Anything under the 'classes root' key is thus for the entire machine and not for any one particular user.
One of the most important uses of the classes root key is to connect file extensions with file types and programs to edit files of those types. Microsoft make what many consider an unnecessary and problematic distinction between a file's extension and the file type it belongs to.
For example, files with the extensions 'LOG' and 'TXT' might both be considered text files and default to Notepad as an editor. The alternative would be to connect both extensions directly to Notepad in the Registry - a far simpler system, and one most other OS vendors use, but it would be too simple and too obvious for the likes of the MS rocket scientists.
Any time you double-click a file to open it, the Windows shell has to get the file's extension from the file name it's sent, search for that extension in the Registry's classes root key, get the file type associated with that extension, go back to the top of the classes root key and search again, this time for the file type, then get the name and path of the program that can do the editing.
WHOOSH. That's a lot of disk crunching to open a single file - and if Microsoft had eliminated the 'file type' man in the middle, it would take roughly half as many Registry lookups.
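A toy model makes the extra hop visible. In this Python sketch a dictionary stands in for the classes root key. The key names follow the usual convention (.txt -> txtfile -> txtfile\shell\open\command), but the data and the lookup counts are illustrative, not measurements of the real shell.

```python
import ntpath

# Flat stand-in for HKEY_CLASSES_ROOT: key path -> default value (toy data).
CLASSES_ROOT = {
    ".txt": "txtfile",
    ".log": "txtfile",
    r"txtfile\shell\open\command": "notepad.exe %1",
}

# The simpler alternative: extension -> program, no file type in between.
DIRECT = {
    ".txt": "notepad.exe %1",
    ".log": "notepad.exe %1",
}


def open_via_file_type(filename: str):
    """Return (command, number of key lookups) using the two-step scheme."""
    ext = ntpath.splitext(filename)[1].lower()
    file_type = CLASSES_ROOT[ext]                                # lookup 1
    command = CLASSES_ROOT[file_type + r"\shell\open\command"]   # lookup 2
    return command, 2


def open_direct(filename: str):
    """Return (command, number of key lookups) with a direct mapping."""
    ext = ntpath.splitext(filename)[1].lower()
    return DIRECT[ext], 1                                        # lookup 1


print(open_via_file_type(r"C:\logs\server.LOG"))  # ('notepad.exe %1', 2)
print(open_direct(r"C:\logs\server.LOG"))         # ('notepad.exe %1', 1)
```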
(Note: this is precisely what the 'explorers' have to go through to display icons for listed files. Icons are never stored with the files themselves; they're most often stored inside the programs that edit them. First you get the file name; then you get the extension out of that; then you go into the Registry at the classes root key and search for the extension; then you find a 'file type' for that extension, go back to the top of the classes root key and start all over again, searching now for the file type; then you find a connection to a program to edit files of that type; then you look for info on where the icon is stored and hopefully get a full path to the file where the icon is located. This may or may not be the program itself: it can be a standalone icon file, but most often is not. The MS Windows shell keeps thousands of icons in constant use in a single file, and Windows as a whole has several files that work as icon depositories in this way.)
Once you have the path to the file with the icon, you note the index number given in the Registry with this path: if it is zero or positive, it is an absolute index into the icon resources in the file; if it is negative, its absolute value is a numerical 'name' (a resource ID) for the icon.
Now it's time to access the file in question. The shell opens the file, reads the 'executable headers' to determine where the 'resources' are found, skips to that offset in the file and now reads the 'resource headers' which enumerate what types of resources are found at what offsets.
The shell now looks for the icon resources. If it's been given an index, it scoots to the icon resource at that position and loads it into memory. If it's been given a negative value, it converts it to positive (by multiplying by -1) to get the numerical name, then rummages through all the icon resources until it finds one with the same 'numerical name' and loads it into memory.
And at this point the file's icon can be displayed in the 'explorers' and the 'explorers' can finally proceed to listing the next file.
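The index rule can be sketched as follows. It models the convention described above, which is also the documented behaviour of the index parameter of the Win32 ExtractIcon function: a negative index names the resource ID equal to its absolute value. The function `resolve_icon` and the list of IDs are hypothetical, and the actual parsing of executable and resource headers is left out.

```python
def resolve_icon(location, icon_ids):
    """Pick the icon resource named by a 'path,index' Registry string.

    location -- e.g. "shell32.dll,-16"; a missing index means index 0
    icon_ids -- icon resource IDs in the order they appear in the file
    Returns (path, resource ID of the icon to load).
    """
    path, sep, index_text = location.rpartition(",")
    if not sep:                       # no comma: the whole string is the path
        path, index_text = location, "0"
    index = int(index_text)
    if index >= 0:                    # zero or positive: a position in the file
        return path, icon_ids[index]
    wanted = -index                   # negative: a numerical 'name' (resource ID)
    for resource_id in icon_ids:      # rummage until the IDs match
        if resource_id == wanted:
            return path, resource_id
    raise LookupError(f"no icon with ID {wanted} in {path}")


ids_in_file = [1, 3, 4, 16]           # hypothetical icon IDs, in file order
print(resolve_icon("shell32.dll,2", ids_in_file))    # ('shell32.dll', 4)
print(resolve_icon("shell32.dll,-16", ids_in_file))  # ('shell32.dll', 16)
```

The two branches differ in cost: a positive index jumps straight to its target, while a negative one forces a scan of the icon resources.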
What's important from the point of view of a user is that the associations do not and cannot pertain to that particular user's preferences: they must and always will pertain to how the computer is set up as a whole. This might or might not be to the user's liking. There can be several users on the same machine, all with different ideas of how things are supposed to work, and every time one of them returns to the box and makes a small adjustment, they throw a spanner in the works for everyone else working there.
Directly below the classes root key is the infamous 'CLSID' key. This is where data on all the 'modules' on the system is stored.
This must come as a shock to a seasoned Unix user: Unix definitely does not suffer performance deficiencies in comparison to Windows, and yet Unix does not need any of this clutter and waste.
The CLSID idea that grows like the Blob™ inside Windows is actually based on what was proposed for an operating system that predates Unix: the GE/MIT/Bell Labs project known as 'Multics'.
The idea is to have modules spread about on a system that are capable of broadcasting their existence and offering services to other modules that are interested - all in realtime of course, and 'all in RAM'.
The method of broadcasting is the Registry; the method of communication is Microsoft's COM and OLE layers; clients search in the Registry for services, contact the providers they're interested in, and so forth.
It was an archaic and misguided idea nearly forty years ago; now it has definitely got out of hand. Third party vendors all over the place are building single standalone applications that can't tell their butt from a hole in the ground, with development teams where the work is laboriously divided into sub-projects to an almost perverse degree.
Products ship with tens or hundreds of modules, all coming from the same supplier, and yet no module - and no part of the development team - knows anything about any of the others.
Which is where poor Windows users get the horrendous idea that software has to be 'installed' - when any developer of merit still knows to this day that software really only has to be 'run'. The complications of the overly regimented development team, with a blessing from 'Barney Stairstep', wreak havoc on users' systems, with as usual no thought for the consequences - for as Barney himself would undoubtedly opine, hardware and reality checks and considerations are not part of good program design.
And he should know.
Development teams are broken down into such small and irresponsible modules that no one really cares how the final product turns out - they're not involved enough with it.
And who will judge if the final product is good or not? Suits of course. Marketing suits. Who know absolutely nothing about computers and are dumb and dumber at best. And so the innocent unwitting Windows user gets 'hammered'.
And so how do all these isolated parts of the same program communicate with one another? By using the Registry of course.
Each is defined with a 128-bit 'CLSID' expressed in text form. Each takes up its ton or two of data in the Registry. When the program launches, it starts looking around for all the parts it needs to accomplish what it's assigned to do; these parts might very well be in the same physical file or be neighbours on disk so close it's uncomfortable, but for software written in this dumb fashion it won't matter: the software has to 'install' a megaton of junk in the Registry just so it can find its own butt-hole.
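Both halves of this can be sketched in Python: turning 128 bits into the familiar text form, and a client finding a module through it. The `uuid` module stands in for Windows' own GUID generator, and the dictionary is a toy Registry. The CLSID and DLL path are invented, though InprocServer32 is the conventional subkey name for in-process servers.

```python
import uuid


def clsid_text(u: uuid.UUID) -> str:
    """Spell a 128-bit identifier the way the Registry shows CLSIDs."""
    return "{" + str(u).upper() + "}"


# Toy Registry: key path -> default value. The CLSID and DLL path are made up.
REGISTRY = {
    r"CLSID\{0F1E2D3C-4B5A-6978-8796-A5B4C3D2E1F0}\InprocServer32":
        r"C:\Program Files\Example\widget.dll",
}


def find_server(clsid: uuid.UUID) -> str:
    """Look a class up by CLSID, as a client does before loading its module."""
    key = "CLSID\\" + clsid_text(clsid) + "\\InprocServer32"
    try:
        return REGISTRY[key]
    except KeyError:
        raise LookupError(f"class {clsid_text(clsid)} is not registered") from None


u = uuid.uuid4()
print(clsid_text(u))                        # random, e.g. {3F2504E0-4F89-41D3-9A0C-0305E82C3301}
print(len(u.bytes), len(clsid_text(u)))     # 16 38
print(find_server(uuid.UUID("{0F1E2D3C-4B5A-6978-8796-A5B4C3D2E1F0}")))
```

Sixteen bytes of identifier become a 38-character key name before any data is stored under it.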
Look into the CLSID key in a Registry to see just how big it is - better yet, do an 'expand all' on the key so you see everything at once (note: this can take a long time to complete).
And CLSID is not the only key of this sort: 'Interface' is another good one. The geniuses in Redmond are coming up with new keys to clobber you with all the time.
Which leads to the next issue with the Registry.
Most Windows systems are infected. Earthlink estimates the average Windows user has nearly thirty trojans onboard without (of course) knowing it. How do these programs get on your disk - and more importantly, how can they stay there?
They get on disk because people are stupid; they stay there because they have a good ally: the Registry.
When you open Registry keys like CLSID and Interface - even when you open the classes root key - you're likely to feel vertigo - to feel lost and overwhelmed. Even though it looks a bit like a file system, it's not, and there's an unbelievable amount of data there.
And admit it: you don't know what most of it means. What's 'Apartment'? What is all that stuff? No, you don't know, and because of that you wouldn't know if something illicit got in there either. You lack the wherewithal to distinguish between good and bad data in your Registry. It's simply an impossible task.
So get a clue: what's the first place malware is going to try to hide? You got it.
This is in fact where most shareware hides: the tricks are legion, and it might be fun to enumerate them, but it wouldn't serve any purpose except to help people circumvent shareware agreements. But it's obvious to even the most clueless that shareware that runs out has a way of knowing it's supposed to run out, and that way is hidden somewhere on your disk - most likely in your Registry.
It's simply not polite to take an invitation into someone's home and then muck about like that. (Radsoft software has never been 'shareware' and never will be.) You download a program for testing, and the first thing that happens is the 'installer' or whoever sneaks off to a preselected 'cranny' somewhere and puts in a few bits. How many crannies are stuffed with a few bits on your machine?
Get the picture? They can put the stuff anywhere. The idea is not to get 'found out'. They can - and do - disguise themselves as authentic system components. Use of fake CLSIDs is very common, for example: create something totally bogus, something that will hopefully fool every user, something that will not cause a conflict in the system, and put all the dirty secrets there.
They're doing it all the time - not only your lily white shareware applications, but the truly evil spyware trojan applications: they need the Registry to hide out - they have to be in line to start up the next time your computer starts up or the game is lost. If the Registry weren't such a jungle they'd be easier to spot. Fewer people would lose their identities and life's savings; fewer people would be turned into spam relays; and so forth. With a Registry that is virtually impregnable for ordinary users, the odds are in the criminals' favour all the way.
The Windows Registry is the system's aorta. Without it the system cannot boot - period. It needs this Registry intact and uncorrupted. If anything gets out of whack the system as a whole collapses.
Just ask any of the many Windows users who've seen this happen with their own eyes. The cause can be a faulty third party software product, or a 'glitch' in the system or the hard drive itself, but one byte pushed out of order and the whole house of cards comes tumbling down. One byte too many or one byte too few is enough to push all the bytes after it out of alignment. Data takes on an entirely new meaning and programs crash, drivers can't load, file paths make no sense, and so forth.
It's all in one place and it's all binary (read: illegible). If it goes, the whole system goes with it, and you can't open it in Notepad and repair it.
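The effect of a single stray byte is easy to demonstrate. The Registry's hive format is far more elaborate than this, so treat the example as an analogy only. The two-field record layout below is hypothetical.

```python
import struct

# Hypothetical fixed layout: little-endian 32-bit flags, then a 32-bit offset.
RECORD = struct.Struct("<II")

good = RECORD.pack(1, 0x200)        # flags = 1, offset = 512
damaged = b"\x00" + good[:-1]       # one stray byte in front, same total length

print(RECORD.unpack(good))          # (1, 512)
print(RECORD.unpack(damaged))       # (256, 131072)
```

Nothing in the damaged bytes is 'wrong' on its own. Every value is simply read from the wrong position.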
Insult + Injury = ?
As if the above were not bad enough - adding proverbial insult to proverbial injury - both Microsoft and third party developers continue to abuse your system with some very bad engineering practices, mostly due to pure laziness.
The advent of the Registry meant that applications could store their user-specific settings in a clean and efficient fashion: user-specific data is of course stored a whole lot differently inside a program than it is represented in a settings dialog - programs use a 'binary' format internally.
It is relatively little bother to organise this data into convenient memory blocks and to perform all settings I/O as single Registry access operations (all Radsoft XPT programs do this). It makes programs faster to load and increases efficiency to a great degree.
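Here is a minimal Python sketch of the idea. It assumes a hypothetical program with five settings and packs them all into one binary block, so saving is one write and loading is one read. On Windows the block could be stored as a single REG_BINARY value, for instance with `winreg.SetValueEx`. That step is omitted here so the example runs anywhere.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout: little-endian, four 32-bit ints and one flag byte.
LAYOUT = struct.Struct("<iiiiB")


@dataclass
class Settings:
    x: int
    y: int
    width: int
    height: int
    word_wrap: bool

    def to_blob(self) -> bytes:
        """Pack every setting into one block: a single write on save."""
        return LAYOUT.pack(self.x, self.y, self.width, self.height, self.word_wrap)

    @classmethod
    def from_blob(cls, blob: bytes) -> "Settings":
        """Unpack the block: a single read on startup."""
        x, y, w, h, wrap = LAYOUT.unpack(blob)
        return cls(x, y, w, h, bool(wrap))


saved = Settings(10, 20, 640, 480, True)
blob = saved.to_blob()
print(len(blob))                        # 17
assert Settings.from_blob(blob) == saved
```

The trade-off is that the layout becomes a versioned format. Adding a field later means the program has to recognise blocks of the old size.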
Coming from the era of the INI file however means certain snippets of code have to be written: in the old days only one value - character string or integer - could be stored for any single entry. It was not possible to save all user-specific data in one big binary block.
Programs from that era were littered with hundreds of lines of code just to get the most trivial data out of an INI file and put it to use in a program and to take the data at the end of a session and save it back in an INI file. Non-trivial programs had dozens if not hundreds of INI file access calls.
When David Cutler introduced Windows NT - the forerunner to Windows XP - and also established use of the Registry, his team created a number of 'convenience' calls functioning as 'wrappers' around traditional INI file access calls.
Working through the Registry, NT and its successors check, upon receipt of an INI file call, if the INI file in question is actually 'wrapped' into the Registry; if so, the INI file call does not go to disk - the INI file per se does not exist - it accesses instead a designated Registry key.
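The redirection can be modelled in a few lines. This Python sketch is hypothetical throughout: the mapping table, the two dictionaries standing in for the Registry and for INI files on disk, and the helper `write_profile_string` are all illustrations. On NT the real table is itself kept in the Registry, under a key named IniFileMapping.

```python
import ntpath

# Hypothetical table: INI file name -> Registry key that replaces it.
INI_MAPPING = {"legacy.ini": r"Software\Example\Legacy"}

registry = {}    # stand-in for the Registry: (key, section, name) -> value
ini_files = {}   # stand-in for INI files on disk: (path, section, name) -> value


def write_profile_string(ini_path, section, name, value):
    """Send an INI write to the Registry if the file is mapped, else to 'disk'."""
    mapped_key = INI_MAPPING.get(ntpath.basename(ini_path).lower())
    if mapped_key is not None:
        registry[(mapped_key, section, name)] = value   # the INI file never exists
        return "registry"
    ini_files[(ini_path, section, name)] = value
    return "file"


print(write_profile_string(r"C:\WINDOWS\LEGACY.INI", "Options", "Colour", "blue"))  # registry
print(write_profile_string(r"C:\Tools\other.ini", "Options", "Colour", "red"))      # file
```

The calling program cannot tell which path was taken, which is exactly why old INI code kept working unchanged.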
This made sense at the time: the product (NT) had to ship, and doing otherwise would have meant a severe delay while countless programs were rewritten first.
But what was acceptable then is not acceptable today.
Rewriting this kind of code, as mentioned above, takes a little time - yet any conscientious software house will do it. The trouble is too few software houses are really conscientious and not all of them know how Microsoft are undermining their best efforts anyway.
Microsoft's own development environment automatically makes use of INI file access calls which may or may not be wrapped inside David Cutler's Registry: the point is that a lot of junky code is spreading around your system without your knowledge - whether you approve or not. Third party developers using Microsoft's own tools - which are fairly standard on the platform - get unwittingly caught in this trap.
Each and every Registry key (the stuff on the left that looks like folders) has an estimated overhead of 260 bytes in addition to its data. Everything in the Registry - whether it needs to be or not - is in Unicode, at two bytes per character. This means that a key with the name 'foo-bar' (seven characters, 14 bytes) will consume a minimum of 274 bytes - and that's just for starters.
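The arithmetic is easy to check if the two figures above are taken at face value: an estimated 260 bytes of overhead per key, and key names stored as UTF-16 at two bytes per ASCII character. The constant below is the article's estimate, not a documented value.

```python
KEY_OVERHEAD = 260   # estimated per-key overhead in bytes (not a documented constant)


def estimated_key_size(name: str) -> int:
    """Overhead plus the key name encoded as UTF-16 (no BOM, no terminator)."""
    return KEY_OVERHEAD + len(name.encode("utf-16-le"))


print(estimated_key_size("foo-bar"))                                  # 274
print(estimated_key_size("{0F1E2D3C-4B5A-6978-8796-A5B4C3D2E1F0}"))   # 336
```

Under these assumptions, a single CLSID-named key costs about 336 bytes before it holds any data at all.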
Values as they're called - the 'files' you see on the right - have an overhead too. Everything has an overhead naturally. The issue is that abuse of the Registry leads to bloated and sluggish systems. And most software houses don't really care. What's worse, Microsoft is the leader of the pack, and they're definitely too lazy to do anything about it: they're more interested in cornering the markets they don't yet have and will rarely, if ever, devote any time to products in markets they already dominate - even if said products are scandalously substandard.
Every Windows system - whether 95, 98, 98SE, ME, NT, 2K, or XP - grinds irrevocably to a halt over time. The reason it does this is that the Registry gets 'maxed out'. A maxed out Registry can corrupt an entire system - a Windows 9x system can stop working altogether - but most often it just means a system becomes more and more sluggish over time.
Not to discount the effect of having three dozen wild and wacky trojans on a system of course, but these symptoms of Windows have been known for a long time. The cause is the Registry: it gets too big. Especially the 'classes root' key.
The classes root key can only be read in one direction: from file extension to file editor. You cannot track from a given program - if you find it - to all possible file extensions. Not with Registry navigation.
The Windows shell still has to do this however, and the way to do this is to build up a dynamic (read: in RAM) database on system boot with all the pertinent data in a flat hierarchy. Naturally this takes a lot of CPU and disk activity.
What really slows the system down however is how the shell feels it incumbent to follow each and every file operation you perform and adjust the Registry accordingly - 'on the spot'. You can change the location of file handlers on disk with your Registry editor open and watch how the paths to these handlers change before your very eyes. It takes a lot of memory and a lot of CPU to accomplish this on Windows - perhaps not nearly as much on other systems, but definitely here. It's a horrible mess.
Which means there is an awful lot of disk crunching going on with the most innocent of file operations you perform. You're also wont to see sudden unexpected message boxes warning you how deleting a file might 'impact' the performance of other stuff on your drive - and you're excused if you once blurted out to yourself: 'how could Windows know this?' But now at any rate you know.
While ridding oneself of the immediate dangers - identity theft, system corruption, and the like - is an admirable step in improving one's safety online, the issues with the patented way of 'not thinking' at Microsoft mean one's choice of operating system must sooner or later come under review. It's time consumers stopped playing the unwitting victims and started taking matters into their own hands.