About | Buy Stuff | News | Products | Rants | Search | Security

Spotlight: File Scanners

It's a programming exercise as old as the hills, but the variance in its implementation can be dramatic.

From The Practice of Programming, Kernighan/Pike:

/* strings: extract printable strings from stream */
void strings(char *name, FILE *fin)
{
    int c, i;
    char buf[BUFSIZ];

    do {    /* once for each string */
        for (i = 0; (c = getc(fin)) != EOF; ) {
            if (!isprint(c))
                break;
            buf[i++] = c;
            if (i >= BUFSIZ)
                break;
        }
        if (i >= MINLEN) /* print if long enough */
            printf("%s: %.*s\n", name, i, buf);
    } while (c != EOF);
}

It's as easy as that. Sure. But at any rate, that's the starting point. 'Strings' goes through a number of revisions in the K/P book, and in the world of GUIs several further considerations might be important. Today there are a number of file scanners available, of dramatically varying quality. Here's an in-depth look at some of the best known of them.
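As a rough illustration of one such consideration: a GUI scanner usually wants each string handed back together with its position, not printed straight to stdout. The sketch below is not from the K/P book or from any of the reviewed programs. It assumes the whole file is already in memory, and the names next_string and dump_strings are made up for the example.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Find the next run of at least minlen printable bytes at or after *pos.
   On success, store the run's offset and length, advance *pos past the
   run, and return 1. Return 0 once the buffer is exhausted. */
int next_string(const unsigned char *data, size_t len, size_t minlen,
                size_t *pos, size_t *offset, size_t *length)
{
    size_t i = *pos;

    while (i < len) {
        size_t start;

        while (i < len && !isprint(data[i]))    /* skip binary noise */
            i++;
        start = i;
        while (i < len && isprint(data[i]))     /* measure the run */
            i++;
        if (i > start && i - start >= minlen) {
            *offset = start;
            *length = i - start;
            *pos = i;
            return 1;
        }
    }
    *pos = len;
    return 0;
}

/* Print every string found, with its file offset in hexadecimal. */
void dump_strings(const unsigned char *data, size_t len, size_t minlen)
{
    size_t pos = 0, off, n;

    while (next_string(data, len, minlen, &pos, &off, &n))
        printf("%08lX  %.*s\n", (unsigned long)off, (int)n,
               (const char *)data + off);
}
```

Because there is no fixed buffer, a long run is never split the way the BUFSIZ check splits it in the loop above, and a caller can feed the offset and length into a list view as easily as into printf.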

BinaryTextScan / NinjaSoft

A well known file scanner, but 'oh please': just run this app on itself. Check out the string table(s) - the level of redundancy here is staggering. Basically, an application of this sort should consume no more than 30 - 40KB on disk tops. This application weighs in at 173KB. It's bloat, and where there's bloat there's bugs.

The version information is also rather enlightening:

Comments:     The NetNinja Roam The World!
Copyright:    Copyright (C) 1997 by Enigma <enigma@netninja.com>
Description:  BinaryTextScan MFC Application

BinaryTextScan is built, in other words, on a MS App Wizard application skeleton - by programmers who knew enough to find the version information resource to put their little 'tags' in there, but precious little else.

The fixed file info claims this is version 1.00.0.1, while the string file info claims this is version '1, 2, 0, 1'. 'Nuff said? No.

  • The only configuration option is to set the minimum text length.

  • The '----' (<-- insert your own expletive) that wrote this program included two icons in the image. Now get this: the first has images at both 32x32 and 16x16; the second has an image at 16x16; the two 16x16 images are identical.

  • The classic MFC bloat dialog 30721 ('New') is included.

  • The image contains a special cursor for context sensitive help (presumably for use in figuring out what 'Minimum Text Length' means).

  • The image contains all the classic MFC blooper bitmaps 26567, 30994, 30995, and 30996 for Ctl3d, and its own text scan verifies that it is linking statically with the Ctl3d library, which has been outdated for more than five years. (For a discussion of Ctl3d, see elsewhere at this site.) This by itself represents over 6.5KB of pure junk.

All of the above not only represent a substantial bloat but a good indication that the author(s) of the program don't really have a clue.

Summary? 'Stay away'.

TextScan / AnalogX

Ah - a new contender from the legendary AnalogX! This is sure to be a hit - or is it?

Some of Anal's stuff is good - this particular application was evidently thrown together in such a hurry that nothing is missing but nothing will work right either. (What got Anal into such a state that he blew it with this blooper release?)

It's bigger than BinaryTextScan - and BinaryTextScan is in a sorry state - so the question for the umpteenth time with these Anal EXEs must be:

What's going on under the bonnet?

(As we know, Anal tries to obfuscate by using compression technology [EmuCore] which more often than not doesn't save any disk space anyway - so what is he trying to hide? And why would anyone feel the need to?)

This software spotlight was envisioned as a short piece, but the more TextScan shows what it does (instead of the right thing) and what it ostensibly cannot do... At any rate, here's the worst of it:

  • The application window is a drag-drop client. In other words, you're supposed to be able to drop files on it for scanning (what you would expect to be able to do).

    And the application window has 'announced' to the operating system that it will accept files dropped on it.

    So try it: try dropping something on TextScan and see what happens:

    (Nothing.)

    (Yep, that really hurts.)

  • Strings are limited to 1,024 characters in length. You think you can specify longer strings - but just close the dialog and open it again and see what the 'artist' has done.

    But hold on: why then does the dialog allow entries of up to five digits (99,999 bytes) in length? It's at this point the vertigo starts setting in - have we been hoodwinked all along?

  • There is no such thing as 'Unichar' - it's 'Unicode': Xerox, Motorola, DEC, IBM, Apple, and about a zillion other companies support this standard and it's called 'Unicode'.

  • Saving reports is good - and using CSV is nice - but other formats should be available. As CSV is already implemented, it shouldn't be difficult to add TSV for example.

    But wait - what's this? Is this real CSV we have here or not? If it is, where are the double quotes? What happens if a string contains a comma?

    Dare we ask?

  • There is no way of configuring the application in any non-trivial way, and no explanation of what the nearly non-existent configuration options do either.

  • The program logic, in place here and there throughout the application, that is supposed to transport you to the AnalogX web site does not work on anything but you-can-guess-which platform.

  • The terminology used is ever so pretentious: 'Process File'; 'Status: Idle...' - is this a PC program or an underground bunker at NORAD?

  • The Search dialog cannot be dismissed with the mouse - you can't initiate a search and you can't dismiss the dialog with it either - you have to hit Enter or Esc(ape) on the keyboard to get anywhere.

    (Anal is trying to be cute here, using a dialog without a caption bar and a window menu, but he's only coming up lame.)

  • File offsets are given in decimal.

  • You can't re-scan a file; you have to go through the process of clicking 'Process File' and browsing to it all over again.

  • The columns only sort in one direction and TextScan really takes its time doing it too.

  • TextScan doesn't find the half of it in the files it scans. How this happens is not known. When comparing with the other apps here TextScan turns up only about 1/10 (one tenth) as much information.

  • Strings ending in '.dll' are helpfully marked as being of type 'DLL' (this must have been very difficult to implement).

  • TextScan lists the size of the strings too - like we really need a column wasted on the string length of a string here. The string length is totally irrelevant, so much so that the suspicion must be that the author, in a panic and a rush, felt the application was impoverished without more columns in its display, and didn't have the wherewithal to add anything more useful.

  • Controls don't turn off during a run, leaving a gaping hole open for a really nasty crash.

  • TextScan sits on the tray, as usual its tray context menu is conspicuous in its absence, and the application jumps up already at a single click - which is about as user unfriendly and far away from SAA/CUA/CUI as you can get.

  • Scanning a file of non-trivial size (5MB or more) will race the CPU and take more than ten times as long as is necessary. (CPU races are dangerous too.)

  • At the expected 200KB for an AnalogX app it's bloated of course - it's using a Watcom IDE and static runtime link from 1995 with some obtuse 4GL code generation bulk thrown in for good measure.

    (Anal normally knows how to please, but, sorry to say, is a long way from being a real programmer.)

  • Wrapping a program such as this in an install and uninstall borders on terminal cluelessness.

  • The uninstall program does not delete Registry information.
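On the CSV question raised in the list above: the usual convention (the one written down in RFC 4180) is to wrap a field in double quotes when it contains a comma, a double quote, or a line break, and to double any embedded double quotes. A minimal sketch of that rule follows. Nothing here is taken from TextScan, and the function name csv_field is made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Write field to out as a single CSV field.
   Fields containing a comma, double quote, CR, or LF are wrapped in
   double quotes, and embedded double quotes are doubled.
   Returns 1 on success, 0 if out (outsz bytes) is too small. */
int csv_field(const char *field, char *out, size_t outsz)
{
    int quoted = strpbrk(field, ",\"\r\n") != NULL;
    size_t n = 0;
    const char *p;

    if (quoted) {
        if (n + 1 >= outsz)
            return 0;
        out[n++] = '"';                 /* opening quote */
    }
    for (p = field; *p != '\0'; p++) {
        size_t need = (*p == '"') ? 2 : 1;

        if (n + need >= outsz)          /* keep room for the NUL */
            return 0;
        if (need == 2)
            out[n++] = '"';             /* double an embedded quote */
        out[n++] = *p;
    }
    if (quoted) {
        if (n + 1 >= outsz)
            return 0;
        out[n++] = '"';                 /* closing quote */
    }
    if (n >= outsz)
        return 0;
    out[n] = '\0';
    return 1;
}
```

With this rule a string such as Error %d, line %d comes out as one quoted field, and a plain string such as KERNEL32.dll passes through untouched. A TSV writer would need the same kind of treatment for embedded tabs.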

Summary? 'Stay away'. Wait and see if Anal can get his act together, get the bugs and faulty design stuff and hype out and put some very necessary stuff in (but don't expect the bloat to ever disappear).

This application was not written to satisfy a need - either the author's own or anyone else's. It should not have been released as is.

BinText / Robin Keir Software

A relative newcomer (not as new as TextScan though). At 35KB it's easily the leanest and meanest of the lot, and the most versatile, useful, and reliable too.

In fact, this is a truly amazing app. And the more you use it, the more you wonder if this isn't, in some twisted pre-teen way, the impetus for TextScan, for the latter tries to copy the style of BinText all the way through - and fails miserably at it at every step of the way. But enough about TextScan and Anal (sheesh) - on to BinText, a pure end user pleasure:

  • BinText is fast - it can knock off somewhere between three and four megabytes per second.

  • BinText tells you both the file offset and the memory offset of each string found - in case you didn't know, this is no mean feat. Images have different file and memory alignment values, and juggling the two is a very nice (and not so easy) finesse.

  • Saving reports is 'intuitive', the reports themselves 'WYSIWYG'. When saving from the advanced view, BinText formats everything in columns with headers (and the headers are conveniently repeated down throughout the document - very nice touch).

  • Check out the configuration possibilities - this program lets you do anything imaginable (and unimaginable too for that matter) - which makes you wonder: with all these possibilities and program switches roaming around, how can BinText run so '----' (<-- laudatory expletive here) fast?

  • BinText distinguishes between Unicode and resource strings. The latter are Unicode too, but the author of BinText really knows what he's doing and where he's at in a file that's being scanned - and the resource ID field follows right along. Again, this is an astounding juggling act, and it's this kind of subtlety that really makes the application useful for its intended audience.

  • No install or uninstall necessary - just run it.

  • No sneaky stuff going on either. No Registry or file I/O pranks like with Anal's stuff. 'You're safe.'
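The file offset versus memory offset juggling mentioned in the list above can be sketched roughly as follows. This is not BinText's code. It assumes the section table has already been read from the image, and the struct is a simplified, hypothetical stand-in that keeps only three fields of a section header: where the section is mapped, where its data sits in the file, and how much data is in the file.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for an image section header. */
struct section {
    uint32_t virtual_address;   /* offset from the image base when mapped */
    uint32_t raw_offset;        /* file offset of the section's data */
    uint32_t raw_size;          /* bytes of section data stored in the file */
};

/* Translate a file offset into a memory address.
   Returns 1 and stores the address in *mem_addr on success.
   Returns 0 if the offset lies outside the raw data of every listed
   section (for example in the headers or past the last section). */
int file_to_memory(const struct section *sec, size_t nsec,
                   uint32_t image_base, uint32_t file_off,
                   uint32_t *mem_addr)
{
    size_t i;

    for (i = 0; i < nsec; i++) {
        if (file_off >= sec[i].raw_offset &&
            file_off - sec[i].raw_offset < sec[i].raw_size) {
            /* Same distance into the section, different starting point. */
            *mem_addr = image_base + sec[i].virtual_address
                      + (file_off - sec[i].raw_offset);
            return 1;
        }
    }
    return 0;
}
```

The two offsets differ because sections are typically packed on one alignment in the file and spread out on a larger one in memory, so the distance into the section is the only part that carries over.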

Summary? 'Get it.' It's hard to conceive of a program ever out-performing this one, unless the author himself writes it.

Odds are K/P will be getting it as well.

Copyright © Radsoft. All rights reserved.