
Chris' IP Probe: Inside CIP

How it works.

The key to CIP is multithreading and thread synchronisation - something not readily available in higher level programming environments. Thanks to David Cutler there's a rich selection of synchronisation objects available in 'NTx' to fit almost every need.

A synchronisation object is an abstract system object that can be used to put threads to sleep and queue them for a wake-up call later. The system is constructed so that 'sleeping' threads don't consume any CPU cycles at all - they're simply taken out of the execution queue.
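As a rough, portable illustration (Python's `threading` module, not Win32), the sketch below parks a thread on a `threading.Event` and measures that thread's own CPU time with `time.thread_time()`. The function name and the 0.2 second delay are arbitrary choices for the example. The point is that the waiting thread's CPU time stays near zero however long it sleeps.

```python
import threading
import time

def measure_sleeping_waiter(delay_s: float = 0.2) -> tuple[float, float]:
    """Block a thread on an event for about delay_s seconds and report
    (wall-clock seconds waited, CPU seconds used by the waiting thread)."""
    wake_up = threading.Event()          # starts non-signaled
    result = {}

    def sleeper() -> None:
        cpu_before = time.thread_time()  # CPU time of this thread only
        wall_before = time.monotonic()
        wake_up.wait()                   # the OS parks the thread until set()
        result["wall"] = time.monotonic() - wall_before
        result["cpu"] = time.thread_time() - cpu_before

    t = threading.Thread(target=sleeper)
    t.start()
    time.sleep(delay_s)                  # leave the sleeper blocked for a while
    wake_up.set()                        # the 'wake-up call': signal the event
    t.join()
    return result["wall"], result["cpu"]

wall, cpu = measure_sleeping_waiter()
print(f"waited {wall:.3f} s of wall time, used {cpu:.4f} s of CPU")
```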

Preemptive multitasking operating systems such as the Microsoft NT family control which programs run. Older systems such as Windows 3.1 and Windows for Workgroups (and Apple's classic 'MacOS') were not such systems. They used what is commonly called 'cooperative multitasking', which is a way of saying they had no real multitasking at all: the multitasking was, to put it bluntly, an illusion. Such systems relied on the goodwill, the cooperation, and the good coding of all running programs, each of which was required to return control to the system at regular intervals. (If a program didn't return control, the entire system would hang and you had to perform the three finger salute.)

Preemptive multitasking systems themselves determine which programs run and how to run them. These systems set up a 'queue' of threads found in all the loaded (launched) programs. A thread is part of an ordinary program - or more properly part of an ordinary process, where a 'process' is a program (as in program file on disk) that's been loaded into memory and prepared for execution.

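Multitasking systems can run multithreaded applications that do 'several things at once'. Of course no single CPU can do two things at once, but the operating system gives each thread in turn a short period of execution - a time slice, or 'quantum' - before switching to the next. On the NT family the quantum is measured in clock intervals rather than fixed milliseconds: by default two clock intervals on workstation editions (longer on server editions), where one clock interval is typically around 10 to 15 milliseconds depending on the hardware.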

The system keeps track of which threads have run, generally puts a thread at the back of the queue once it has had a chance to do some work, and saves its state (the so-called context) so that it can be restored the next time the thread runs. A context comprises the CPU register values, including the stack pointer, the instruction pointer, and the flags. The thread's stack and the variables on it simply stay where they are in memory.
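The bookkeeping described above can be modelled with a toy round-robin scheduler. This is a simplified sketch, not NT's scheduler: the real one keeps a ready queue per priority level and always runs the highest-priority ready thread, while this model ignores priorities. Its 'context' is a single hypothetical progress counter standing in for the saved registers.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Context:
    """Toy saved context: only an 'instruction pointer' recording how far
    the thread has got. A real context holds all the CPU registers."""
    ip: int = 0

@dataclass
class ToyThread:
    name: str
    work: int                                  # total units of work needed
    ctx: Context = field(default_factory=Context)

def round_robin(threads: list[ToyThread], slice_units: int) -> list[str]:
    """Run threads round-robin, slice_units of work per time slice.
    Returns the order in which the threads got the CPU."""
    ready = deque(threads)
    timeline: list[str] = []
    while ready:
        t = ready.popleft()                    # next thread in the queue
        ip = t.ctx.ip                          # 'restore' its saved context
        ip += min(slice_units, t.work - ip)    # do some work in this slice
        t.ctx.ip = ip                          # 'save' the context on switch-out
        timeline.append(t.name)
        if ip < t.work:
            ready.append(t)                    # unfinished: back of the queue
    return timeline

order = round_robin([ToyThread("A", 5), ToyThread("B", 2), ToyThread("C", 4)], slice_units=2)
print(order)  # ['A', 'B', 'C', 'A', 'C', 'A']
```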

[A context switch - saving the context of the thread being switched out and restoring the context of the next thread to run - is by its very nature time-consuming, so good programmers try to reduce the number of context switches their code causes. Cutler's NT team did an excellent job of this with CSRSS (the Client/Server Runtime Subsystem): they batched GDI (graphics) calls to cut down the round trips, and in NT 4.0 ultimately moved the window manager and GDI out of CSRSS and into the kernel.]

[The NT scheduler has a second component (the balance set manager) that periodically looks for threads that for any reason haven't got the time slices they 'deserve' and boosts their priority so they get a chance too. This is necessary to prevent starvation and other such niceties from occurring.]
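A minimal sketch of such an anti-starvation pass, under stated assumptions: the threshold (4 seconds) and the boost level (15) are illustrative values chosen for the example, and `ReadyThread` and `boost_starved` are hypothetical names. A real scheduler would also let the boosted priority decay back to its base value once the thread has run.

```python
from dataclasses import dataclass

BOOST_PRIORITY = 15          # assumption: level a starved thread is raised to
STARVATION_THRESHOLD = 4.0   # assumption: seconds a thread may sit ready unserved

@dataclass
class ReadyThread:
    name: str
    priority: int
    ready_since: float       # time (seconds) at which the thread became ready

def boost_starved(ready: list[ReadyThread], now: float) -> list[str]:
    """One pass of a hypothetical anti-starvation scan: every thread that has
    been ready, but not run, for at least the threshold gets a priority boost.
    Returns the names of the boosted threads."""
    boosted = []
    for t in ready:
        waited = now - t.ready_since
        if waited >= STARVATION_THRESHOLD and t.priority < BOOST_PRIORITY:
            t.priority = BOOST_PRIORITY
            boosted.append(t.name)
    return boosted

threads = [ReadyThread("busy", 12, ready_since=9.5), ReadyThread("starved", 4, ready_since=3.0)]
print(boost_starved(threads, now=10.0))  # ['starved']
```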

Threads can also voluntarily 'succumb' to synchronisation. The wonder of Cutler's NT is that almost all 'objects' in the system exhibit synchronisation characteristics. The key API is WaitForSingleObject(), which takes the handle (an opaque reference) of the object to wait on and a timeout in milliseconds (INFINITE to wait indefinitely).

As soon as a thread calls WaitForSingleObject() on an object that isn't ready, the system puts the thread to 'sleep' - the thread is simply removed from the scheduler queue. It is woken when the object it is waiting for becomes ready ('signaled') or when the timeout expires.
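The wait-with-timeout behaviour can be sketched in Python. `wait_for_single_object` is a hypothetical helper that borrows the Win32 return-code names for readability. It wraps `threading.Event.wait()`, which returns `True` if the event was signaled and `False` if the timeout expired; unlike the real API it works only on event objects.

```python
import threading
from typing import Optional

# Hypothetical names borrowed from the Win32 return codes, for readability only.
WAIT_OBJECT_0 = "WAIT_OBJECT_0"
WAIT_TIMEOUT = "WAIT_TIMEOUT"

def wait_for_single_object(obj: threading.Event, timeout_ms: Optional[int]) -> str:
    """Block until obj is signaled or timeout_ms milliseconds pass
    (None means wait forever, like INFINITE)."""
    timeout_s = None if timeout_ms is None else timeout_ms / 1000
    return WAIT_OBJECT_0 if obj.wait(timeout_s) else WAIT_TIMEOUT

ready = threading.Event()
print(wait_for_single_object(ready, 50))    # WAIT_TIMEOUT: nobody signaled it
threading.Timer(0.05, ready.set).start()    # another thread signals in 50 ms
print(wait_for_single_object(ready, None))  # WAIT_OBJECT_0
```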

There are any number of dedicated synchronisation objects threads can wait for - auto-reset events, change notifications, critical sections, events, manual-reset events, mutexes, and semaphores, to name a few. The semaphore was new to NT (it was not among the objects of Cutler's earlier VMS), but not because it was a recent invention: Edsger Dijkstra described the semaphore in the mid-1960s, more than a decade before VMS appeared.
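The difference between the two kinds of event is easy to demonstrate. In Win32 the `bManualReset` argument of `CreateEvent()` picks the kind. Python's `threading.Event` behaves like a manual-reset event, and Python has no built-in auto-reset event, so `AutoResetEvent` below is a hypothetical helper built on `threading.Condition`. One signal releases every waiter of a manual-reset event but only one waiter of an auto-reset event.

```python
import threading
import time

class AutoResetEvent:
    """Hypothetical helper: set() lets exactly one waiter through, after
    which the event drops back to non-signaled by itself."""
    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._signaled = False

    def set(self) -> None:
        with self._cond:
            self._signaled = True
            self._cond.notify()          # wake one waiter

    def wait(self) -> None:
        with self._cond:
            while not self._signaled:
                self._cond.wait()
            self._signaled = False       # consume the signal: auto-reset

def count_released(event, waiters: int = 3, settle_s: float = 0.2) -> int:
    """Block `waiters` threads on `event`, signal it once, and count
    how many get through."""
    released = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal released
        event.wait()
        with lock:
            released += 1

    for _ in range(waiters):
        threading.Thread(target=worker, daemon=True).start()
    time.sleep(settle_s)                 # let every worker block
    event.set()                          # a single signal
    time.sleep(settle_s)                 # let released workers finish
    with lock:
        return released

print(count_released(threading.Event()))  # manual-reset: 3
print(count_released(AutoResetEvent()))   # auto-reset: 1
```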

And then there's the fact that in an 'object based' (not object oriented) system such as VMS or WNT almost everything is an object - and can therefore in some context or other be used to 'synchronise' things.

File handles - block while an asynchronous (overlapped) read or write on the file is in progress, release waiting threads when it completes.
Thread handles - block while the threads are running, release waiting threads when they exit.
Process handles - block while the processes are running, release waiting threads when they exit.
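Waiting on thread and process objects has direct counterparts outside Win32. In this Python sketch `Thread.join()` plays the role of waiting on a thread handle and `subprocess.Popen.wait()` the role of waiting on a process handle. The child process is just another Python interpreter that exits with code 3, an arbitrary value for the example.

```python
import subprocess
import sys
import threading
import time

def wait_on_thread_and_process() -> tuple[bool, int]:
    """Wait for a thread to exit, then for a child process to exit.
    Returns (thread finished?, child's exit code)."""
    worker = threading.Thread(target=time.sleep, args=(0.1,))
    worker.start()
    worker.join(timeout=5)               # like waiting on a thread handle
    thread_done = not worker.is_alive()

    child = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(3)"])
    exit_code = child.wait(timeout=30)   # like waiting on a process handle
    return thread_done, exit_code

print(wait_on_thread_and_process())      # (True, 3)
```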

[And of course this is but an extremely cursory tour of the rich world of Cutler's synchronisation in WNT.]

Having said (and hopefully understood) all that, it's time to look at what CIP does and how it does it. When the time comes to validate the URLs in HOSTS, CIP must send a DNS query for each and every listed URL, and these most often run into the thousands (or tens of thousands).

In its startup code CIP creates a manual-reset event and a semaphore. The manual-reset event is used to 'push' DNS queries; the semaphore limits the number that can go out at any one time. CIP also creates a second manual-reset event (which GD may already have created) through which CIP and another application, GD, communicate.

Theoretically one could unleash all 20,000 or 30,000 DNS query threads at once, but things generally wouldn't work out too well: the system would become bogged down with all the 'bookkeeping' and - yes, as it's Windows - it might even crash. CIP uses a relatively low single-digit or double-digit figure here, chosen after lengthy testing for performance optimisation.

That other application - GD - is used separately to monitor all network connections. GD can therefore see which 'secondary' URLs are being accessed because they are referenced in the web pages being loaded. CIP checks for GD automatically and, if GD is running, sends it a message to look at the connections again.

GD in turn looks for CIP as well and, if it finds CIP, sends the results of each poll 'over the fence' into 'CIP territory', where CIP can pick them up.

When GD gets the message from CIP, it creates a secondary thread to get the data. Once the secondary thread has the data, it 'pulses' the event shared with CIP. CIP then sees there is data ready to be added, gets the data, and adds it to its own listing. (This is why it's recommended to do this with a file other than the default HOSTS file and only export to HOSTS later: these lists can get very big.)
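The hand-off can be sketched within a single Python process, with threads standing in for the two programs. Everything here is hypothetical: a list stands in for the shared memory area and a `threading.Event` for the shared event. The original 'pulses' its event; Python has no pulse operation (and Microsoft's documentation for `PulseEvent()` itself warns that the function is unreliable), so in this sketch GD's thread sets the event and CIP's side clears it again after reading.

```python
import threading

def cip_collect_from_gd(polled_hosts: list[str], cip_list: set[str]) -> set[str]:
    """A secondary 'GD' thread stores its poll results in a shared buffer and
    signals an event; the 'CIP' side waits on the event, then adds the data
    to its own list."""
    shared_buffer: list[str] = []         # stand-in for the shared memory area
    data_ready = threading.Event()        # stand-in for the event shared with GD

    def gd_secondary_thread() -> None:
        shared_buffer.extend(polled_hosts)  # 'GD' writes its results...
        data_ready.set()                    # ...then signals CIP

    threading.Thread(target=gd_secondary_thread).start()
    if data_ready.wait(timeout=5):        # CIP sleeps until data is there
        cip_list.update(shared_buffer)    # add to CIP's own listing
        data_ready.clear()                # re-arm the manual-reset event
    return cip_list

print(sorted(cip_collect_from_gd(["ads.example.net", "cdn.example.com"], {"example.org"})))
# ['ads.example.net', 'cdn.example.com', 'example.org']
```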

The actual data transfer is done through a read/write 'file mapping' area backed by the system paging file (the 'swap'). CIP and GD both know of this file mapping: both attempt to create it, and it doesn't matter which gets there first. Thereafter it is shared to pass data from GD to CIP.
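In Win32 terms this is a named mapping created with `CreateFileMapping()` and `INVALID_HANDLE_VALUE` as the file handle, so it is backed by the paging file. When the object already exists, the call returns a handle to it and `GetLastError()` reports `ERROR_ALREADY_EXISTS`. Python's `multiprocessing.shared_memory.SharedMemory` (Python 3.8 and later) raises `FileExistsError` instead, so the sketch below falls back to opening the existing block. The mapping name and the 64-byte size are hypothetical, and both 'sides' run in one process only to keep the example self-contained.

```python
from multiprocessing import shared_memory

def create_or_open(name: str, size: int) -> shared_memory.SharedMemory:
    """Try to create the named block; if the other side got there first,
    open the existing one. Either way both ends share the same memory."""
    try:
        return shared_memory.SharedMemory(name=name, create=True, size=size)
    except FileExistsError:
        return shared_memory.SharedMemory(name=name, create=False)

NAME = "cip_gd_demo_mapping"         # hypothetical name agreed on by both sides
gd_side = create_or_open(NAME, 64)   # first caller creates the block
cip_side = create_or_open(NAME, 64)  # second caller opens the same block
gd_side.buf[:5] = b"hosts"           # 'GD' writes...
print(bytes(cip_side.buf[:5]))       # ...'CIP' reads: b'hosts'
cip_side.close()
gd_side.close()
gd_side.unlink()                     # remove the block once both are done
```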

The 'coxswain' of it all is in CIP. It initialises the toolbar for the run (turns the stoplight from green to red), increments the count of live threads (needed to know when it's safe to exit the application), then sits and waits on the semaphore.

When the run begins, a limited (single-digit or double-digit) number of threads get through with their DNS queries. Each time a thread passes through the semaphore, the semaphore's 'counter' is decremented, and threads wanting to pass through have to wait until the counter is greater than zero again. As each thread receives an answer from the DNS, it hands CIP's list the answer to the query, releases the semaphore (incrementing its counter) and exits, so another DNS query thread can get through. The threads are created only when needed.
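A Python sketch of the coordinator and its query threads, under stated assumptions: `threading.BoundedSemaphore` plays the Win32 semaphore, `fake_resolve` is a hypothetical stand-in for a real DNS lookup, and the limit of 8 is an arbitrary example rather than CIP's tuned figure. The query thread releases the semaphore explicitly before exiting. A Win32 semaphore likewise has no owner, so a thread must call `ReleaseSemaphore()` itself; the count does not go back up just because the thread ends.

```python
import threading
import time
from typing import Callable

def probe_all(hosts: list[str], resolve: Callable[[str], str],
              limit: int) -> tuple[dict[str, str], int]:
    """Wait on a counting semaphore before creating each query thread, so at
    most `limit` queries are in flight. Each thread records its answer and
    releases the semaphore as it finishes. Returns (answers, peak threads)."""
    gate = threading.BoundedSemaphore(limit)  # counter starts at `limit`
    answers: dict[str, str] = {}
    lock = threading.Lock()
    live = 0                                  # live query threads (safe-exit check)
    peak = 0

    def query(host: str) -> None:
        nonlocal live
        try:
            answer = resolve(host)
            with lock:
                answers[host] = answer        # hand the answer to the list
        finally:
            with lock:
                live -= 1
            gate.release()                    # counter +1: next thread may start

    threads = []
    for host in hosts:
        gate.acquire()                        # counter -1; sleeps while it is 0
        with lock:
            live += 1
            peak = max(peak, live)
        t = threading.Thread(target=query, args=(host,))  # created only when needed
        t.start()
        threads.append(t)
    for t in threads:
        t.join()                              # wait until the last thread exits
    return answers, peak

def fake_resolve(host: str) -> str:
    """Hypothetical stand-in for a DNS lookup (no network access)."""
    time.sleep(0.01)
    return "0.0.0.0"

answers, peak = probe_all([f"host{i}.example" for i in range(50)], fake_resolve, limit=8)
print(len(answers), peak <= 8)  # 50 True
```

To try real lookups, `socket.gethostbyname` can be passed as `resolve`; an unresolvable name then raises `socket.gaierror` inside its thread, so that host would simply be missing from the answers.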

It's a bit like leapfrogging. DNS query threads are continually created and continually return with the responses from the Domain Name System, thereby releasing further DNS query threads, and so forth until CIP has gone through the entire list. Despite the inordinate complexity (the source code was used in a course on system engineering), it's blazingly fast and does its job as efficiently as your web surfing becomes.