Screen snarfs
--#--
The Web is a wonderful place. All those treasures of information
hang like ripe apples, just inches from your grasp. You just
need a browser. Or do you?
How hard is it to grab a particular nugget of data off a Web page
without going to the trouble of launching a browser, typing in the
URL, clicking down a layer or two, etc.? Surprisingly, it's really
quite simple.
An example: Network Solutions maintains a whois server for
querying the InterNIC database of domain names and owners.
(http://www.networksolutions.com/cgi-bin/whois/whois)
It's a great tool. In fact, it's a better tool than many people
know. Let's see if we can streamline the way it works, and deliver
an informative and useful report.
First of all, using the InterNIC whois to see if www.coolbabes.com is
taken (it is) is a trivial use of a powerful tool. Why
not use it to list all the domains registered to General Motors?
The list is large, and each domain in it is hyperlinked, so you can
click on any one to get its entire database entry.
Or you can write a screen snarf that asks for a list of all
GM's registered domain names and then uses the list to collect all
GM's InterNIC database records.
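
The two-step idea just described (fetch the list, then follow every link in it) can be sketched in a few lines. This is a minimal illustration in Python, not the implementation this article builds; the names LinkCollector, fetch, and snarf_records are hypothetical, and the sketch assumes every hyperlink on the list page points at a record worth collecting, which a real whois results page would not guarantee.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch(url):
    """Download a page and return its body as text (needs network access)."""
    with urlopen(url) as response:
        # latin-1 is an assumption; it never fails to decode a byte stream.
        return response.read().decode("latin-1")


def snarf_records(list_url, fetch_page=fetch):
    """Fetch the list page, then fetch every page it links to.

    Returns a dict mapping each record URL to the text of that page.
    """
    collector = LinkCollector()
    collector.feed(fetch_page(list_url))
    records = {}
    for href in collector.links:
        # Links on the list page may be relative, so resolve them first.
        record_url = urljoin(list_url, href)
        records[record_url] = fetch_page(record_url)
    return records
```

Passing the page fetcher in as an argument keeps the link-following logic separate from the network, so the same function can be tried out against canned pages before it is pointed at a live server.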
You could kick such a scraper off through a CGI script driven by a Web
form. Or you could run it as a daily cron job. You might even implement
it as a Java application your secretary runs on his Windows desktop.
I'll show you how to do all three.
I'll even show you how to do it from behind a proxy firewall.
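
As a rough preview of the proxy case, here is a minimal Python sketch that routes HTTP requests through a proxy using the standard library's urllib. The function name make_proxy_opener and the host and port in the usage note are placeholders, and the sketch assumes a plain HTTP proxy that requires no authentication.

```python
from urllib.request import ProxyHandler, build_opener


def make_proxy_opener(proxy_host, proxy_port):
    """Build a URL opener that sends http and https requests via a proxy.

    Assumes an unauthenticated HTTP proxy; host and port come from the caller.
    """
    proxy_url = f"http://{proxy_host}:{proxy_port}"
    handler = ProxyHandler({"http": proxy_url, "https": proxy_url})
    return build_opener(handler)
```

An opener built this way, for example make_proxy_opener("proxy.example.test", 8080), is used through its open(url) method in place of a direct urlopen(url) call.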