Snarfing up the pieces

 

Since we're going to be scraping up a whole bunch of HTML, but really only want specific nuggets of information embedded in it, we need a tool that is adept at parsing and winnowing strings of data. Perl is the tool of choice. (We'll see later how Java can be made to work nearly as well.)

The fortunate thing about Perl is that it has so many good people writing useful tools and sharing those tools with, well, with everybody. Free.

An essential tool for Perl screen-scraping is LWP (libwww-perl) by Gisle Aas and Martijn Koster. You can find it (and dozens of other free Perl modules) at http://www.perl.com/CPAN/modules/by-module/LWP.

Get LWP installed (and Perl, too, if you haven't already) and you're ready to start.
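
To check that LWP is working, a first fetch might look something like the sketch below. It is only an illustration, not the method used in the rest of this series: the agent string "MySnarfer/0.1", the helper names fetch_page and extract_title, and the choice of the page title as the "nugget" are all assumptions made up for the example. The URL comes from the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Fetch a page and return its HTML as a string; die on any failure.
# (fetch_page is a hypothetical helper name, not part of LWP.)
sub fetch_page {
    my ($url) = @_;

    my $ua = LWP::UserAgent->new;
    $ua->agent('MySnarfer/0.1');    # assumed agent name; pick your own
    $ua->timeout(30);               # give up after 30 seconds

    my $response = $ua->get($url);
    die "Fetch failed: ", $response->status_line, "\n"
        unless $response->is_success;

    my $html = $response->decoded_content;
    die "Could not decode the response body\n" unless defined $html;
    return $html;
}

# Winnow one nugget out of the HTML: the text of the <title> element.
# Returns undef when the page has no title.
# (extract_title is likewise a hypothetical helper name.)
sub extract_title {
    my ($html) = @_;
    return $html =~ m{<title[^>]*>\s*(.*?)\s*</title>}is ? $1 : undef;
}

# Only touch the network when a URL is given on the command line.
my $url = shift @ARGV;
if (defined $url) {
    my $title = extract_title(fetch_page($url));
    print defined $title ? "Title: $title\n" : "No <title> found\n";
}
```

Saved as, say, snarf.pl, it would be run as `perl snarf.pl http://www.example.com/`. Keeping the fetch and the winnowing in separate subroutines means the regex can be tried against a saved copy of a page without hitting the site again.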


Copyright © 1999, 2001, 2002 by John H. Byrd
All rights reserved.