--#--
Proxy fight
--#--
If you're behind a firewall and have to go through a proxy server to
reach the site you're trying to scrape, you'll need to tell your user
agent about the proxy.
First, look in your regular Web browser to find your proxy information.
In current versions of Netscape Navigator choose
Edit/Preferences/Advanced/Proxies and click the View button next to
Manual proxy configuration. You should see a list of protocols and
a Web server name or IP address, plus port.
In Internet Explorer choose View/Internet Options/Connection and click
the Advanced button. You'll see a similar list of protocols, each with
an Internet address and port.
Feed that same information to your user agent with the following
statement, being sure to list each protocol that has an entry in your Web
browser. Assume your proxy is used for Gopher, HTTP, Security and FTP, and
that it's located at 127.0.0.1, port 8080. The user agent wants URL scheme
names, so the browser's Security entry is written as https.
$ua->proxy(['gopher','http','https','ftp'],
'http://127.0.0.1:8080/');
Make this entry at any point in the code before you feed the user agent
the request.
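To show where the proxy call sits relative to the request, here is a minimal sketch of a complete script. The target URL, the no_proxy host and the short list of schemes are assumptions made for illustration; the proxy address is the sample one used above. Swap in whatever your browser reports.

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Assumed value: the sample proxy address from the text.
my $proxy_url = 'http://127.0.0.1:8080/';

my $ua = LWP::UserAgent->new;

# Route these URL schemes through the proxy. This has to happen
# before the request is handed to the user agent.
$ua->proxy(['http', 'ftp'], $proxy_url);

# Optional: go direct for hosts that don't need the proxy.
$ua->no_proxy('localhost');

# Hypothetical target; replace it with the page you're scraping.
my $request  = HTTP::Request->new(GET => 'http://www.example.com/');
my $response = $ua->request($request);

if ($response->is_success) {
    print $response->content;
} else {
    print "Request failed: ", $response->status_line, "\n";
}

If your proxy settings already live in environment variables such as http_proxy, calling $ua->env_proxy instead of $ua->proxy should pick them up without hard-coding the address.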
That's really about it, gang. Sure, there are a few cleanups: InterNIC
embeds <a href> tags wherever they make sense. Stripping them out
is a matter of a simple Perl regex: s/<[^>]*>//g.
If you want to know how and why this works, see the section on
regular expression parsing I expect to get written one day.
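Until then, here is a small sketch of the tag-stripping substitution at work. The sample line is made up for illustration and is not real InterNIC output.

use strict;
use warnings;

# Made-up sample line standing in for a line of InterNIC output.
my $line = 'Registrar: <a href="http://www.example.com/">Example Registrar</a>';

# Delete everything from each "<" up to the next ">".
(my $text = $line) =~ s/<[^>]*>//g;

print "$text\n";

The pattern matches a "<", then any run of characters that aren't ">", then the closing ">". That's good enough for simple anchor tags, though it would trip over a ">" inside an attribute value.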
Final code