--#-- Strip tease --#--

When an HTML file is returned by a Web server, whether to your browser or to your robot user agent, a good portion of the text is HTML mark-up code. The browser finds this essential to rendering the page, but a user agent finds it downright annoying.

If you just want to strip all the HTML tags out of the file, you might be interested in the Perl module HTML::Parser.

We want something less generic: We want just the list of General Motors' domains. So, let's just strip those out.

To start, capture the WHOIS output by redirecting your Perl script's output to a file. If you open the file in a browser, you'll see the list you're looking for. GM has so many domains registered that NSI's WHOIS server just quits when it gets to 50.

Open the same file in a text editor and find those same domain listings. Notice that the entire listing is encased in a <pre> </pre> tag pair? We can quickly cut down on the amount of text we have to parse by simply cutting away what's outside those tags.

First, grab the script's output into another Perl variable, rather than printing it to stdout. Then use a Perl regular expression to strip away the unwanted fluff. (Note the 's' modifier at the end of the Perl substitution. Because $list contains multiple lines, each ending in a newline character, you need the modifier so that '.' matches newlines too; otherwise the pattern could never span more than one line.)

$list = $response->content;
$list =~ s#.*<pre>(.*)</pre>.*#$1#sg;

Now, if you add print "$list\n"; to your code you'll see the list of 50 domains. But it's still choked with HTML.
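To see what the 's' modifier buys you, here is a minimal sketch that runs the same substitution with and without it. The HTML snippet and the domain names in it are made up for illustration; the real WHOIS page will look different.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up stand-in for the page the WHOIS server returns (an assumption,
# not the real NSI output).
my $page = <<'HTML';
<html><body>
<h1>Whois results</h1>
<pre>
<a href="/cgi-bin/whois?EXAMPLE-ONE.COM">EXAMPLE-ONE.COM</a>
<a href="/cgi-bin/whois?EXAMPLE-TWO.COM">EXAMPLE-TWO.COM</a>
</pre>
</body></html>
HTML

# Without /s, '.' stops at each newline, so the pattern cannot reach from
# <pre> on one line to </pre> on another; the match fails and nothing changes.
(my $without_s = $page) =~ s#.*<pre>(.*)</pre>.*#$1#;

# With /s, '.' matches newlines too, so one pattern covers the whole page
# and only the text between the tags survives.
(my $with_s = $page) =~ s#.*<pre>(.*)</pre>.*#$1#s;

print "without /s: ", ($without_s eq $page ? "unchanged" : "changed"), "\n";
print "with /s:$with_s";
```

The leading .* is greedy, so if a page held more than one <pre> block this pattern would keep only the last one. That is fine here, where there is a single listing.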

Another Perl regex can easily pull out any string composed of word characters, a dot and three more characters: /\b([\w-]*\.\w{3,3})/. The parentheses let us grab what was matched inside them; Perl puts it in $1.

Since we start with a multi-lined string, let's break it into individual lines and process each line:

@lines = split(/\n/, $list);
foreach (@lines) {
    if ( /\s*([\w-]*\.\w{3,3})/ ) {
        print "$1\n";
    }
}

So now we have 26 lines of Perl code that automagically queries NSI's WHOIS server for the domain names that GM holds, or at least the first 50 of them.
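If you'd like to try the line-by-line extraction without hitting the WHOIS server, the following sketch runs the same pattern over a made-up $list. The sample lines and domain names are assumptions, and the matches are collected in an array (a hypothetical @domains) rather than printed straight away, so they are on hand for later steps.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up stand-in for $list after the <pre> contents have been cut out.
my $list = <<'TEXT';
<a href="/cgi-bin/whois?EXAMPLE-ONE.COM">EXAMPLE-ONE.COM</a>
  <a href="/cgi-bin/whois?EXAMPLE-TWO.NET">EXAMPLE-TWO.NET</a>
<b>no domain on this line</b>
TEXT

my @domains;    # hypothetical name: holds one captured domain per matching line
foreach my $line (split /\n/, $list) {
    # Same pattern as in the loop above: word characters or hyphens,
    # a dot, then exactly three word characters. $1 is the captured text.
    if ($line =~ /\s*([\w-]*\.\w{3,3})/) {
        push @domains, $1;
    }
}

print "$_\n" for @domains;
```

The pattern takes the first thing on each line that looks like name-dot-three-characters, so a line whose mark-up contains something like a file name with a three-letter extension ahead of the domain would yield that instead. Check the captured list against what the browser shows.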

But I promised more than that, didn't I? I promised you'd get the InterNIC records on all the domains. And I'll deliver just that.