Sunday, May 07, 2006

Perl code: grab HTML and images from the web

I used the following code to fetch an HTML page:
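#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Client;

my $client = HTTP::Client->new();
# Fetch the page; the body comes back as a string.
my $site = $client->get("http://www.csc.liv.ac.uk/");
# Response headers and user-agent string (fetched here but not used further).
my @headers = $client->response_headers;
my $agent = $client->agent;
print $site;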

Grab images:
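#!/usr/bin/perl
# cnhacktnt {a t} perlchina.org
# http://perlchina.org or http://wanghui.org

use strict;
use warnings;
use LWP::Simple;
use URI;

my $url = 'http://www.csc.liv.ac.uk/';
my $content = get $url;
my %imgs;

if ($content) {
    # Collect every src="..." value. Note that this also matches
    # non-image tags that carry a src attribute (scripts, frames).
    while ($content =~ m/src="(.+?)"/gi) {
        # Resolve relative links against the page URL.
        my $imgurl = URI->new_abs($1, $url);

        # Use the last path segment as the local file name.
        my ($filename) = $imgurl =~ m{/([^/?#]+)(?:[?#].*)?$};
        next unless defined $filename;

        $imgs{$filename} = "$imgurl";
    }
    for (sort keys %imgs) {
        print "Getting $imgs{$_}, save as $_\n";
        getstore($imgs{$_}, $_);
    }
}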
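
The regex above picks up any double-quoted src attribute, whatever tag it belongs to. As a rough alternative sketch (not part of the original script), an HTML parser can restrict the search to img tags and also cope with single-quoted or upper-case attributes. This assumes HTML::TokeParser and URI are installed (both normally come along with LWP); the helper name image_urls and the example.com URL are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;
use URI;

# Hypothetical helper: return the absolute URLs of all <img> tags in $html,
# resolving relative links against $base.
sub image_urls {
    my ($html, $base) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "cannot create parser";
    my @urls;
    # get_tag returns [tag name, attribute hash, ...] for each start tag.
    while (my $tag = $parser->get_tag('img')) {
        my $src = $tag->[1]{src};
        next unless defined $src && length $src;
        push @urls, URI->new_abs($src, $base)->as_string;
    }
    return @urls;
}

# Sample input: only the two img tags should be reported, not the script.
my $html = q{<img src="a.gif"><script src="x.js"></script><IMG SRC='/img/b.png'>};
print "$_\n" for image_urls($html, 'http://www.example.com/dir/page.html');
```

The returned list could then be fed to getstore in the same way as the %imgs hash above.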

Another approach for grabbing images.

I also found another method on CPAN, so I am posting it here as well.
#!/usr/bin/perl
use strict;
use warnings;
use Image::Grab;

my $pic = Image::Grab->new;
$pic->url('http://album.9you.com/pic/comicphoto/98/uqh1116383139.jpg');
$pic->grab;

# Write the image data to disk in binary mode.
open(my $image, '>', 'image.jpg') || die "image.jpg: $!";
binmode $image; # for MSDOS derivations.
print $image $pic->image;
close $image;
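
None of the scripts here check whether a download actually succeeded. A minimal sketch of one way to do that with LWP::Simple follows: getstore returns the HTTP status code, and is_success (exported by LWP::Simple by default) tests it. The helper name fetch_to_file is hypothetical, and the URL and file name are taken from the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;

# Hypothetical helper: download $url to $file, warn on failure,
# and return true only if the request succeeded.
sub fetch_to_file {
    my ($url, $file) = @_;
    my $status = getstore($url, $file);   # HTTP status code, e.g. 200 or 404
    warn "Failed to fetch $url: HTTP status $status\n"
        unless is_success($status);
    return is_success($status);
}

# Usage: perl fetch.pl <url> <output file>
fetch_to_file(@ARGV) if @ARGV == 2;
```

For example, running it as "perl fetch.pl http://www.csc.liv.ac.uk/ index.html" would save the page to index.html, or print a warning with the status code if the request failed.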
