I’m currently working on a Perl script to gather data from the QuakeLive website.

Asked: May 17, 2026

I’m currently working on a Perl script to gather data from the QuakeLive website.
Everything was going fine until I couldn’t get a set of data.

I was using regexes for that and they work for everything apart from the favourite arena, weapon and game type. I just need to get the names of those three elements in a $1 for further processing.

I tried matching up to the favorites image, but without success. If it’s any use, I’m already using WWW::Mechanize in the script.

I think the problem could be related to the class name on the paragraphs that contain those elements, whereas the paragraphs I matched earlier had no class.

You can find an example profile HERE.

Note that for the previous part of the page, it worked using code like:

$content =~ /<b>Wins:<\/b> (.*?)<br \/>/;
$wins = $1;
print "Wins: $wins\n";

1 Answer

Editorial Team · Answer 1 · 2026-05-17T02:49:21+00:00

The immediate problem is that you have:

<p class="prf_faves">
<img src="http://cdn.quakelive.com/web/2010092807/images/profile/none_v2010092807.0.gif" 
     width="17" height="17" alt="" class="fl fivepxhr" />
                <b>Arena:</b> Campgrounds
                <div class="cl"></div>
            </p>

That is, there is no <br /> following the value for favorites such as Arena. Now, the correct way to do this would involve using a proper HTML parser. The fragile solution is to adapt your pattern (untested):

my ($favarena) = $content =~ m{<b>Arena:</b> ([^<]+)};

That should put everything up to the < of the next <div> in $favarena. Now, if all arenas are single words with no spaces in them,

my ($favarena) = $content =~ m{<b>Arena:</b> (\S+)};

would save you the trouble of having to trim whitespace afterwards.
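The same pattern extends to the other two favourites. The following is only a minimal sketch: it assumes the labels are exactly `Arena:`, `Game Type:` and `Weapon:` (taken from the sample output later in this answer) and that each value runs up to the next tag. The helper name `favourites_from` and the abbreviated sample markup are made up for illustration.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Hypothetical helper: returns a hash of label => value for the three favourites.
sub favourites_from {
    my ($content) = @_;
    my %fav;
    for my $label ('Arena', 'Game Type', 'Weapon') {
        # Capture everything after the label up to the next tag
        next unless $content =~ m{<b>\Q$label\E:</b>\s*([^<]+)};
        my $value = $1;
        $value =~ s/\s+\z//;    # drop the trailing newline and indentation
        $fav{$label} = $value;
    }
    return %fav;
}

# Abbreviated sample markup (assumption); the real page has more attributes.
my $sample = <<'HTML';
<p class="prf_faves">
    <b>Arena:</b> Campgrounds
    <div class="cl"></div>
</p>
<p class="prf_faves">
    <b>Game Type:</b> Clan Arena
    <div class="cl"></div>
</p>
<p class="prf_faves">
    <b>Weapon:</b> Rocket Launcher
    <div class="cl"></div>
</p>
HTML

my %fav = favourites_from($sample);
print "$_: $fav{$_}\n" for sort keys %fav;
```

Using `[^<]+` rather than `\S+` keeps multi-word values such as "Clan Arena" intact, at the cost of one explicit trim.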

Note that such regex-based solutions are easily fooled by simple things like commented-out snippets in the source. E.g., if the source were to be changed to:

<p class="prf_faves">
<img src="http://cdn.quakelive.com/web/2010092807/images/profile/none_v2010092807.0.gif" 
     width="17" height="17" alt="" class="fl fivepxhr" />
<!-- <b>Arena:</b> here -->
                <b>Arena:</b> Campgrounds
                <div class="cl"></div>
            </p>

your script would be in trouble, whereas a solution using an HTML parser would not.
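To make the failure mode concrete, here is a small sketch. The markup is hypothetical (the commented-out value `Old Value` is invented for the demonstration). It shows the pattern picking up the commented-out label first, and that stripping comments beforehand only patches this one trap.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Hypothetical markup: a commented-out label precedes the real one.
my $html = <<'HTML';
<p class="prf_faves">
<!-- <b>Arena:</b> Old Value -->
    <b>Arena:</b> Campgrounds
    <div class="cl"></div>
</p>
HTML

my $pattern = qr{<b>Arena:</b> ([^<]+)};

# The naive match stops at the first occurrence, which is inside the comment.
my ($naive) = $html =~ $pattern;
$naive =~ s/\s+\z//;

# Stripping comments first avoids this particular trap (but not others).
(my $stripped = $html) =~ s/<!--.*?-->//gs;
my ($arena) = $stripped =~ $pattern;
$arena =~ s/\s+\z//;

print "naive:    $naive\n";
print "stripped: $arena\n";
```

Each workaround of this kind covers one more case by hand, which is the argument for letting a parser deal with the markup.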

An example using HTML::TokeParser::Simple:

#!/usr/bin/perl

use strict; use warnings;

use HTML::TokeParser::Simple;

my $p = HTML::TokeParser::Simple->new( 'martianbuddy.html' );

while ( my $tag = $p->get_tag('p') ) {
    next unless $tag->is_start_tag;
    next unless defined (my $class = $tag->get_attr('class'));
    next unless grep { /^prf_faves\z/ } split ' ', $class;

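    $p->get_tag('b');                  # advance to the <b> that holds the label
    my $type = $p->get_text('/b');     # label text, e.g. "Arena:"
    my $value = $p->get_text('/p');    # everything up to the closing </p>
    $value =~ s/\s+\z//;               # trim trailing whitespace only

    print "$type $value\n";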
}

Output:

Arena:  Campgrounds
Game Type:  Clan Arena
Weapon:  Rocket Launcher
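Since the script already fetches pages with WWW::Mechanize, the HTML is probably in a string rather than a file. The sketch below assumes that and passes a scalar reference to the constructor. The `$content` heredoc is a stand-in for whatever `$mech->content` returns, and the `Wins` paragraph and its value are invented to show that non-matching paragraphs are skipped.

```perl
#!/usr/bin/perl
use strict; use warnings;

use HTML::TokeParser::Simple;

# Stand-in for the page source, e.g. what $mech->content would return.
my $content = <<'HTML';
<p class="prf_faves">
    <b>Arena:</b> Campgrounds
    <div class="cl"></div>
</p>
<p class="other"><b>Wins:</b> 10</p>
HTML

# A scalar reference makes the parser read from the string, not a file.
my $p = HTML::TokeParser::Simple->new( \$content );

my %fav;
while ( my $tag = $p->get_tag('p') ) {
    my $class = $tag->get_attr('class');
    next unless defined $class and grep { $_ eq 'prf_faves' } split ' ', $class;

    $p->get_tag('b');                          # move to the label element
    my $type  = $p->get_trimmed_text('/b');    # e.g. "Arena:"
    my $value = $p->get_trimmed_text('/p');    # whitespace-normalised value
    $type =~ s/:\z//;                          # drop the trailing colon
    $fav{$type} = $value;
}

print "$_: $fav{$_}\n" for sort keys %fav;
```

`get_trimmed_text` strips leading and trailing whitespace and collapses internal runs, so no separate trim is needed here.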

And, here is an example using HTML::TreeBuilder:

#!/usr/bin/perl

use strict; use warnings;

use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;
$tree->parse_file('martianbuddy.html');

my @p = $tree->look_down(_tag => 'p', sub {
        return unless defined (my $class = $_[0]->attr('class'));
        return unless grep { /^prf_faves\z/ } split ' ', $class;
        return 1;
    }
);

for my $p ( @p ) {
    my $text = $p->as_text;
    $text =~ s/^\s+//;
    my ($type, $value) = split ': ', $text;
    print "$type: $value\n";
}

Output:

Arena: Campgrounds 
Game Type: Clan Arena 
Weapon: Rocket Launcher

Given that the document is an HTML fragment rather than a full document, you will have more success with modules based on HTML::Parser rather than those that expect to operate on well-formed XML documents.
