This is the output of my program, which reads this page (http://www.rottentomatoes.com/movie/box_office.php). As you can see, some of the movies on the page are missing; for instance, number 18 (One for the Money) isn't there. Can anyone check my regex and help me figure out why it isn't grabbing all of the movies, or whether there is something else wrong in my code that I can't find?
I should add that I am using the lynx command to grab the data. Yes, I have to use it =(. I updated the code to show how I am getting the info from the webpage.
Also, I only want to print 35 characters of the movie name, so if it's longer than that I just want to truncate everything after.
OUTPUT:
## ## Movie Title Weekend Cume T-Meter
1 2 Safe House $78.2M $7.7k 52%
2 1 The Vow $85.5M $8.0k 30%
3 -- Ghost Rider: Spirit of Vengeance $22.0M $6.9k 15%
4 3 Journey 2: The Mysterious Island $53.2M $5.7k 43%
5 -- This Means War $19.2M $5.5k 25%
6 4 Star Wars: Episode I - The Phantom Menace (in 3D) $33.7M $3.0k 57%
7 5 Chronicle $51.0M $2.9k 84%
8 6 The Woman in Black $45.3M $2.6k 63%
9 -- The Secret World of Arrietty $6.4M $4.2k 93%
10 7 The Grey $47.9M $1.4k 78%
11 9 The Descendants $75.0M $2.4k 89%
12 13 The Artist $27.4M $2.9k 97%
13 8 Big Miracle $16.6M $1.3k 73%
14 14 Hugo $66.7M $2.9k 93%
15 11 Red Tails $47.5M $1.4k 36%
16 10 Underworld Awakening $61.3M $1.3k 28%
17 18 The Iron Lady $24.4M $1.7k 53%
19 15 Extremely Loud & Incredibly Close $30.6M $1.1k 45%
20 17 Contraband $65.7M $1.2k 49%
21 23 Alvin and the Chipmunks: Chipwrecked $129.7M $1.2k 13%
22 20 Mission: Impossible Ghost Protocol $207.3M $1.8k 93%
23 22 Tinker Tailor Soldier Spy $22.7M $2.6k 84%
24 29 The Adventures of Tintin $76.4M $1.3k 75%
25 33 A Separation $2.1M $6.2k 99%
27 31 Albert Nobbs $2.4M $1.6k 53%
28 -- Thin Ice $0.2M $3.6k 72%
29 36 My Week with Marilyn $13.6M $1.5k 84%
30 37 A Dangerous Method $5.2M $1.7k 77%
31 35 Puss in Boots $149.0M $1.0k 83%
33 53 In Darkness $0.1M $5.5k 86%
34 44 We Need to Talk About Kevin $0.6M $4.0k 80%
36 48 W.E. $0.2M $2.5k 13%
37 47 Rampart $0.1M $1.8k 73%
38 52 Coriolanus $0.3M $2.9k 94%
39 -- Bullhead $33.6k $4.8k 86%
40 -- Undefeated $30.9k $6.2k 92%
42 55 Chico & Rita $56.2k $5.3k 93%
43 54 Pariah $0.7M $1.5k 96%
Biggest Debut: Ghost Rider: Spirit of Vengeance (3)
Weakest Debut: Undefeated (40)
Biggest Gain: In Darkness (20 places)
Biggest Loss: Underworld Awakening (6 places)
CODE:
my $pageToGrab = "http://www.rottentomatoes.com/movie/box_office.php";
my $command = "/usr/bin/lynx -dump -width=150 $pageToGrab";
my $tempPageFile = `$command`;
print "## "."## "."Movie Title "."Weekend "."Cume "."T-Meter \n";
do
{
if ($tempPageFile =~ /(\d+)\s+(\d+|\-\-)\s+(\d+\%)\s+\[\d+\](.*)\s+(\d+)\s+(\$\d+(?:.\d+)?[Mk])\s+(\$\d+(?:.\d+)?[Mk])\s+(\$\d+(?:.\d+)?[Mk])\s+(\d+)/g)
{
$curweek[$i] = $1;
$lastweek[$i] = $2;
$tmeter[$i] = $3;
$title[$i] = $4;
$weekend[$i] = $7;
$cume[$i] = $8;
printf("%-4s%-4s%-38s%7s%10s%10s\n",$curweek[$i], $lastweek[$i], $title[$i], $weekend[$i], $cume[$i], $tmeter[$i]);
if ($lastweek[$i] ne '--')
{
$gain = $lastweek[$i] - $curweek[$i];
}
if( $gain > $largest)
{
$largest = $gain;
$biggestgaintitle = $title[$i];
}
if( $gain < $smallest)
{
$smallest = $gain;
$biggestlosstitle = $title[$i];
}
if( $lastweek[$i] eq '--')
{
$moviedebut[$j] = $curweek[$i];
$lastmovie = $title[$i];
$j++;
}
$i++;
}
}
while($i < 38);
Here is 18:
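Notice that the 3rd dollar amount ($830) does not have an M or k suffix, so the mandatory [Mk] in your regex makes the whole line fail to match. Make the suffix optional with [Mk]?, perhaps for all 3 dollar amounts.
To truncate the title, use substr:
perldoc -f substr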
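Both suggestions can be sketched together as follows. This is only an illustration under stated assumptions: the two input lines are invented stand-ins for the lynx dump (made-up titles and figures, with only the column layout taken from the regex in the question), and the names $money, $row and @rows are mine, not from the original code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the lynx output. Titles and numbers are
# invented for illustration; only the column order follows the regex
# in the question. The second line has a suffix-less amount ($830).
my $page = <<'END';
   1   2  52% [10]Sample Movie A 2 $23.6M $78.2M $7.7k 3052
   2  -- 40% [11]Sample Movie B With a Very Long Title That Keeps Going 1 $1.2M $1.2M $830 1500
END

# One dollar amount: the M/k suffix is optional ([Mk]?) and the decimal
# point is escaped (\.) so it no longer matches any character.
my $money = qr/\$\d+(?:\.\d+)?[Mk]?/;

# Same column layout as the question's pattern, built from $money.
my $row = qr/(\d+)\s+(\d+|--)\s+(\d+%)\s+\[\d+\](.*)\s+(\d+)\s+($money)\s+($money)\s+($money)\s+(\d+)/;

my @rows;
while ($page =~ /$row/g) {    # loop ends when no more rows match
    my ($cur, $last, $tmeter, $title, $col7, $col8) = ($1, $2, $3, $4, $7, $8);
    $title = substr($title, 0, 35);    # keep at most 35 characters
    push @rows, [$cur, $last, $title, $col7, $col8, $tmeter];
}

# Same printf format as the question.
printf("%-4s%-4s%-38s%7s%10s%10s\n", @$_) for @rows;
```

With the original mandatory [Mk], the second sample line would be skipped, just like number 18. Looping with while (... =~ /.../g) instead of counting up to a fixed 38 is a choice made for this sketch; it stops as soon as no further row matches, however many rows the page has.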