I am trying to write a program using the lynx command on this page


Asked: May 29, 2026


I am trying to write a program that runs the lynx command on the page “http://www.rottentomatoes.com/movie/box_office.php”, and I can’t wrap my head around one problem: getting the title by itself. A title can contain special characters and numbers, and titles vary in length. I want to write a regex that can parse the entire page and find lines like these.
(I added spaces between the title and the next number, which is the number of weeks the film has been out, to distinguish the title from the weeks released.)

1 -- 30%  The Vow                                           1 $41.2M $41.2M $13.9k 2958
2 -- 53%  Safe House                                        1 $40.2M $40.2M $12.9k 3119
3 -- 42%  Journey 2: The Mysterious Island                  1 $27.3M $27.3M $7.9k 3470
4 -- 57%  Star Wars: Episode I - The Phantom Menace (in 3D) 1 $22.5M $22.5M $8.5k 2655
5 1  86%  Chronicle                                         2 $12.1M $40.0M $4.2k 2908

The regex I have started with is:

/(\d+)\s(\d+|\-\-)\s(\d+\%)\s

If someone can help me figure out how to grab the title successfully, that would be much appreciated! Thanks in advance.

1 Answer

Editorial Team · Answer 1 · 2026-05-29T20:50:52+00:00

Capture all the things!!

^(\d+)\s+(\d+|\-\-)\s+(\d+\%)\s+(.*?)\s+(\d+)\s+(\$\d+(?:\.\d+)?[Mk])\s+(\$\d+(?:\.\d+)?[Mk])\s+(\$\d+(?:\.\d+)?[Mk])\s+(\d+)$

Explained:

^                             <- Start of the line
    (\d+)\s+                  <- Digits (captured) followed by one or more whitespace characters
    (\d+|\-\-)\s+             <- Digits [or "--"] (captured) followed by one or more whitespace characters
    (\d+\%)\s+                <- Digits [with '%'] (captured) followed by one or more whitespace characters
    (.*?)\s+                  <- Anything, matched lazily [don't be greedy] (captured) followed by one or more whitespace characters
    (\d+)\s+                  <- Digits (captured) followed by one or more whitespace characters
    (\$\d+(?:\.\d+)?[Mk])\s+  <- "$" and digits [with optional decimal part] and "M" or "k" (captured) followed by one or more whitespace characters
    (\$\d+(?:\.\d+)?[Mk])\s+  <- "$" and digits [with optional decimal part] and "M" or "k" (captured) followed by one or more whitespace characters
    (\$\d+(?:\.\d+)?[Mk])\s+  <- "$" and digits [with optional decimal part] and "M" or "k" (captured) followed by one or more whitespace characters
    (\d+)                     <- Digits (captured)
$                             <- End of the line

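More seriously, here is what I’ve done: I cheated a bit and captured everything (as I think you’ll end up doing anyway), so that the rest of the pattern acts like a lookahead for the title capture.

Writing the title group as a lazy (.*?) makes it capture as few characters as possible, while the rest of the regex has to match everything that follows: the weeks count, the three dollar amounts, and the final number. A greedy (.*) also matches thanks to backtracking, but then the title group keeps the padding spaces before the weeks count.

The fourth capture group therefore ends up holding only the title (the only thing left).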
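As a rough illustration, here is a minimal Python sketch of the capture-everything approach. Python is an assumption (the question does not name a language), and the names `ROW_RE`, `SAMPLE`, and `parse_rows` are hypothetical. The sample text is the five lines from the question; in practice it would be whatever text lynx produces for the page.

```python
import re

# Capture-everything pattern: lazy title group, escaped decimal points.
ROW_RE = re.compile(
    r"^(\d+)\s+(\d+|--)\s+(\d+%)\s+(.*?)\s+(\d+)"
    r"\s+(\$\d+(?:\.\d+)?[Mk])\s+(\$\d+(?:\.\d+)?[Mk])\s+(\$\d+(?:\.\d+)?[Mk])"
    r"\s+(\d+)$"
)

# Sample rows copied from the question (stand-in for the real page text).
SAMPLE = """\
1 -- 30%  The Vow                                           1 $41.2M $41.2M $13.9k 2958
2 -- 53%  Safe House                                        1 $40.2M $40.2M $12.9k 3119
3 -- 42%  Journey 2: The Mysterious Island                  1 $27.3M $27.3M $7.9k 3470
4 -- 57%  Star Wars: Episode I - The Phantom Menace (in 3D) 1 $22.5M $22.5M $8.5k 2655
5 1  86%  Chronicle                                         2 $12.1M $40.0M $4.2k 2908
"""

def parse_rows(text):
    """Return one tuple of the nine captured fields per matching line."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if match:  # lines that are not box-office rows are skipped
            rows.append(match.groups())
    return rows

# The title is the fourth capture group (index 3 in the tuple).
titles = [row[3] for row in parse_rows(SAMPLE)]
print(titles)
```

Titles that contain digits, such as "Journey 2: The Mysterious Island", still come out whole, because a candidate match that stops at the "2" fails as soon as the pattern expects whitespace and finds ":" instead.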

Alternatively, you can use an actual lookahead assertion after the title group, so that only the title is captured.
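One possible shape for the lookahead variant, again sketched in Python with hypothetical names (`TITLE_RE`, `title_of`). It assumes the title is always followed by whitespace, the weeks count, more whitespace, and a dollar sign; a title that itself contained such a sequence would be cut short.

```python
import re

# Match the three leading fields, capture the title lazily, and stop where
# the lookahead sees "<spaces><weeks><spaces>$" without consuming it.
TITLE_RE = re.compile(r"^\d+\s+(?:\d+|--)\s+\d+%\s+(.*?)(?=\s+\d+\s+\$)")

def title_of(line):
    """Return the title from one box-office line, or None if it does not match."""
    match = TITLE_RE.match(line)
    return match.group(1) if match else None

print(title_of("5 1  86%  Chronicle                                         2 $12.1M $40.0M $4.2k 2908"))
```

This version is shorter but validates less of the line than the capture-everything pattern, so it is more likely to accept lines that are not real box-office rows.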
