How can I get all IMDB ids from a page? For example, I want

Question

Asked: June 6, 2026 (2026-06-06T15:17:17+00:00)

How can I get all IMDb IDs from a page? For example, I want to get all IDs from here. On that page, the URLs have this format:

http://www.imdb.com/title/tt0948470/

I need to get all the IDs from the page using preg_match_all(). Can anyone help me?

1 Answer

Editorial Team · Answer 1 · 2026-06-06T15:17:19+00:00

Okay, here is some ready-made code, along with an explanation of what it does:

1. Obtain the HTML source.
2. Collect the href attribute of every <a> element.
3. Test each value against a regular expression.
4. If it matches, extract the ID from the link and store it in a way that rules out duplicates.
5. Done.

Example/Demo

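// initialize
$ids   = array();
$url   = 'http://www.imdb.com/movies-coming-soon/'; # this URL
$expr  = '//a/@href';                               # these attributes
$regex = '(/title/(tt\d{5,})/)u';                   # matching this regex (no upper digit limit, newer IDs are longer)
$match = 1;                                         # take group 1

// load the page; loadHTMLFile() is an instance method, and @ silences
// the warnings libxml emits for malformed real-world HTML
$doc = new DOMDocument();
@$doc->loadHTMLFile($url);

// process: use the ID as the array key so duplicates collapse on their own
foreach ((new DOMXPath($doc))->query($expr) as $obj) {
    if (preg_match($regex, $obj->value, $matches)) {
        $ids[$matches[$match]] = 0;
    }
}
$ids = array_keys($ids);

// output
print_r($ids);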

(Notes: You tagged this question PHP5. The current stable PHP 5 release is 5.4, so this example targets 5.4. If your PHP 5 build is configured with the curl wrappers, this code fetches the page through curl.)
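
Since the question asks specifically about preg_match_all(), here is a minimal sketch of that route for comparison. It runs the pattern over the raw HTML string instead of over parsed href attributes, so it is simpler but also less precise. The sample markup and the extra IDs in it are made up for illustration, and the digit count in the pattern is deliberately left open.

```php
<?php
// Hypothetical sample markup standing in for a downloaded page.
$html = '<a href="http://www.imdb.com/title/tt0948470/">A</a>'
      . '<a href="/title/tt0948470/?ref_=x">A again</a>'
      . '<a href="/title/tt1234567/">B</a>'
      . '<a href="/name/nm0000001/">not a title</a>';

// "~" is the delimiter here, so group 1 is the ID itself.
preg_match_all('~/title/(tt\d+)/~', $html, $matches);

// $matches[1] holds every capture of group 1; drop repeats and reindex.
$ids = array_values(array_unique($matches[1]));

print_r($ids);
```

With a real page, $html would come from something like file_get_contents($url). Because the pattern is applied to the whole document, it can also pick up IDs that appear in scripts or comments, which is one reason to prefer the DOM-based approach.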

Edit: For PHP versions lower than 5.4:

...
// process
$doc = new DOMDocument();
@$doc->loadHTMLFile($url);   // instance call; @ silences libxml warnings
$xp = new DOMXPath($doc);
foreach($xp->query($expr) as $obj)
...

Edit 2: I just noticed that IMDb annotates its markup with schema.org metadata, so it's possible to retrieve the actual movie entries of that list rather than every title link on the page.

This requires a small refinement of the XPath expression. Because the query is now much more precise, duplicates no longer occur, so there is no need to remove them:

// initialize
$ids   = array();
$url   = 'http://www.imdb.com/movies-coming-soon/'; # this URL
$expr  = '//*[@itemtype="http://schema.org/Movie"]
                //a[@itemprop="url"]/@href';        # these attributes
$regex = '(/title/(tt\d{5,})/)u';                   # matching this regex (no upper digit limit, newer IDs are longer)
$match = 1;                                         # take group 1

// load the page; @ silences the warnings libxml emits for malformed HTML
$doc = new DOMDocument();
@$doc->loadHTMLFile($url);

// process: every match is a distinct list entry, so just append it
$xp = new DOMXPath($doc);
foreach ($xp->query($expr) as $obj) {
    if (preg_match($regex, $obj->value, $matches)) {
        $ids[] = $matches[$match];
    }
}
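
To see what the narrower XPath expression selects without hitting the network, the following sketch runs the same kind of query against an inline HTML string. The markup is a hypothetical, simplified stand-in for the real page (which may look different), and all IDs except tt0948470 are invented.

```php
<?php
// Hypothetical, simplified markup; the real page may differ.
$html = <<<'HTML'
<html><body>
  <div itemscope itemtype="http://schema.org/Movie">
    <a itemprop="url" href="/title/tt0948470/">First</a>
  </div>
  <div itemscope itemtype="http://schema.org/Movie">
    <a itemprop="url" href="/title/tt1234567/">Second</a>
  </div>
  <a href="/title/tt7654321/">Sidebar link, not a list entry</a>
</body></html>
HTML;

$doc = new DOMDocument();
@$doc->loadHTML($html); // @ hides libxml warnings about the markup

$xp   = new DOMXPath($doc);
$expr = '//*[@itemtype="http://schema.org/Movie"]//a[@itemprop="url"]/@href';

$ids = array();
foreach ($xp->query($expr) as $attr) {
    // only links inside a schema.org Movie item reach this point
    if (preg_match('(/title/(tt\d+)/)u', $attr->value, $m)) {
        $ids[] = $m[1];
    }
}

print_r($ids);
```

The sidebar link is skipped because it sits outside any element carrying the Movie itemtype, which is the filtering the refined expression is meant to provide.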
