The Archive Base


Editorial Team
Asked: June 6, 2026 at 3:17 pm

How can I get all IMDB ids from a page?

For example, I want to get all ids from here. On that page, the URLs have this format:

http://www.imdb.com/title/tt0948470/

I need to get all ids from the page using preg_match_all(). Can anyone help me?


1 Answer

  1. Editorial Team
    Added an answer on June 6, 2026 at 3:17 pm

    Okay, here is some quickly put-together code, along with an explanation of what it does:

    1. Obtain the HTML source
    2. Parse all <a> href attributes
    3. Test with a regular expression if their value matches.
    4. If it matches, extract the id from the link and store it in a way that you don’t get any duplicates.
    5. Done.

    Example/Demo

    // initialize
    $ids   = array();
    $url   = 'http://www.imdb.com/movies-coming-soon/'; # this URL
    $expr  = '//a/@href';                               # these attributes
    $regex = '(/title/(tt\d{5,7})/)u';                  # matching this regex
    $match = 1;                                         # take group 1

    // load the page; loadHTMLFile() is an instance method,
    // and @ silences parser warnings on real-world HTML
    $doc = new DOMDocument();
    @$doc->loadHTMLFile($url);

    // process
    foreach ((new DOMXPath($doc))->query($expr) as $obj) {
        if (preg_match($regex, $obj->value, $matches)) {
            $ids[$matches[$match]] = 0; // id as array key, so duplicates collapse
        }
    }
    $ids = array_keys($ids);

    // output
    print_r($ids);
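
    Since the question asks specifically about preg_match_all(), the matching and de-duplication steps can also be sketched directly on a string of HTML, without the DOM parse. This is a different technique from the demo above, not a drop-in equivalent: it matches anywhere in the text, not only inside href attributes. The sample markup and link labels are made up for illustration, and the pattern is the same one written with ~ delimiters.

    ```php
    <?php
    // Hypothetical sample markup standing in for the downloaded page source.
    $html = '<a href="/title/tt0948470/">First title</a>'
          . '<a href="http://www.imdb.com/title/tt0948470/?ref_=cs">Same title again</a>'
          . '<a href="/name/nm0000001/">Not a title link</a>'
          . '<a href="/title/tt1345836/">Second title</a>';

    // Same pattern as in the demo; ~ delimiters keep the slashes unescaped.
    $regex = '~/title/(tt\d{5,7})/~';

    // preg_match_all() collects every match; $matches[1] holds capture group 1.
    preg_match_all($regex, $html, $matches);

    // array_unique() drops repeated ids, array_values() reindexes from 0.
    $ids = array_values(array_unique($matches[1]));

    print_r($ids);
    ```

    For the sample string, $ids ends up holding tt0948470 and tt1345836 once each. On a real page, $html would come from something like file_get_contents($url), assuming URL wrappers are enabled.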
    

    (Notes: You tagged this question PHP5. At the time of writing, the current stable PHP5 is 5.4, and this example targets that version. If your PHP5 build is configured with the curl wrappers, this code fetches the page through curl.)

    Edit: For lower PHP versions, which cannot call a method directly on a new object:

    ...
    // process
    $doc = new DOMDocument();
    @$doc->loadHTMLFile($url);
    $xp = new DOMXPath($doc);
    foreach ($xp->query($expr) as $obj)
    ...
    

    Edit 2: I just noticed that IMDB tags its markup, so it is possible to retrieve the actual movie entries of that list rather than every title link on the page.

    This requires a small refinement of the XPath expression. Because the query now selects only the movie entries, duplicates do not occur and there is no need to remove them:

    // initialize
    $ids   = array();
    $url   = 'http://www.imdb.com/movies-coming-soon/'; # this URL
    $expr  = '//*[@itemtype="http://schema.org/Movie"]
                    //a[@itemprop="url"]/@href';        # these attributes
    $regex = '(/title/(tt\d{5,7})/)u';                  # matching this regex
    $match = 1;                                         # take group 1

    // load the page; @ silences parser warnings on real-world HTML
    $doc = new DOMDocument();
    @$doc->loadHTMLFile($url);

    // process
    $xp = new DOMXPath($doc);
    foreach ($xp->query($expr) as $obj) {
        if (preg_match($regex, $obj->value, $matches)) {
            $ids[] = $matches[$match];
        }
    }
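
    To see what the narrower XPath expression selects without hitting the network, here is a minimal sketch that runs it against an inline HTML fragment. The fragment is hypothetical and only imitates the itemtype/itemprop tagging described above; the real page layout may differ.

    ```php
    <?php
    // Hypothetical fragment: one tagged movie entry and one unrelated title link.
    $html = '<html><body>'
          . '<div itemscope itemtype="http://schema.org/Movie">'
          . '<h4><a itemprop="url" href="/title/tt0948470/">Listed movie</a></h4>'
          . '</div>'
          . '<div class="sidebar"><a href="/title/tt1345836/">Unrelated link</a></div>'
          . '</body></html>';

    $doc = new DOMDocument();
    @$doc->loadHTML($html); // @ silences parser warnings, as with loadHTMLFile()

    // Only href attributes of url-tagged links inside a Movie item are selected.
    $expr  = '//*[@itemtype="http://schema.org/Movie"]//a[@itemprop="url"]/@href';
    $regex = '(/title/(tt\d{5,7})/)u';

    $ids = array();
    $xp  = new DOMXPath($doc);
    foreach ($xp->query($expr) as $attr) {
        if (preg_match($regex, $attr->value, $matches)) {
            $ids[] = $matches[1]; // capture group 1 is the tt id
        }
    }

    print_r($ids);
    ```

    Only tt0948470 ends up in $ids here; the sidebar link is skipped because it sits outside the element tagged as a Movie.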
    