I have to retrieve this url from a dirty html page: ……… http://www.imdb.com/title/tt0092699/ ……


Editorial Team

Asked: May 28, 2026 (2026-05-28T19:29:45+00:00)

I have to retrieve this URL from a dirty HTML page: http://www.imdb.com/title/tt0092699/

Obviously the URL can also vary (a different .domain, http or https, or without the final slash).


Editorial Team · Answer 1 · 2026-05-28T19:29:47+00:00

Use this regex:

preg_match("/https?:\/\/www\.imdb\..*?\/title\/tt\d+\/?/", $html, $matches);

The URL you want will be in $matches[0].

Here’s the regex meaning, broken down piece by piece:

/ => start regex
https? => literal http followed by an optional s
:\/\/www\.imdb\. => literal ://www.imdb. (each dot is escaped so that it matches only a literal dot, not any character)
.*?\/ => lazy match: as few characters as possible, then a slash; in practice this matches the domain ending, whatever it is (com, co.uk, es, etc.), and the slash following it
title\/ => literal title/
tt\d+ => literal tt followed by at least one digit (the match is greedy, so it takes as many consecutive digits as it can); matches IDs in the format you provided
\/? => optional final /
/ => end regex
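
As a quick check of the pattern above, here is a minimal sketch that runs it against a few made-up HTML fragments covering the variations mentioned in the question. The helper name findImdbTitleUrl and the sample strings are assumptions for illustration only and are not part of the original answer; the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not from the original answer): returns the first
// IMDb title URL found in $html, or null when nothing matches.
function findImdbTitleUrl(string $html): ?string
{
    // Same pattern as in the answer, with the dot after "www" escaped.
    $pattern = "/https?:\/\/www\.imdb\..*?\/title\/tt\d+\/?/";

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match($pattern, $html, $matches) === 1) {
        return $matches[0];
    }
    return null;
}

// Made-up sample inputs: http with a final slash, https on another
// domain ending without a final slash, and a fragment with no link.
$samples = [
    '<td><a href="http://www.imdb.com/title/tt0092699/">link</a></td>',
    '<p>see https://www.imdb.co.uk/title/tt0092699 for details</p>',
    '<div>no link here</div>',
];

foreach ($samples as $html) {
    var_dump(findImdbTitleUrl($html));
}
```

Checking the return value of preg_match() before reading $matches[0] avoids an undefined-index notice when the page contains no such link.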
