Given a URL, how do I extract the registered domain using the Public Suffix List (the list of effective TLDs)?
For instance, given that a.bg is a valid public suffix:
http://www.test.start.a.bg/hello.html -> start.a.bg
http://test.start.a.bg/ -> start.a.bg
http://test.start.abc.bg/ -> abc.bg (here only .bg is the public suffix)
This cannot be done using simple string manipulation because the public suffix can consist of multiple levels depending on the TLD.
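To make the problem concrete, here is a minimal PHP sketch contrasting a naive "last two labels" approach with a longest-suffix match against the list. The function names and the two-entry rule set (bg, a.bg) are hypothetical, chosen only to mirror the examples above; the real list also contains wildcard and exception rules, which this sketch ignores.

```php
<?php
declare(strict_types=1);

// Hypothetical two-entry excerpt of the list, mirroring the examples above.
$suffixes = ['bg' => true, 'a.bg' => true];

// Naive (wrong) approach: assume the registered domain is always the last two labels.
function naive_registered_domain(string $host): string
{
    return implode('.', array_slice(explode('.', $host), -2));
}

// Longest-suffix approach (hypothetical helper): find the longest suffix that
// appears in the list, then keep exactly one more label to its left.
// Wildcard ("*.") and exception ("!") rules are deliberately ignored here.
function longest_suffix_domain(string $host, array $suffixes): ?string
{
    $labels = explode('.', strtolower($host));
    $n = count($labels);
    $suffixLen = 1; // if nothing matches, treat the TLD alone as the suffix
    for ($i = 1; $i <= $n; $i++) {
        $candidate = implode('.', array_slice($labels, $n - $i));
        if (isset($suffixes[$candidate])) {
            $suffixLen = $i; // later (longer) matches overwrite shorter ones
        }
    }
    if ($suffixLen >= $n) {
        return null; // the host is itself a public suffix
    }
    return implode('.', array_slice($labels, $n - $suffixLen - 1));
}

echo naive_registered_domain('www.test.start.a.bg'), "\n";          // a.bg (wrong)
echo longest_suffix_domain('www.test.start.a.bg', $suffixes), "\n"; // start.a.bg
echo longest_suffix_domain('test.start.abc.bg', $suffixes), "\n";   // abc.bg
```

The key point is that the suffix length is decided by the list, not by counting dots, which is why a.bg yields a three-label result while abc.bg yields a two-label one.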
P.S. It doesn’t matter how I read the list (database or flat file), but the list should be accessible locally so I’m not always dependent on external services.
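Keeping the list local can be as simple as storing a downloaded copy of the list file and parsing it at startup. The sketch below assumes the list's plain-text format (one rule per line, "//" comment lines, blank lines ignored, only the text before the first whitespace counts as the rule). The function names parse_psl and load_psl, and whatever path you pass to load_psl, are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: parse text in the Public Suffix List format into a
 * set (rule => true). Comment lines start with "//", blank lines are skipped,
 * and only the text up to the first whitespace on a line is kept as the rule.
 * Wildcard ("*.ck") and exception ("!www.ck") rules are stored verbatim.
 */
function parse_psl(string $text): array
{
    $rules = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '' || str_starts_with($line, '//')) {
            continue;
        }
        $rule = preg_split('/\s+/', $line)[0];
        $rules[$rule] = true;
    }
    return $rules;
}

// Hypothetical loader for a locally stored copy of the list. Refreshing that
// copy periodically (rather than per request) avoids depending on a remote service.
function load_psl(string $path): array
{
    $text = file_get_contents($path);
    if ($text === false) {
        throw new RuntimeException("Cannot read public suffix list at $path");
    }
    return parse_psl($text);
}

$sample = "// ===BEGIN ICANN DOMAINS===\nbg\na.bg\n\n*.ck\n!www.ck\n";
print_r(array_keys(parse_psl($sample)));
```

Storing the rules as array keys gives constant-time isset() lookups, which suits the suffix-by-suffix matching the list requires. The same set could just as well be loaded into a database table.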
You can use parse_url() to extract the hostname, then use the library provided by regdom to determine the registered domain name (the effective TLD plus one label to its left). For example:

That will print out metu.edu.tr.

Other examples I’ve tried:
UPDATE: These libraries have been moved to: https://github.com/leth/registered-domains-php
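For readers who want to see the whole pipeline without a library, here is a standalone PHP sketch (not regdom's API) that extracts the host with parse_url() and then applies the list's matching rules: an exception rule ("!...") wins and its leftmost label is dropped, otherwise the matching rule with the most labels wins, and if nothing matches the TLD alone is treated as the public suffix. The function name registered_domain_from_url and the four-rule excerpt are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical standalone sketch of Public Suffix List matching.
 * $rules is a set (rule => true), e.g. loaded from a local copy of the list.
 * Returns null when the URL has no host or the host is itself a public suffix.
 */
function registered_domain_from_url(string $url, array $rules): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }
    $labels = explode('.', strtolower(rtrim($host, '.')));
    $n = count($labels);
    $suffixLen = 1; // implicit default rule "*": the TLD alone
    for ($i = 1; $i <= $n; $i++) {
        $suffix = implode('.', array_slice($labels, $n - $i));
        if (isset($rules['!' . $suffix])) {
            $suffixLen = $i - 1; // exception rule: drop its leftmost label
            break;
        }
        // A wildcard rule "*.X" matches any single label in front of X.
        $parent = implode('.', array_slice($labels, $n - $i + 1));
        if (isset($rules[$suffix]) || ($i > 1 && isset($rules['*.' . $parent]))) {
            $suffixLen = $i; // longer matches overwrite shorter ones
        }
    }
    if ($suffixLen >= $n) {
        return null; // the host is itself a public suffix
    }
    return implode('.', array_slice($labels, $n - $suffixLen - 1));
}

// Hypothetical excerpt; in practice load the full list from a local file.
$rules = ['bg' => true, 'a.bg' => true, '*.ck' => true, '!www.ck' => true];
echo registered_domain_from_url('http://www.test.start.a.bg/hello.html', $rules), "\n"; // start.a.bg
```

Scanning suffixes from shortest to longest lets longer matches overwrite shorter ones, while the early break gives exception rules priority regardless of length. Hosts in punycode form may additionally need converting before lookup, since the list stores internationalized rules in Unicode.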