I am working on a program and need to extract the TLD and the web page extension from a URL.
E.g.: http://www.example.com/somedir/someotherdir/index.html should give me TLD .com and extension html,
while this: http://www.example.com.au/somedir/someotherdir/index/ should give me TLD .com.au and extension null.
Is there any way I can do this with a regex in Perl? I am using the URI module in Perl, but it cannot seem to do this type of extraction.
If you’re using the URI module, you can easily extract the host and path. Then it’s a simple matter of taking everything after the last dot, or conversely removing everything up to and including the last dot. You may want to get more complicated for the extension, to properly handle cases where there is no extension.
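Here is a minimal sketch of that approach, assuming an absolute http or https URL. The helper name tld_and_extension is made up for illustration. Note that "everything after the last dot" only returns the final label of the host, so example.com.au gives au rather than com.au; getting multi-label suffixes right would need a lookup against a public suffix list, which this sketch does not attempt.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: returns (last host label, file extension or undef).
sub tld_and_extension {
    my ($url) = @_;
    my $uri = URI->new($url);

    # Everything after the last dot in the host name.
    my ($tld) = $uri->host =~ /\.([^.]+)$/;

    # Look only at the last path segment, so dots in directory
    # names are not mistaken for an extension.
    my ($last_segment) = $uri->path =~ m{([^/]*)$};
    my ($ext) = $last_segment =~ /\.([^.]+)$/;

    # $ext stays undef when the last segment is empty or has no dot.
    return ($tld, $ext);
}

for my $url (
    'http://www.example.com/somedir/someotherdir/index.html',
    'http://www.example.com.au/somedir/someotherdir/index/',
) {
    my ($tld, $ext) = tld_and_extension($url);
    printf "%s => TLD: %s, extension: %s\n",
        $url, $tld // 'none', $ext // 'none';
}
```

For the two URLs from the question this gives com / html and au / none respectively.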