Given the URL (single line):
http://test.example.com/dir/subdir/file.html
How can I extract the following parts using regular expressions:
- The Subdomain (test)
- The Domain (example.com)
- The path without the file (/dir/subdir/)
- The file (file.html)
- The path with the file (/dir/subdir/file.html)
- The URL without the path (http://test.example.com)
- (add any others that you think would be useful)
The regex should work correctly even if I enter the following URL:
http://test.example.com/example/example/example.html
You could then further parse the host (‘.’ delimited) quite easily.
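As a rough illustration of that host split, here is a minimal Python sketch. The helper name `split_host` is made up for this example, and it assumes the registered domain is always the last two labels, which is wrong for suffixes such as `co.uk`.

```python
def split_host(host):
    """Split a host name into (subdomain, domain).

    Assumption: the domain is the last two '.'-separated labels.
    Multi-part public suffixes such as 'co.uk' are not handled.
    """
    labels = host.split(".")
    if len(labels) <= 2:
        # No subdomain present, e.g. 'example.com' or 'localhost'.
        return "", host
    return ".".join(labels[:-2]), ".".join(labels[-2:])


print(split_host("test.example.com"))  # ('test', 'example.com')
```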
What I would do is use something like this:
Then further parse ‘the rest’ to be as specific as possible. Doing it in one regex is, well, a bit crazy.
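A two-pass approach along those lines might look like the Python sketch below. The pattern `URL_RE` and the function `split_url` are my own assumptions for illustration, not a regex taken from this thread. The first pass only separates scheme, host, optional port and "the rest"; the second pass breaks the rest into directory and file. It does not try to handle user info or IPv6 hosts.

```python
import re

# Coarse first pass (an illustrative pattern, not a full URL grammar):
# scheme, host, optional port, then everything else as 'rest'.
URL_RE = re.compile(
    r"^(?P<scheme>[A-Za-z][A-Za-z0-9+.-]*)://"
    r"(?P<host>[^/:?#]+)"
    r"(?::(?P<port>\d+))?"
    r"(?P<rest>.*)$"
)


def split_url(url):
    """Return a dict of URL parts, parsing 'the rest' in a second step."""
    m = URL_RE.match(url)
    if m is None:
        raise ValueError("not an absolute URL: %r" % url)
    parts = m.groupdict()

    # Second pass: drop any query string or fragment, then split the path
    # at its last '/' into directory and file name.
    path = re.split(r"[?#]", parts["rest"], maxsplit=1)[0]
    directory, _, filename = path.rpartition("/")
    parts["path"] = path
    parts["dir"] = directory + "/" if path else ""
    parts["file"] = filename
    # Everything before 'rest' is the URL without the path.
    parts["base"] = url[: m.start("rest")]
    return parts


parts = split_url("http://test.example.com/dir/subdir/file.html")
print(parts["host"], parts["dir"], parts["file"], parts["base"])
```

For the second URL in the question, the same function gives `/example/example/` as the directory and `example.html` as the file, since the host is matched before the path is ever looked at. If you are in Python anyway, the standard library's `urllib.parse.urlsplit` does the first pass without a hand-written regex.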