So I want to match just the domain name from any of these URLs:
http://www.google.com/test/
http://google.com/test/
http://google.net/test/
The output should be the same for all 3: google
I got this code working for just .com:
echo "http://www.google.com/test/" | sed -n "s/.*www\.\(.*\)\.com.*$/\1/p"
Output: 'google'
Then I thought it would be as simple as using, say, (com|net), but that doesn’t seem to work:
echo "http://www.google.com/test/" | sed -n "s/.*www\.\(.*\)\.(com|net).*$/\1/p"
Output: '' (nothing)
I was going to use a similar method to get rid of the “www”, but it seems I’m doing something wrong… (Does it not work with regex outside the \( \)?)
This will output “google” in all cases:
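A minimal sketch of such a command, assuming a sed that supports -E (extended regular expressions; GNU and BSD sed both do). The attempt in the question fails because, without -E, sed treats ( and | as literal characters, so (com|net) never matches. The function name domain_simple is only a hypothetical wrapper for illustration:

```shell
# Hypothetical helper: read URLs on stdin, print the name before .com or .net.
domain_simple() {
    # -n plus the p flag prints only lines where the substitution matched.
    # Group 1 is the optional "www.", group 2 is the name we want.
    sed -nE 's#^https?://(www\.)?([^./]+)\.(com|net).*$#\2#p'
}

printf '%s\n' \
    'http://www.google.com/test/' \
    'http://google.com/test/' \
    'http://google.net/test/' | domain_simple
```

Using # as the delimiter avoids having to escape the slashes in “http://”.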
Edit:
This version will handle URLs like “http://google.com.cn/test” and “http://www.google.co.uk/” as well as the ones in the original question:
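One way this could look, again assuming sed -E. Instead of listing top-level domains, the sketch takes everything between the scheme (and an optional “www.”) and the first dot. The name domain_any_tld is hypothetical:

```shell
# Hypothetical helper: print the first host label after an optional "www.".
domain_any_tld() {
    # Group 2 stops at the first "." or "/", so .com.cn and .co.uk are
    # discarded along with the path.
    sed -nE 's#^https?://(www\.)?([^./]+)\..*$#\2#p'
}

printf '%s\n' \
    'http://google.com.cn/test' \
    'http://www.google.co.uk/' \
    'http://google.net/test/' | domain_any_tld
```

Note that this treats only “www.” specially: a host such as mail.google.com would yield “mail”, not “google”.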
This version will handle cases that don’t include “http://” (plus the others):
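A sketch under the same assumptions (sed -E), with the scheme made optional by wrapping it in its own group. The name domain_no_scheme is hypothetical:

```shell
# Hypothetical helper: like the previous sketch, but "http://" may be absent.
domain_no_scheme() {
    # Group 1: optional scheme, group 2: optional "www.", group 3: the name.
    sed -nE 's#^(https?://)?(www\.)?([^./]+)\..*$#\3#p'
}

printf '%s\n' \
    'www.google.com/test/' \
    'google.co.uk' \
    'http://google.net/test/' | domain_no_scheme
```

Because two groups now precede the name, the replacement refers to \3 rather than \2.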