Where can I find code (JavaScript would be best) that strips the www prefix and the second-level domain suffix from URLs?
Example:
www.ynet.co.il -> ynet (stripped 'co.il' - two tokens)
www.nike.com -> nike (stripped 'com' - one token)
etc.
As a second-best option, a full list of second-level domains (preferably in CSV, but any format will do) would be welcome as well.
If you use Java, Guava can help you here.
You can use InternetDomainName.topPrivateDomain() together with publicSuffix() to solve your problem.
Guava (as well as Mozilla/Firefox, Chrome and Opera) uses the Public Suffix List for this functionality (the raw data is here).
tld.js is a JavaScript library that uses that data as well.
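If you just want to see the idea behind these libraries, here is a minimal sketch in plain JavaScript. It is not tld.js's or Guava's actual code: SAMPLE_SUFFIXES and registrableLabel are hypothetical names, and the suffix set is a tiny hand-picked sample standing in for the real Public Suffix List. It also ignores the list's wildcard and exception rules, so treat it as an illustration rather than a replacement for a proper library.

```javascript
// Hypothetical, hand-picked sample of suffixes for illustration only.
// A real implementation would load the full Public Suffix List instead.
const SAMPLE_SUFFIXES = new Set(["com", "org", "net", "il", "co.il", "uk", "co.uk"]);

// Returns the label immediately to the left of the longest matching suffix,
// or null when no suffix in the set matches (or nothing precedes the suffix).
function registrableLabel(hostname, suffixes = SAMPLE_SUFFIXES) {
  // Normalize case and drop a trailing dot, then split into labels.
  const labels = hostname.toLowerCase().replace(/\.$/, "").split(".");
  // Try the longest candidate suffix first, so "co.il" wins over "il".
  for (let i = 1; i < labels.length; i++) {
    const candidate = labels.slice(i).join(".");
    if (suffixes.has(candidate)) {
      return labels[i - 1];
    }
  }
  return null;
}

console.log(registrableLabel("www.ynet.co.il")); // "ynet"
console.log(registrableLabel("www.nike.com"));   // "nike"
```

Checking the longest suffix first is what makes 'co.il' count as two tokens while 'com' counts as one, so no special handling of www is needed.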