I was wondering how to programmatically get the website name and page title of a webpage, or at least make a best guess.
For example, the website name of this question’s webpage is Stack Overflow, and the page title is “How to get the website name and page title of a webpage”.
I know it's not possible to get 100% accuracy (or even close), but it'd be great to at least make an attempt. The programming language doesn't matter.
If you're scraping another site with something like PHP Simple HTML DOM Parser, start by reading the contents of the page's <title> element.
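As a rough illustration of that first step, here is a sketch using Python's standard-library html.parser instead of the PHP library. It assumes you already have the page's HTML as a string (for example from urllib.request.urlopen(url).read().decode()); the names TitleParser and extract_title are hypothetical.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element it sees."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; ignore any <title> after the first one
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        # handle_data may be called several times for one element, so collect pieces
        if self._in_title:
            self.title_parts.append(data)


def extract_title(html):
    """Return the page title with whitespace collapsed, or "" if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.title_parts).split())
```

Stopping after the first <title> is deliberate: the document's title lives in <head>, while inline SVG in the body can carry its own <title> elements.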
That gives you everything between the <title> tags. If you want to extract a pseudo website name (assuming it's in the title), you can take the first run of text before a separator, usually " – ", " :: ", " — " or some other variant. You would probably want to look at 100 websites and find the most common separators.
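Here is a minimal sketch of the separator heuristic, again in Python. The separator list extends the ones above with " | " and " - " as assumed extras, and it departs slightly from "take the first segment": since many sites put their name at the end (as this question's title does with " - Stack Overflow"), it treats whichever end segment is shorter as the site name. That rule is an assumption, not something verified against real sites; guess_site_and_page is a hypothetical name.

```python
import re

# Separators from the answer, plus " | " and " - " (assumed extras)
SEPARATORS = [" – ", " — ", " :: ", " | ", " - "]
_SEP_RE = re.compile("|".join(re.escape(s) for s in SEPARATORS))


def guess_site_and_page(title):
    """Best-guess (site_name, page_title) split of a <title> string.

    Returns (None, title) when no known separator is present.
    Heuristic (assumption): the site name is the shorter of the text before
    the first separator and the text after the last separator.
    """
    title = title.strip()
    matches = list(_SEP_RE.finditer(title))
    if not matches:
        return None, title
    first, last = matches[0], matches[-1]
    prefix, after_first = title[:first.start()], title[first.end():]
    before_last, suffix = title[:last.start()], title[last.end():]
    if len(prefix) <= len(suffix):
        # Site name at the front, e.g. "Site :: Page"
        return prefix.strip(), after_first.strip()
    # Site name at the end, e.g. "Page - Site"
    return suffix.strip(), before_last.strip()
```

For example, guess_site_and_page("How to get the website name and page title of a webpage - Stack Overflow") returns ("Stack Overflow", "How to get the website name and page title of a webpage"). Surveying real sites, as suggested above, would be the way to tune both the separator list and the shorter-end rule.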