Let’s say a browser encounters a link like this:
<a href='stackoverflowhome.html'>home</a>
This is clearly a relative URL to an HTML file in the current directory, but how does the browser know that .html is a file extension and not a TLD (top-level domain)? Does it have a list of common file extensions, or a list of TLDs? And if so, is it manually updated whenever a new file format becomes commonly used, or when the list of accepted TLDs changes, for example with brand TLDs?
It’s because that is how RFC 3986 specifies that URIs should be parsed. If the URI does not have a scheme (a set of characters followed by a colon, e.g. http: or gopher:), then it must be treated as a relative URI. Quoting from the RFC (section 4.1): “A URI-reference is either a URI or a relative reference. If the URI-reference's prefix does not match the syntax of a scheme followed by its colon separator, then the URI-reference is a relative reference.”
User-agents are allowed to make their best guess about what the user meant (see section 4.5), especially where the context is ambiguous (such as URL bars in browsers). The RFC recommends against relying on this for URIs that will be around for a long time, because the best guess of user-agents changes over time. The same URI could then resolve to different resources depending on when it is accessed or which user-agent accesses it.
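To make the rule concrete, here is a minimal sketch in Python using the standard library's urllib.parse. It is not how any particular browser is implemented, and urllib.parse only approximates RFC 3986, but it shows the same decision: no scheme means relative, so the reference is resolved against the page's own URL. The base URL and the classify helper are hypothetical, chosen only for illustration.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical URL of the page that contains the link (an assumption).
BASE = "https://example.com/questions/index.html"


def classify(reference: str) -> str:
    """Return 'absolute' if the reference begins with a scheme, else 'relative'."""
    return "absolute" if urlsplit(reference).scheme else "relative"


def resolve(reference: str) -> str:
    """Resolve a reference against BASE, as a user-agent would for an href."""
    return urljoin(BASE, reference)


if __name__ == "__main__":
    for ref in (
        "stackoverflowhome.html",          # no scheme: a path in the current directory
        "//stackoverflowhome.html/",       # no scheme, but '//' marks a host name
        "http://stackoverflowhome.html/",  # has a scheme: an absolute URI
    ):
        print(f"{ref!r}: {classify(ref)} -> {resolve(ref)}")
```

Nothing here consults a list of file extensions or TLDs. The only way ".html" ends up being read as part of a host name is if the author writes a scheme or a leading "//" in front of it.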