I’m reading RFC 2396 on URIs, which says:
Many URI include components consisting of or delimited by, certain
special characters. These characters are called “reserved”, since
their usage within the URI component is limited to their reserved
purpose.
But the section on the query part of the URL (between ? and #) says:
3.4. Query Component
The query component is a string of information to be interpreted by
the resource.
   query = *uric
Within a query component, the characters “;”, “/”, “?”, “:”, “@”,
“&”, “=”, “+”, “,”, and “$” are reserved.
What is the “reserved purpose” of each of those characters? I understand what &, =, and + are used for in the query, but what about the other characters?
More practically, should I always URL-encode those characters when they appear in the query? The browsers and servers I’ve seen handle :, ;, and other characters without their being encoded.
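I think that Section 2.2 of RFC 3986, which obsoletes RFC 2396, offers a
possible explanation. I quote:
URIs include components and subcomponents that are delimited by
characters in the “reserved” set. These characters are called
“reserved” because they may (or may not) be defined as delimiters by
the generic syntax, by each scheme-specific syntax, or by the
implementation-specific syntax of a URI’s dereferencing algorithm.
If data for a URI component would conflict with a reserved
character’s purpose as a delimiter, then the conflicting data must be
percent-encoded before the URI is formed.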
I think that what Berners-Lee et al. are trying to get at here is that, even if
not all reserved characters are used as delimiters by the generic syntax
described in the RFC, the authors wanted to leave enough latitude for future
schemes or implementation-specific code to use those characters as they see
fit.
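To make the delimiter conflict concrete, here is a minimal Python sketch. It assumes the common form-style convention (application/x-www-form-urlencoded, which Python’s urllib.parse.parse_qs follows) in which & separates pairs and = separates a key from its value; that convention is layered on top of RFC 3986 rather than defined by its generic syntax. The value salt&pepper=1:2 and the keys q and time are hypothetical, chosen only for illustration.

```python
from urllib.parse import parse_qs, quote

# Hypothetical value that happens to contain '&', '=' and ':'.
value = "salt&pepper=1:2"

# Naive concatenation: the '&' and '=' inside the value collide with the
# delimiters a form-style parser splits on, so the value is torn apart.
raw_query = "q=" + value
print(parse_qs(raw_query))   # {'q': ['salt'], 'pepper': ['1:2']}

# Percent-encoding the value first (safe="" encodes every reserved char)
# removes the conflict, and the parser decodes it back intact.
safe_query = "q=" + quote(value, safe="")
print(safe_query)            # q=salt%26pepper%3D1%3A2
print(parse_qs(safe_query))  # {'q': ['salt&pepper=1:2']}

# A ':' or '/' that plays no delimiter role for this parser passes through
# unencoded without trouble.
colon_query = "time=10:30/12:00"
print(parse_qs(colon_query))  # {'time': ['10:30/12:00']}
```

This mirrors what the question observed: characters such as : survive unencoded because this particular parser gives them no special meaning, while & and = break the value unless they are encoded.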
As to whether you should encode those characters, my opinion is that you should
research and use a percent-encoding algorithm that follows the standard, rather
than a non-standard one, and not try to roll your own. For instance, if you are
using a language like C# or Python, the libraries that come with those languages
include a standards-compliant implementation of the algorithm (for example,
Uri.EscapeDataString in .NET, or urllib.parse.quote and urllib.parse.urlencode in
Python). For more details, Section 2.4 of RFC 3986 covers when to encode or decode.
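A short sketch of relying on Python’s standard library instead of a hand-rolled encoder. The reserved string is the RFC 2396 query set quoted in the question; the parameter names q and when and their values are hypothetical. Note that quote() treats "/" as safe by default, so pass safe="" when a value must not contain a literal slash.

```python
from urllib.parse import quote, unquote, urlencode

# The characters RFC 2396, section 3.4 lists as reserved in a query.
reserved = ";/?:@&=+,$"

# Default safe="/" leaves the slash alone; everything else is encoded.
print(quote(reserved))            # %3B/%3F%3A%40%26%3D%2B%2C%24

# safe="" percent-encodes every reserved character.
print(quote(reserved, safe=""))   # %3B%2F%3F%3A%40%26%3D%2B%2C%24

# urlencode() encodes each key and value, then joins them with '=' and '&'.
params = {"q": "salt&pepper", "when": "10:30"}  # hypothetical parameters
print(urlencode(params))          # q=salt%26pepper&when=10%3A30

# Decoding reverses the percent-encoding.
print(unquote(quote(reserved, safe="")))  # ;/?:@&=+,$
```

One design point to be aware of: urlencode() uses quote_plus() by default, so spaces become "+" (the form-encoding convention); pass quote_via=quote if you want spaces encoded as %20 instead.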