RFC 1738 specifies the syntax for URL’s, and mentions that URLs are written only

Asked: May 10, 2026 (2026-05-10T15:57:15+00:00)

RFC 1738 specifies the syntax for URLs and mentions that:

URLs are written only with the graphic printable characters of the
US-ASCII coded character set. The octets 80-FF hexadecimal are not
used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent
control characters; these must be encoded.

It does not, however, say which character set the encoded octets then represent.
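To make the ambiguity concrete, here is a minimal sketch using Python's standard `urllib.parse`. The sample character and the two charsets are my own choices for illustration, not something either RFC prescribes. It shows the same character producing different encoded octets depending on the charset the client assumed:

```python
from urllib.parse import quote, unquote

text = "é"  # sample non-ASCII character, chosen for illustration

# The same character yields different octets under different charsets.
as_latin1 = quote(text, encoding="latin-1")
as_utf8 = quote(text, encoding="utf-8")

print(as_latin1)  # %E9
print(as_utf8)    # %C3%A9

# A receiver that guesses the wrong charset gets different text back.
print(unquote(as_utf8, encoding="latin-1"))  # Ã©
```

Nothing in the encoded form itself says which of the two the sender meant.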

RFC 2396 seems to try to improve on the situation, but:

For original character sequences that contain non-ASCII characters, however, the situation is more difficult. Internet protocols that transmit octet sequences intended to represent character sequences are expected to provide some way of identifying the charset used, if there might be more than one [RFC2277]. However, there is currently no provision within the generic URI syntax to accomplish this identification. An individual URI scheme may require a single charset, define a default charset, or provide a way to indicate the charset used.

It is expected that a systematic treatment of character encoding within URI will be developed as a future modification of this specification.

Is there any unambiguous way for a client to determine which character set to use when interpreting encoded octets, or for a server to determine which character set a client used to encode them?

It looks to me like most servers default to UTF-8, but this seems to be a de facto choice more than a specified one.
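The kind of de facto behaviour I mean could look roughly like the sketch below. This is an assumption about how a server might guess, not something any specification mandates. The function name `decode_path_guess` and the Latin-1 fallback are hypothetical:

```python
from typing import Tuple
from urllib.parse import unquote_to_bytes


def decode_path_guess(escaped: str) -> Tuple[str, str]:
    """Guess the charset of percent-encoded octets (hypothetical helper).

    Tries strict UTF-8 first and falls back to Latin-1, which can decode
    any octet sequence. Returns the decoded text and the charset used.
    """
    raw = unquote_to_bytes(escaped)  # undo the %XX escapes, keep raw bytes
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("latin-1"), "latin-1"


print(decode_path_guess("caf%C3%A9"))  # ('café', 'utf-8')
print(decode_path_guess("caf%E9"))     # ('café', 'latin-1')
```

The guess is only a heuristic: any octet sequence that happens to be valid UTF-8 is taken as UTF-8, even if the client meant something else.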

1 Answer

score 0 · Answer 1 · 2026-05-10T15:57:16+00:00

As per your quote, URLs are ASCII. That’s all.

URIs, OTOH, allow for bigger charsets, usually UTF-8 as you said yourself.

The point to remember is that URLs are a subset of URIs. Therefore, the real question is, which of these is what you write in a browser?

I’d guess you can write a URI, and the browser should try its best to transform it into a URL (which is what HTTP/1.1 supports, AFAICR). For non-ASCII characters, that means percent-encoded hex codes, usually of the character's UTF-8 bytes.
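A rough sketch of that transformation, using Python's standard `urllib.parse`, is below. The helper name `to_ascii_url` is hypothetical, UTF-8 is assumed, and the host and fragment are assumed to be ASCII already. A real browser does more than this:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(text: str) -> str:
    """Percent-encode non-ASCII characters in the path and query as UTF-8.

    Hypothetical helper: the host and fragment are passed through
    unchanged, so they are assumed to be ASCII already.
    """
    parts = urlsplit(text)
    # '%' is kept safe so escapes that are already present are not re-encoded.
    path = quote(parts.path, safe="/%", encoding="utf-8")
    query = quote(parts.query, safe="=&%", encoding="utf-8")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


print(to_ascii_url("http://example.com/café?q=naïve"))
# http://example.com/caf%C3%A9?q=na%C3%AFve
```

The result contains only ASCII, which fits the RFC 1738 text quoted in the question.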
