
The Archive Base


The Archive Base Latest Questions

Editorial Team
Asked: May 10, 2026


RFC 1738 specifies the syntax for URLs, and states that:

URLs are written only with the graphic printable characters of the
US-ASCII coded character set. The octets 80-FF hexadecimal are not
used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent
control characters; these must be encoded.

It does not, however, say which character set these encoded octets then represent.
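To make the ambiguity concrete, here is a minimal sketch using Python's standard `urllib.parse.quote`. The helper name `percent_encode` is only illustrative. It shows that the same character yields different encoded octets depending on the charset the client happened to pick, and nothing in the resulting URL records that choice.

```python
from urllib.parse import quote

def percent_encode(text: str, charset: str) -> str:
    """Convert text to bytes in the given charset, then percent-encode them."""
    # safe="" means every reserved character is escaped as well.
    return quote(text, safe="", encoding=charset)

utf8_form = percent_encode("é", "utf-8")      # two octets: C3 A9
latin1_form = percent_encode("é", "latin-1")  # one octet: E9

print(utf8_form)    # %C3%A9
print(latin1_form)  # %E9
```

Both strings are valid under the RFC 1738 syntax, yet a receiver cannot tell from the URL alone which charset produced them.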

RFC 2396 seems to try to improve on the situation, but says:

For original character sequences that contain non-ASCII characters, however, the situation is more difficult. Internet protocols that transmit octet sequences intended to represent character sequences are expected to provide some way of identifying the charset used, if there might be more than one [RFC2277]. However, there is currently no provision within the generic URI syntax to accomplish this identification. An individual URI scheme may require a single charset, define a default charset, or provide a way to indicate the charset used.

It is expected that a systematic treatment of character encoding within URI will be developed as a future modification of this specification.

Is there any unambiguous way for a client to determine which character set to use when interpreting encoded octets, or for a server to determine which character set a client used to encode them?

It looks to me like most servers default to UTF-8, but this seems to be a de facto choice more than a specified one.
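For illustration, the "default to UTF-8" behaviour can be sketched as a heuristic: try strict UTF-8 first and fall back to Latin-1 if the octets are not valid UTF-8. This is an assumption about how a server might behave, not something either RFC specifies, and the function name `guess_decode` is hypothetical.

```python
from typing import Tuple
from urllib.parse import unquote_to_bytes

def guess_decode(encoded: str) -> Tuple[str, str]:
    """Return (decoded text, charset guessed) for a percent-encoded string."""
    raw = unquote_to_bytes(encoded)  # undo the %XX escapes, keep raw octets
    try:
        # Strict UTF-8: raises if the octets are not well-formed UTF-8.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps every octet to a character, so this never fails.
        return raw.decode("latin-1"), "latin-1"

print(guess_decode("caf%C3%A9"))  # ('café', 'utf-8')
print(guess_decode("caf%E9"))     # ('café', 'latin-1')
```

The guess is not unambiguous: `%C3%A9` is also valid Latin-1 (it would read as "Ã©"), so the heuristic only works because well-formed UTF-8 is unlikely to occur by accident.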


1 Answer

  1. Added an answer on May 10, 2026 at 3:57 pm

    As per your quote, URLs are ASCII. That’s all.

    URIs, OTOH, can carry characters from bigger charsets as encoded octets; usually UTF-8, as you said yourself.

    The point to remember is that URLs are a subset of URIs. Therefore, the real question is, which of these is what you write in a browser?

    I’d guess you can write a URI, and the browser should try its best to transform it into a URL (which is what HTTP/1.1 supports, AFAICR). For non-ASCII characters, that means percent-encoded hex codes, usually of the UTF-8 bytes.
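A rough sketch of the transformation described in the answer, assuming the browser picks UTF-8 and only touches the path, query, and fragment. The function name `to_ascii_url` and the example address are made up for illustration; real browsers also handle non-ASCII host names, which this sketch leaves alone.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def to_ascii_url(text: str) -> str:
    """Percent-encode non-ASCII characters in a typed address as UTF-8."""
    parts = urlsplit(text)
    # "%" stays safe so escapes that are already present are not re-encoded.
    path = quote(parts.path, safe="/%", encoding="utf-8")
    query = quote(parts.query, safe="=&%", encoding="utf-8")
    fragment = quote(parts.fragment, safe="%", encoding="utf-8")
    # Assumption: the host part is left as typed (no IDNA conversion here).
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))

print(to_ascii_url("http://example.com/café?q=naïve"))
# http://example.com/caf%C3%A9?q=na%C3%AFve
```

The result contains only US-ASCII characters, which matches what RFC 1738 requires, but the choice of UTF-8 is the sketch's assumption rather than something the URL itself states.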


© 2021 The Archive Base. All Rights Reserved