Using Java, I want to strip the fragment identifier and do some simple normalisation

Asked by Editorial Team on May 30, 2026

Using Java, I want to strip the fragment identifier and do some simple normalisation (e.g., lowercasing schemes and hosts) of a diverse set of URIs. The input and output URIs should be equivalent in a general HTTP sense.

Typically, this should be straightforward. However, for URIs like http://blah.org/A_%28Secret%29.xml#blah, which percent-encodes the parentheses in (Secret), the behaviour of java.net.URI makes life difficult.

The normalisation method should return http://blah.org/A_%28Secret%29.xml, since the URIs http://blah.org/A_%28Secret%29.xml and http://blah.org/A_(Secret).xml are not equivalent in interpretation [§2.2, RFC 3986].
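For what it's worth, java.net.URI's own equals method draws the same line. Its javadoc says paths are compared in raw form (with the hex digits of escaped octets compared ignoring case), while the scheme and host are compared without regard to case. A minimal check using only the JDK (the class name is illustrative):

```java
import java.net.URI;

public class EquivalenceCheck {
    public static void main(String[] args) throws Exception {
        URI encoded = new URI("http://blah.org/A_%28Secret%29.xml");
        URI literal = new URI("http://blah.org/A_(Secret).xml");
        URI upperCased = new URI("HTTP://BLAH.org/A_%28Secret%29.xml");

        // Raw paths differ ("%28" vs "("), so these URIs are not equal
        System.out.println(encoded.equals(literal));    // false

        // Scheme and host are compared ignoring case; raw paths match
        System.out.println(encoded.equals(upperCased)); // true
    }
}
```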

So we have the following two normalisation methods:

import java.net.URI;
import java.net.URISyntaxException;

public class NormaliseDemo {
    public static void main(String[] args) throws URISyntaxException {
        URI u = new URI("http://blah.org/A_%28Secret%29.xml#blah");
        System.out.println(u);
        // prints "http://blah.org/A_%28Secret%29.xml#blah"

        String path1 = u.getPath();     // decoded: "/A_(Secret).xml"
        String path2 = u.getRawPath();  // raw:     "/A_%28Secret%29.xml"

        // NORMALISE METHOD 1: rebuild from the decoded path, dropping the fragment
        URI norm1 = new URI(u.getScheme().toLowerCase(), u.getUserInfo(),
                            u.getHost().toLowerCase(), u.getPort(), path1,
                            u.getQuery(), null);
        System.out.println(norm1);
        // prints "http://blah.org/A_(Secret).xml"

        // NORMALISE METHOD 2: rebuild from the raw path, dropping the fragment
        URI norm2 = new URI(u.getScheme().toLowerCase(), u.getUserInfo(),
                            u.getHost().toLowerCase(), u.getPort(), path2,
                            u.getQuery(), null);
        System.out.println(norm2);
        // prints "http://blah.org/A_%2528Secret%2529.xml"
    }
}

As we can see, in both methods the URI is parsed and rebuilt without the fragment identifier.

However, in method 1, u.getPath() returns the decoded path, which changes the final URI.

In method 2, u.getRawPath() returns the original path, but when it is passed to the URI constructor, Java percent-encodes the '%' signs again, producing double encoding.

This feels like a Chinese finger trap.

So two main questions:

  • Why does java.net.URI feel the need to play with the encoding?
  • How can this normalise method be implemented without fiddling with the original percent encoding?

(I would rather not have to reimplement the parsing and concatenation logic of java.net.URI, which is non-trivial.)


EDIT: Here's some further info from the URI javadoc:

  • The single-argument constructor requires any illegal characters in its argument to be quoted and preserves any escaped octets and other characters that are present.

  • The multi-argument constructors quote illegal characters as required by the components in which they appear. The percent character (‘%’) is always quoted by these constructors. Any other characters are preserved.

  • The getRawUserInfo, getRawPath, getRawQuery, getRawFragment, getRawAuthority, and getRawSchemeSpecificPart methods return the values of their corresponding components in raw form, without interpreting any escaped octets. The strings returned by these methods may contain both escaped octets and other characters, and will not contain any illegal characters.

  • The getUserInfo, getPath, getQuery, getFragment, getAuthority, and getSchemeSpecificPart methods decode any escaped octets in their corresponding components. The strings returned by these methods may contain both other characters and illegal characters, and will not contain any escaped octets.

  • The toString method returns a URI string with all necessary quotation but which may contain other characters.

  • The toASCIIString method returns a fully quoted and encoded URI string that does not contain any other characters.
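A small sketch exercising the behaviours quoted above, using only the JDK; the class name and sample paths are made up for illustration:

```java
import java.net.URI;

public class JavadocBehaviours {
    public static void main(String[] args) throws Exception {
        // Single-argument constructor: escaped octets are preserved as given
        URI parsed = new URI("http://blah.org/A_%28Secret%29.xml");
        System.out.println(parsed.getRawPath()); // /A_%28Secret%29.xml
        System.out.println(parsed.getPath());    // /A_(Secret).xml

        // Multi-argument constructor: '%' is always quoted, so an
        // already-escaped path is encoded a second time
        URI rebuilt = new URI("http", "blah.org", "/A_%28Secret%29.xml", null);
        System.out.println(rebuilt);             // http://blah.org/A_%2528Secret%2529.xml

        // "Other" (non-ASCII) characters are kept by the constructor;
        // toASCIIString encodes them as UTF-8 escaped octets
        URI unicode = new URI("http", "blah.org", "/caf\u00e9", null);
        System.out.println(unicode.toASCIIString()); // http://blah.org/caf%C3%A9
    }
}
```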

So I cannot use the multi-argument constructor without having the URL encoding messed with internally by the URI class. Pah!
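One workaround that stays within java.net.URI is to lean on the single-argument constructor, which, per the javadoc quoted above, preserves escaped octets. The sketch below rebuilds the URI string from the raw components, lowercases only the scheme and host, drops the fragment, and re-parses the result. It is a minimal sketch assuming absolute URIs; the class and method names (RawNormaliser, normalise) are hypothetical, and it does not attempt other normalisations such as dot-segment removal or default-port stripping.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

public class RawNormaliser {

    // Hypothetical helper: rebuild an absolute URI from its raw components and
    // re-parse it with the single-argument constructor, which keeps escaped
    // octets such as %28 exactly as they are. The fragment is dropped and only
    // the scheme and host are lowercased.
    public static URI normalise(URI u) throws URISyntaxException {
        if (u.getScheme() == null) {
            throw new IllegalArgumentException("expected an absolute URI: " + u);
        }
        String scheme = u.getScheme().toLowerCase(Locale.ROOT);

        // Opaque URIs (e.g. mailto:) have no authority or path to rebuild
        if (u.isOpaque()) {
            return new URI(scheme + ":" + u.getRawSchemeSpecificPart());
        }

        StringBuilder sb = new StringBuilder(scheme).append(':');
        if (u.getRawAuthority() != null) {
            sb.append("//");
            if (u.getHost() != null) {
                // Server-based authority: [userinfo@]host[:port]
                if (u.getRawUserInfo() != null) {
                    sb.append(u.getRawUserInfo()).append('@');
                }
                sb.append(u.getHost().toLowerCase(Locale.ROOT));
                if (u.getPort() != -1) {
                    sb.append(':').append(u.getPort());
                }
            } else {
                // Registry-based authority: copy it through unchanged
                sb.append(u.getRawAuthority());
            }
        }
        if (u.getRawPath() != null) {
            sb.append(u.getRawPath());
        }
        if (u.getRawQuery() != null) {
            sb.append('?').append(u.getRawQuery());
        }
        // Single-argument constructor: escaped octets are preserved
        return new URI(sb.toString());
    }

    public static void main(String[] args) throws URISyntaxException {
        URI u = new URI("http://blah.org/A_%28Secret%29.xml#blah");
        System.out.println(normalise(u)); // http://blah.org/A_%28Secret%29.xml
    }
}
```

Because no component ever passes through a multi-argument constructor, the '%' signs are never quoted a second time, and the decoded getters are never used, so %28 is never turned back into '('.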

1 Answer

    Answered by Editorial Team on May 30, 2026 at 5:37 pm

    It does this because java.net.URI, introduced in Java 1.4 (released in 2002), is based on RFC 2396, which treats '(' and ')' as unreserved characters that do not need escaping: escaping them does not change the meaning of the URI, and the RFC even says they should not be escaped unless necessary (§2.3, RFC 2396).

    RFC 3986 (published in 2005) changed this by making '(' and ')' reserved sub-delimiters, and I guess the JDK developers decided not to change the behaviour of java.net.URI, to stay compatible with existing code.
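    The RFC 2396 treatment of parentheses is visible directly in the multi-argument constructors: characters that were legal in a path under RFC 2396 pass through untouched, and only illegal ones are quoted. A minimal JDK-only illustration (the class name is illustrative):

```java
import java.net.URI;

public class Rfc2396Marks {
    public static void main(String[] args) throws Exception {
        // Under RFC 2396, '(' and ')' are unreserved "mark" characters, so the
        // multi-argument constructor leaves them alone, while a space is
        // illegal in a path and is quoted as %20
        URI u = new URI("http", "blah.org", "/A (Secret).xml", null);
        System.out.println(u); // http://blah.org/A%20(Secret).xml
    }
}
```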

    After some searching, I found that Jena IRI looks good. It keeps the raw components exactly as they appear in the input:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.jena.iri.IRI;
    import org.apache.jena.iri.IRIFactory;

    public class IRITest {
        public static void main(String[] args) {
            IRIFactory factory = IRIFactory.uriImplementation();

            // Percent-encoded parentheses
            IRI iri = factory.construct("http://blah.org/A_%28Secret%29.xml#blah");
            List<String> a = new ArrayList<>();
            a.add(iri.getScheme());
            a.add(iri.getRawUserinfo());
            a.add(iri.getRawHost());
            a.add(iri.getRawPath());
            a.add(iri.getRawQuery());
            a.add(iri.getRawFragment());

            // Literal parentheses
            IRI iri2 = factory.construct("http://blah.org/A_(Secret).xml#blah");
            List<String> b = new ArrayList<>();
            b.add(iri2.getScheme());
            b.add(iri2.getRawUserinfo());
            b.add(iri2.getRawHost());
            b.add(iri2.getRawPath());
            b.add(iri2.getRawQuery());
            b.add(iri2.getRawFragment());

            System.out.println(a);
            // [http, null, blah.org, /A_%28Secret%29.xml, null, blah]
            System.out.println(b);
            // [http, null, blah.org, /A_(Secret).xml, null, blah]
        }
    }
Related Questions

I am using java to send mail. I want to set the from mail
I'm using Java. I want to use the Deflater class to deflate some input,
I want to create and publish simple WebService using Java. Everything compiles. When I
I basically want to set up a proxy server using Java which will capture
I want to send post request to some service using Java. Requests must contain
Using Java (1.6) I want to split an input string that has components of
I want city name from IP address using Java is there any idea to
I'm using Java. I want to have a setter method of one class that
I want to take a snapshot with my webcam using java and save it
I want to publish message on Facebook's fan page using JAVA/GWT API. Any one

© 2021 The Archive Base. All Rights Reserved