I’ve been experimenting with various bits of Java code trying to come up with something that will encode a string containing quotes, spaces and ‘exotic’ Unicode characters and produce output that’s identical to JavaScript’s encodeURIComponent function.
My torture test string is: "A" B ± "
If I enter the following JavaScript statement in Firebug:
    encodeURIComponent('"A" B ± "');
Then I get:
    "%22A%22%20B%20%C2%B1%20%22"
Here’s my little test Java program:
    import java.io.UnsupportedEncodingException;
    import java.net.URLEncoder;

    public class EncodingTest {
        public static void main(String[] args) throws UnsupportedEncodingException {
            String s = "\"A\" B ± \"";

            // Form-style encoding: a space becomes '+', not %20
            System.out.println("URLEncoder.encode returns "
                    + URLEncoder.encode(s, "UTF-8"));

            // UTF-8 bytes reinterpreted as ISO-8859-1; no percent-escaping happens here
            System.out.println("getBytes returns "
                    + new String(s.getBytes("UTF-8"), "ISO-8859-1"));
        }
    }
This program outputs:
    URLEncoder.encode returns %22A%22+B+%C2%B1+%22
    getBytes returns "A" B Â± "
Close, but no cigar! What is the best way of encoding a UTF-8 string using Java so that it produces the same output as JavaScript’s encodeURIComponent?
EDIT: I’m using Java 1.4, moving to Java 5 shortly.
Looking at the implementation differences, I see that:
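- MDC on encodeURIComponent(): literal characters (regex representation): [-a-zA-Z0-9._*~'()!]
- Java 1.5.0 documentation on URLEncoder: literal characters (regex representation): [-a-zA-Z0-9._*], and the space character " " is converted into a plus sign "+".

So basically, to get the desired result, use URLEncoder.encode(s, "UTF-8") and then do some post-processing:
- replace all occurrences of "+" with "%20"
- replace all occurrences of "%xx" representing any of [~'()!] with their literal counterparts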
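The post-processing described above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the class name EncodeUriComponent and the method name encode are made up for illustration, and it assumes URLEncoder emits uppercase hex escapes (%7E rather than %7e). It sticks to String.replaceAll so that it should also compile on Java 1.4.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class; the name is illustrative only.
final class EncodeUriComponent {

    // Approximates JavaScript's encodeURIComponent by post-processing
    // the output of URLEncoder.encode(s, "UTF-8").
    static String encode(String s) throws UnsupportedEncodingException {
        return URLEncoder.encode(s, "UTF-8")
                // URLEncoder turns a space into '+'; a literal '+' in the input
                // has already become %2B, so every remaining '+' was a space.
                .replaceAll("\\+", "%20")
                // Restore the characters that encodeURIComponent leaves
                // unescaped but URLEncoder escapes.
                .replaceAll("%21", "!")
                .replaceAll("%27", "'")
                .replaceAll("%28", "(")
                .replaceAll("%29", ")")
                .replaceAll("%7E", "~");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // \u00B1 is the plus-minus sign, written as an escape so the result
        // does not depend on the source file's encoding.
        System.out.println(encode("\"A\" B \u00B1 \""));
        // prints %22A%22%20B%20%C2%B1%20%22
    }
}
```

The replacements are safe to apply after encoding because a literal "%" in the input is itself escaped as %25, so a sequence such as %21 in the encoded output can only have come from an actual "!".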