How to get the encoded version of a string (e.g. \u0421\u043b\u0443\u0436\u0435\u0431\u043d\u0430\u044f) using Java?

Editorial Team

Asked: May 24, 2026

How to get encoded version of string (e.g. \u0421\u043b\u0443\u0436\u0435\u0431\u043d\u0430\u044f) using Java?

EDIT:
I guess the question is not very clear… Basically what I want is this:

Given a string s = "blalbla", I want to get the string "\uXXXX\uYYYY".

1 Answer

Editorial Team · Answer 1 · 2026-05-24T11:15:48+00:00

You will need to extract each code unit (or code point) from the String and encode it yourself. The following works for any String, including ones whose visible characters are built from combining sequences, digraphs or ligatures, because it escapes every UTF-16 code unit individually.

public String getUnicodeEscapes(String aString)
{
    if (aString != null && aString.length() > 0)
    {
        int length = aString.length();
        // Each code unit expands to six characters: backslash, 'u' and four hex digits.
        StringBuilder buffer = new StringBuilder(length * 6);
        for (int ctr = 0; ctr < length; ctr++)
        {
            char codeUnit = aString.charAt(ctr);
            String hexString = Integer.toHexString(codeUnit);
            // Left-pad with zeros so that every escape has exactly four hex digits.
            String padAmount = "0000".substring(hexString.length());
            buffer.append("\\u");
            buffer.append(padAmount);
            buffer.append(hexString);
        }
        return buffer.toString();
    }
    else
    {
        // Null and empty input both yield null.
        return null;
    }
}

The above produces output as dictated by the Java Language Specification on Unicode escapes, i.e. output of the form \uxxxx for each UTF-16 code unit. A supplementary character (a code point above U+FFFF) comes out as its surrogate pair, i.e. two escapes of the form \uxxxx\uyyyy.
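To make the surrogate-pair behaviour concrete, here is a minimal sketch of the same per-code-unit idea. The class name UnicodeEscapeDemo and the helper toUnicodeEscapes are made up for illustration, and it uses String.format rather than the manual padding above, so treat it as one possible variant rather than the answer's own code.

```java
public class UnicodeEscapeDemo {

    // Hypothetical helper (not from the answer above): escapes every UTF-16 code unit.
    static String toUnicodeEscapes(String input) {
        StringBuilder out = new StringBuilder(input.length() * 6);
        for (int i = 0; i < input.length(); i++) {
            // %04x zero-pads each code unit to four lowercase hex digits.
            out.append(String.format("\\u%04x", (int) input.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The Cyrillic word from the question, written with escapes so the
        // source file compiles regardless of its encoding.
        String cyrillic = "\u0421\u043b\u0443\u0436\u0435\u0431\u043d\u0430\u044f";
        System.out.println(toUnicodeEscapes(cyrillic));

        // A character outside the Basic Multilingual Plane is stored as
        // two code units (a surrogate pair), so it produces two escapes.
        String supplementary = new String(Character.toChars(0x1D4A5));
        System.out.println(toUnicodeEscapes(supplementary));
    }
}
```

Run as written, the first line printed should be \u0421\u043b\u0443\u0436\u0435\u0431\u043d\u0430\u044f and the second \ud835\udca5, the high and low surrogates for U+1D4A5.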

The originally posted code has been modified to produce Unicode code points in U+ notation (e.g. U+0915, U+1d4a5), with one entry per code point rather than per code unit:

public String getUnicodeCodepoints(String aString)
{
    if (aString == null || aString.isEmpty())
    {
        return null;
    }
    StringBuilder buffer = new StringBuilder(aString.length() * 7);
    // Advance by code point, not by char, so a surrogate pair is consumed as one unit
    // and an unpaired surrogate is still reported instead of being skipped.
    for (int ctr = 0; ctr < aString.length(); )
    {
        int codePoint = aString.codePointAt(ctr);
        if (buffer.length() > 0)
        {
            buffer.append(' ');
        }
        // %04x pads to at least four hex digits and never truncates,
        // so six-digit code points (U+100000 and above) are handled too.
        buffer.append("U+");
        buffer.append(String.format("%04x", codePoint));
        ctr += Character.charCount(codePoint);
    }
    return buffer.toString();
}

The gruntwork is done by the String.codePointAt() method, which returns the Unicode code point at a particular index. For a String made of combining characters, String.length() is not the number of visible characters; it is the number of UTF-16 code units, which for characters in the Basic Multilingual Plane equals the number of code points. For example, क and ् combine to form क् in Devanagari, and the above function returns U+0915 U+094d without any fuss, since String.length() is 2 for the combined character.

Strings with supplementary characters yield a single code point per character, even though each such character occupies two code units. For example, 𝒥𝒶𝓋𝒶𝓈𝒸𝓇𝒾𝓅𝓉 (the page may not display this String literal correctly, but you can copy it just fine; it is "Javascript" written with the supplementary character set for Mathematical Alphanumeric Symbols) returns U+1d4a5 U+1d4b6 U+1d4cb U+1d4b6 U+1d4c8 U+1d4b8 U+1d4c7 U+1d4be U+1d4c5 U+1d4c9.
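The difference between code units and code points can be checked directly. The sketch below assumes Java 8 or later (for String.codePoints()); the class name CodePointDemo and the helper toCodePoints are hypothetical and are not part of the answer's code.

```java
import java.util.stream.Collectors;

public class CodePointDemo {

    // Hypothetical helper: formats each code point as U+xxxx, separated by spaces.
    static String toCodePoints(String input) {
        return input.codePoints()
                .mapToObj(cp -> String.format("U+%04x", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Devanagari KA followed by VIRAMA: two BMP code points, one visible cluster.
        String devanagari = "\u0915\u094d";
        // MATHEMATICAL SCRIPT CAPITAL J: one supplementary code point, two code units.
        String script = new String(Character.toChars(0x1D4A5));

        System.out.println(devanagari.length());                               // 2 code units
        System.out.println(devanagari.codePointCount(0, devanagari.length())); // 2 code points
        System.out.println(script.length());                                   // 2 code units
        System.out.println(script.codePointCount(0, script.length()));         // 1 code point

        System.out.println(toCodePoints(devanagari)); // U+0915 U+094d
        System.out.println(toCodePoints(script));     // U+1d4a5
    }
}
```

Neither length() nor codePointCount() counts visible characters; the Devanagari cluster is a single glyph on screen but two code points in the String.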
