UPDATED WITH SOLUTION, see at bottom Requirement : Process a ZIP file in Java

Question

Asked: June 17, 2026 (2026-06-17T08:02:51+00:00)

UPDATED WITH SOLUTION, see at bottom

Requirement:
Process a ZIP file in Java SE 6 that contains files with special characters in their names. The encoding used by the ZIP producer does not seem to be plain UTF-8, so special characters come out altered. I would like to convert them back into their proper characters.

Issue:
The ZIP contains a file called abcüabc.txt.
The entry is read via java.util.zip.ZipEntry, and when I print the name character by character I see this:

ü comes out as u followed by a ¨

Question:
So I would like to know how I can replace that u¨ with ü, or maybe with ue.

What I already tried, without success:
name.replaceAll("u\\¨", "ue");
or
name.replaceAll("ü", "ue");

Original Source Code (not working):

InputStream is = new FileInputStream(new File("/Users/me/Desktop/test.zip"));
ZipInputStream zipStream = new ZipInputStream(is);
ZipEntry zipEntry = null;
while ((zipEntry = zipStream.getNextEntry()) != null) {
    String name = zipEntry.getName(); // reading abcüabc.txt
    System.out.println("pos 3: "+name.charAt(3));
    System.out.println("pos 4: "+name.charAt(4));
    System.out.println("is equal to ¨: "+Character.toString(name.charAt(4)).equals("¨"));
}

Output:

pos 3: u
pos 4:¨
is equal to ¨: false

Notes on my environment:

Zip produced under Mac OS X 10.6.8
Java SE 6: Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01-434, mixed mode)

SOLUTION

Apparently the ZIP producer (in my case Mac OS X) stores special characters in a decomposed form, so an ü is split into a u followed by a combining ¨.
When reading the file names from the ZIP, we want to convert them back from the decomposed to the composed form, so we only have to add a normalization step to the source code from above:

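import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

InputStream is = new FileInputStream(new File("/Users/me/Desktop/test.zip"));
ZipInputStream zipStream = new ZipInputStream(is);
ZipEntry zipEntry = null;
while ((zipEntry = zipStream.getNextEntry()) != null) {
    String name = zipEntry.getName(); // reading abcüabc.txt
    System.out.println("pos 3: "+name.charAt(3));
    System.out.println("pos 4: "+name.charAt(4));
    System.out.println("contains ü: "+name.contains("ü")); // still decomposed
    name = Normalizer.normalize(name, Form.NFC); // recompose u + U+0308 into ü
    System.out.println("contains ü: "+name.contains("ü"));
}
zipStream.close(); // also closes the underlying FileInputStream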

Output:

pos 3: u
pos 4:¨
contains ü: false
contains ü: true
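
For anyone who would rather end up with ue than ü (the alternative mentioned in the question), here is a minimal sketch. The class UmlautNames and its methods are hypothetical names made up for this example, and the umlaut-to-digraph mapping is only an assumption about what a reasonable ASCII fallback looks like. It composes the name first, so a plain String.replace then finds the single-character ü. The earlier replaceAll attempts failed because the name contained u followed by U+0308, which matches neither "ü" nor u followed by U+00A8.

```java
import java.text.Normalizer;

// Hypothetical helper for illustration; not part of the original post.
final class UmlautNames {

    // Recompose decomposed sequences (u + U+0308) into single code points (U+00FC).
    static String compose(String name) {
        return Normalizer.normalize(name, Normalizer.Form.NFC);
    }

    // Compose first, then map German umlauts to ASCII digraphs (assumed mapping).
    static String transliterate(String name) {
        return compose(name)
                .replace("\u00E4", "ae").replace("\u00F6", "oe").replace("\u00FC", "ue")
                .replace("\u00C4", "Ae").replace("\u00D6", "Oe").replace("\u00DC", "Ue");
    }

    public static void main(String[] args) {
        String decomposed = "abcu\u0308abc.txt"; // the form read from the Mac-produced ZIP
        System.out.println(decomposed.length());          // 12: u and U+0308 are two chars
        System.out.println(compose(decomposed).length()); // 11: now a single U+00FC
        System.out.println(transliterate(decomposed));    // abcueabc.txt
    }
}
```

The code sticks to APIs that exist in Java SE 6 (java.text.Normalizer and String.replace with CharSequence arguments), so it should fit the environment described above.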

1 Answer

Editorial Team · Answer 1 · 2026-06-17T08:02:52+00:00

That’s not a ¨ (U+00A8 DIAERESIS), but the U+0308 COMBINING DIAERESIS.

The character is split this way because Mac OS X stores file names in Normalization Form D, which decomposes characters like this.

You can compose it back like so:

String name = zipEntry.getName(); 
name = Normalizer.normalize(name, Form.NFC);
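
Put together, a rough sketch of reading all entry names in composed form could look like the following. ZipNames, composedEntryNames and sampleArchive are hypothetical names invented for this example. The in-memory archive only simulates a ZIP whose entry name is stored decomposed, and it assumes the name is written and read as UTF-8, which is what java.util.zip does by default.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration; written without Java 7 features.
final class ZipNames {

    // Returns every entry name of the archive, normalized to NFC.
    static List<String> composedEntryNames(InputStream in) throws IOException {
        List<String> names = new ArrayList<String>();
        ZipInputStream zip = new ZipInputStream(in);
        try {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(Normalizer.normalize(entry.getName(), Normalizer.Form.NFC));
            }
        } finally {
            zip.close();
        }
        return names;
    }

    // Builds a tiny in-memory ZIP with one empty entry, used only for the demo.
    static byte[] sampleArchive(String entryName) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        ZipOutputStream out = new ZipOutputStream(buffer);
        out.putNextEntry(new ZipEntry(entryName));
        out.closeEntry();
        out.close();
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] archive = sampleArchive("abcu\u0308abc.txt"); // decomposed name
        List<String> names = composedEntryNames(new ByteArrayInputStream(archive));
        System.out.println(names.get(0).equals("abc\u00FCabc.txt")); // true
    }
}
```

Normalizing once, right where the name is read, means later comparisons and replacements only ever see the composed form.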

More about normalization forms

The difference between the two diaereses is whether they modify the preceding base character:

    System.out.println( "u" + (char)0xA8); //u¨
    System.out.println( "u" + (char)0x0308); //ü
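
The same difference can be checked programmatically, which also explains why the comparison with "¨" in the question printed false. The sketch below uses a hypothetical class DiaeresisKinds and relies only on Character.getType and java.text.Normalizer. It shows that U+00A8 is a standalone symbol that NFC leaves alone, while U+0308 is a combining mark that NFC merges into the preceding u.

```java
import java.text.Normalizer;

// Hypothetical helper for illustration; not part of the original answer.
final class DiaeresisKinds {

    // True for characters in the Unicode mark categories (Mn, Mc, Me).
    static boolean isCombiningMark(char c) {
        int type = Character.getType(c);
        return type == Character.NON_SPACING_MARK
                || type == Character.COMBINING_SPACING_MARK
                || type == Character.ENCLOSING_MARK;
    }

    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        System.out.println(isCombiningMark('\u00A8')); // false: spacing DIAERESIS
        System.out.println(isCombiningMark('\u0308')); // true: COMBINING DIAERESIS
        System.out.println(nfc("u\u00A8").length());   // 2: nothing to compose
        System.out.println(nfc("u\u0308").length());   // 1: composed into U+00FC
    }
}
```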
