This question concerns a Tomcat 7 web application, which is connected to a MySQL

Asked: June 6, 2026 (2026-06-06T20:33:25+00:00)

This question concerns a Tomcat 7 web application, which is connected to a MySQL (5.5.16) database.

When I open a zip file whose filenames are encoded in the windows-1252 charset, the characters seem to be interpreted correctly by Java:

ZipFile zf = new ZipFile( zipFile, Charset.forName( "windows-1252" ) );
Enumeration<? extends ZipEntry> entries = zf.entries();
while( entries.hasMoreElements() ) {
    ZipEntry ze = entries.nextElement();
    if( ! ze.isDirectory() ) {
        String name = ze.getName();
        System.out.println( name ); // appears to print correct filenames, e.g. café.pdf
    }
}

Omitting the Charset object in the ZipFile constructor would cause an exception.
The filenames in the zip file are printed correctly to standard output, including diacritics.
However, when I then try to store the filename in the database, the é is replaced with a question mark (as seen in the mysql console client).
I had no problems inserting special characters from the web application into MySQL before.

When I execute an INSERT with é in Java source code:

statement.executeUpdate( "insert into files (filename) values ('café.pdf')" );

the é is stored correctly in MySQL.

Also, my log file shows a comma-like character instead of é: caf‚.pdf

Does anyone know what could be happening here?

1 Answer

Editorial Team · Answer 1 · 2026-06-06T20:33:27+00:00

The issue is resolved. Another post suggested that the filenames in a zip file might be encoded not in windows-1252 but in IBM437. Changing the Charset from:

ZipFile zf = new ZipFile( zipFile, Charset.forName( "windows-1252" ) );

to

ZipFile zf = new ZipFile( zipFile, Charset.forName( "IBM437" ) );

gave the desired result: when saving the acquired filename in MySQL, it was stored correctly with é.
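
The effect of the charset argument can be reproduced without the original archive. The following is only a minimal sketch, not the code from the question: it builds a small zip in memory whose entry name is written as IBM437 bytes, then reads the name back under both charsets. The class and method names (ZipNameCharsetDemo, zipWithName, firstName) are made up for illustration, and it assumes the JRE ships the IBM437 charset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; none of these names come from the original post.
class ZipNameCharsetDemo {

    // Builds an in-memory zip with one empty entry whose name is encoded with the given charset.
    static byte[] zipWithName(String name, String charsetName) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buffer, Charset.forName(charsetName))) {
            zos.putNextEntry(new ZipEntry(name));
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the first entry name, decoding the stored name bytes with the given charset.
    static String firstName(byte[] zipBytes, String charsetName) {
        try (ZipInputStream zis = new ZipInputStream(
                new ByteArrayInputStream(zipBytes), Charset.forName(charsetName))) {
            ZipEntry entry = zis.getNextEntry();
            return entry == null ? null : entry.getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "caf\u00e9.pdf" is café.pdf; the escape avoids depending on the source file encoding.
        byte[] zip = zipWithName("caf\u00e9.pdf", "IBM437");

        // Decoding with the charset that was used for writing recovers U+00E9.
        System.out.println(firstName(zip, "IBM437").equals("caf\u00e9.pdf"));       // prints true

        // Decoding the same bytes as windows-1252 yields U+201A instead.
        System.out.println(firstName(zip, "windows-1252").equals("caf\u201a.pdf")); // prints true
    }
}
```

Comparing against escaped literals, rather than printing the names, keeps the check independent of whatever encoding the console happens to use.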

What went wrong?

Printing out the filenames contained in the zip file to standard output with

System.out.println( name );

made me wrongly assume that the filenames in the zip file were being decoded correctly. When I opened the zip file with windows-1252, the filename was printed to standard output with the diacritic intact: café.pdf. With other character encodings, different symbols appeared in place of the é.

But when I printed the Unicode value of the é character (with the help of another answer), I saw that with the zip file opened as windows-1252, the actual Unicode value was NOT \u00e9 (latin small letter e with acute) but \u201a (single low-9 quotation mark). When I opened the ZipFile with the IBM437 charset, the correct Unicode value DID appear.
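
The check described above can be sketched as follows. This is an illustrative example rather than the code from the linked answer: it decodes the single byte 0x82 (the IBM437 encoding of é) under both charsets and prints each resulting char as a hexadecimal escape. The names CodePointDump, escape and decode are assumptions made for this sketch.

```java
import java.nio.charset.Charset;

// Hypothetical helper class for inspecting what a String really contains.
class CodePointDump {

    // Renders every char of s as a backslash-u escape with four hex digits.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("\\u%04x", (int) c));
        }
        return sb.toString();
    }

    // Decodes raw bytes using the named charset.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte[] raw = { (byte) 0x82 };
        // The output is pure ASCII, so it reads the same on any console.
        System.out.println("windows-1252: " + escape(decode(raw, "windows-1252")));
        System.out.println("IBM437:       " + escape(decode(raw, "IBM437")));
    }
}
```

Because the escapes are plain ASCII, this kind of dump shows the real content of a String regardless of how the console or a log viewer renders non-ASCII characters.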

Of course, a PrintStream that prints a String to standard output is itself tied to a character encoding. From the PrintStream Javadoc:

All characters printed by a PrintStream are converted into bytes using the platform’s default character encoding.

I am working on Windows XP.
When I created a new PrintStream

PrintStream out = new PrintStream( System.out, true, "IBM437" );

everything made sense: opening the zip file with IBM437 character encoding, and using the new PrintStream, é was printed correctly.
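
Why the original println looked right can be illustrated with a round trip. The sketch below rests on an assumption the post does not state outright: that the JVM's default encoding was windows-1252 while the console rendered bytes using code page 437. Under that assumption, the wrongly decoded character U+201A is written as byte 0x82, which the console then shows as é. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;

// Hypothetical simulation of "encode with the JVM charset, display with the console charset".
class ConsoleRoundTrip {

    // Encodes s the way a PrintStream using jvmCharset would, then decodes the
    // bytes the way a console using consoleCharset would display them.
    static String asSeenOnConsole(String s, String jvmCharset, String consoleCharset) {
        byte[] written = s.getBytes(Charset.forName(jvmCharset));
        return new String(written, Charset.forName(consoleCharset));
    }

    public static void main(String[] args) {
        // Name as decoded with the wrong charset: U+201A where U+00E9 was meant.
        String wronglyDecoded = "caf\u201a.pdf";

        // Assumed setup: JVM default windows-1252, console code page 437.
        String shown = asSeenOnConsole(wronglyDecoded, "windows-1252", "IBM437");

        // The two mismatches cancel out, so the console appears to show the right name.
        System.out.println(shown.equals("caf\u00e9.pdf")); // prints true
    }
}
```

The same reasoning would explain why the log file showed ‚ rather than é, provided the log was viewed in windows-1252: there the second, compensating conversion never happens.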

There Ain’t No Such Thing As Plain Text.
