Editorial Team
Asked: June 13, 2026

I have a problem using File.list() with file names with NON-ASCII characters incorrectly retrieved


File.list() retrieves file names containing non-ASCII characters incorrectly on Mac OS X when I use Java 7 from Oracle.

I am using the following example:

import java.io.*;
import java.util.*;

public class ListFiles {

  public static void main(String[] args) 
  {
    try { 
      File folder = new File(".");
      String[] listOfFiles = folder.list(); 
      for (int i = 0; i < listOfFiles.length; i++) 
      {
        System.out.println(listOfFiles[i]);
      }
      Map<String, String> env = System.getenv();
      for (String envName : env.keySet()) {
        System.out.format("%s=%s%n",
            envName,
            env.get(envName));
      }
    } catch (Exception e) { 
      e.printStackTrace(); 
    } 
  }

}

Running this example with Java 6 from Apple, everything is fine:

....
Folder-ÄÖÜäöüß
吃饭.txt
....

Running this example with Java 7 from Oracle, the result is as follows:

....
Folder-A��O��U��a��o��u����
������.txt
....

But, if I set the environment as follows (not set in the two cases above):

LANG=en_US.UTF-8

the result with Java 7 from Oracle is as expected:

....
Folder-ÄÖÜäöüß
吃饭.txt
....

My problem is that I don’t want to set the LANG environment variable. It’s a GUI application that I want to deploy as a Mac OS X application, and when it is deployed that way, the LSEnvironment setting

<key>LSEnvironment</key>
<dict>
  <key>LANG</key>
  <string>en_US.UTF-8</string>
</dict>

in Info.plist has no effect (see also here).

What can I do to retrieve the file names correctly in Java 7 from Oracle on Mac OS X without having to set the LANG environment variable? On Windows and Linux, this problem does not exist.

EDIT:

If I print the individual bytes with:

byte[] x = listOfFiles[i].getBytes();
for (int j = 0; j < x.length; j++) 
{
    System.out.format("%02X",x[j]);
    System.out.print(" ");
}
System.out.println();

the correct results are:

Folder-ÄÖÜäöüß
46 6F 6C 64 65 72 2D 41 CC 88 4F CC 88 55 CC 88 61 CC 88 6F CC 
88 75 CC 88 C3 9F 
吃饭.txt
E5 90 83 E9 A5 AD 2E 74 78 74 

and the wrong results are:

Folder-A��O��U��a��o��u����
46 6F 6C 64 65 72 2D 41 EF BF BD EF BF BD 4F EF BF BD EF BF BD 
55 EF BF BD EF BF BD 61 EF BF BD EF BF BD 6F EF BF BD EF BF BD 
75 EF BF BD EF BF BD EF BF BD EF BF BD  
������.txt
EF BF BD EF BF BD EF BF BD EF BF BD EF BF BD EF BF BD 2E 74 78 74 

So one can see that File.list() replaces some bytes with the UTF-8 sequence “EF BF BD” (Unicode U+FFFD, the replacement character) if LANG is not set. This happens only with Java 7 from Oracle.
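
One caveat about the dump above: getBytes() without an argument encodes with the platform default charset, so the dump itself depends on the setting under investigation. A minimal sketch that always encodes as UTF-8 is shown here; the class and method names (HexDump, utf8Hex) are made up for illustration and are not part of the original code.

```java
import java.nio.charset.StandardCharsets;

public class HexDump {
    // Encode the string as UTF-8 (independent of the default charset)
    // and return the bytes as space-separated upper-case hex pairs.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'A' followed by U+0308 (combining diaeresis), as in the correct dump
        System.out.println(utf8Hex("A\u0308"));        // prints "41 CC 88"
        // 'A' followed by two replacement characters, as in the wrong dump
        System.out.println(utf8Hex("A\uFFFD\uFFFD"));  // prints "41 EF BF BD EF BF BD"
    }
}
```

With this helper, any U+FFFD in the output must already have been in the String returned by File.list(), not introduced by the dump.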

1 Answer

  1. Editorial Team
    Added an answer on June 13, 2026 at 3:59 am

    If everything else fails, create a wrapper for the JVM that sets the LC_CTYPE environment variable and then launches your application. OS X doesn’t care which program the plist tells it to run, does it? It’s probably simplest to write this wrapper as a shell script:

    #!/bin/bash
    export LC_CTYPE="UTF-8" # Try other options if this doesn't work
    exec java your.program.Here
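
If a shell script is not an option, the same wrapper idea could be sketched in Java with ProcessBuilder. This is only an illustration under assumptions: the class name Utf8Launcher is hypothetical, your.program.Here is the placeholder from the script above, and whether LC_CTYPE="UTF-8" is the right value on a given system is untested here.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

public class Utf8Launcher {
    // Build (but do not start) a child JVM whose environment has LC_CTYPE set.
    static ProcessBuilder build(String mainClass) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> command = Arrays.asList(
                javaBin, "-cp", System.getProperty("java.class.path"), mainClass);
        ProcessBuilder pb = new ProcessBuilder(command);
        // Same setting as the shell wrapper; try other values if this one does not help
        pb.environment().put("LC_CTYPE", "UTF-8");
        // Let the child use this process's stdin, stdout, and stderr
        pb.inheritIO();
        return pb;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java Utf8Launcher <main-class>");
            return;
        }
        // Start the real application and pass its exit code through
        System.exit(build(args[0]).start().waitFor());
    }
}
```

It would be run as `java Utf8Launcher your.program.Here`, at the cost of a second JVM start.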
    

    The problem is with the way Java – any version of Java, from either Apple or Oracle – reads the names of files from the file system. Names of files on the file system are essentially binary data, and they must be decoded before they can be used as a String in Java. (You can read more about this issue in my blog.)

    The detection of the encoding varies from platform to platform and version to version, so this must be where Apple Java 6 and Oracle Java 7 differ: Java 6 detects correctly that the system is set to UTF-8, while Java 7 gets it wrong.
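
To illustrate what a wrong guess does, here is a small sketch that decodes the same raw bytes with two charsets. US-ASCII is only an assumed stand-in for whatever Java 7 picks when LANG is unset; the source does not confirm which charset is actually used. The class and method names are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Bytes of a decomposed "Ä" as they appear in the question's dump: 'A' + U+0308
    static final byte[] RAW = {0x41, (byte) 0xCC, (byte) 0x88};

    // Decode raw bytes; malformed input is replaced, never rejected
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    // Render each char of the string as U+XXXX for inspection
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "U+0041 U+0308"
        System.out.println(codePoints(decode(RAW, StandardCharsets.UTF_8)));
        // prints "U+0041 U+FFFD U+FFFD"
        System.out.println(codePoints(decode(RAW, StandardCharsets.US_ASCII)));
    }
}
```

The second result has the same shape as the wrong output in the question: every byte above 0x7F becomes one U+FFFD, which is consistent with decoding by a 7-bit charset.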

    Strangely though, when I try to reproduce the issue with the following program I find that both Java 6 and Java 7 correctly use UTF-8 to decode file names (they are printed correctly to the terminal). For other I/O, Java 6u35 is using MacRoman as the default charset, while Java 7u7 uses UTF-8 (shown by the file.encoding system property).

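    import java.io.*;
    
    public class Test {
      public static void main(String[] args) throws UnsupportedEncodingException {
        // Force UTF-8 on stdout so the printed names do not depend on the default charset
        System.setOut(new PrintStream(System.out, true, "UTF-8"));
        // Default charset used for other I/O
        System.out.println(System.getProperty("file.encoding"));
        for (File f : new File(".").listFiles()) {
          System.out.println(f.getName());
        }
      }
    }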
    

    When I run locale on OS X 10.7 I get the output below. It seems that on my system Java 6 doesn’t correctly interpret the value given for LC_CTYPE. As far as I know the system has no customizations and everything is set to English, so this should be the default configuration:

    LANG=
    LC_COLLATE="C"
    LC_CTYPE="UTF-8"
    LC_MESSAGES="C"
    LC_MONETARY="C"
    LC_NUMERIC="C"
    LC_TIME="C"
    LC_ALL=
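
To compare what the JVM sees with the locale output, a small diagnostic along these lines may help. It is a sketch, not a fix: sun.jnu.encoding is an internal, undocumented property of Oracle/OpenJDK builds that I assume reflects the charset used for file names, and the class name EncodingReport is made up.

```java
import java.nio.charset.Charset;

public class EncodingReport {
    // Format one setting, marking missing values explicitly
    static String describe(String key, String value) {
        return key + "=" + (value == null ? "(unset)" : value);
    }

    public static void main(String[] args) {
        // Charset settings as seen by the JVM
        System.out.println(describe("file.encoding", System.getProperty("file.encoding")));
        // Assumption: internal property, may be absent on other JVMs
        System.out.println(describe("sun.jnu.encoding", System.getProperty("sun.jnu.encoding")));
        System.out.println(describe("defaultCharset", Charset.defaultCharset().name()));
        // Locale-related environment variables as inherited by the JVM
        for (String name : new String[] {"LANG", "LC_ALL", "LC_CTYPE"}) {
            System.out.println(describe(name, System.getenv(name)));
        }
    }
}
```

Running it once from a terminal and once from the packaged application should show whether the two launches inherit different environments.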
    