Editorial Team
Asked: June 10, 2026

I am experimenting with an edge case we’re seeing in production. We have a business model where clients generate text files and then FTP them to our servers. We ingest those files and process them on our Java backend (running on CentOS machines). Most (95%+) of our clients know to generate these files in UTF-8, which is what we want. However, we have a few stubborn clients (but large accounts) that generate these files on Windows machines with the CP1252 character set. No problem though, we’ve configured our 3rd party libs (which do most of the “processing” work for us) to handle input in any character set through some magical voodoo.

Occasionally, we see a file come over that has illegal UTF-8 characters (CP1252) in its name. When our software tries to read these files in from the FTP server the normal method of file reading chokes and throws a FileNotFoundException:

File f = getFileFromFTPServer();
// FileReader has no readLine(), so wrap it in a BufferedReader
BufferedReader reader = new BufferedReader(new FileReader(f));

String line = reader.readLine();
// ...etc.

The exceptions look something like this:

java.io.FileNotFoundException: /path/to/file/some-text-blah?blah.xml (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:120)
    at java.io.FileReader.<init>(FileReader.java:55)
    at com.myorg.backend.app.InputFileProcessor.run(InputFileProcessor.java:60)
    at java.lang.Thread.run(Thread.java:662)

So what I think is happening is that because the file name itself contains illegal chars, we never even get to read it in the first place. If we could, then regardless of the file’s contents, our software should be able to handle processing it correctly. So this is really an issue with reading file names with illegal UTF-8 chars in them.

As a test case, I created a very simple Java “app” to deploy on one of our servers and test some things out (source code is provided below). I then logged into a Windows machine and created a test file and named it test£.txt. Notice the character after “test” in the file name. This is Alt-0163. I FTPed this to our server, and when I ran ls -ltr on its parent directory, I was surprised to see it listed as test?.txt.

Before I go any further, here is the Java “app” I wrote for testing/reproducing this issue:

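import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class Driver {
    public static void main(String[] args) {
        Driver d = new Driver();
        d.run(args[0]);     // I know this is bad, but it's fine for our purposes here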
    }

    private void run(String fileName) {
        InputStreamReader isr = null;
        BufferedReader buffReader = null;
        FileInputStream fis = null;
        String firstLineOfFile = "default";

        System.out.println("Processing " + fileName);

        try {
            System.out.println("Attempting UTF-8...");

            fis = new FileInputStream(fileName);
            isr = new InputStreamReader(fis, Charset.forName("UTF-8"));
            buffReader = new BufferedReader(isr);

            firstLineOfFile = buffReader.readLine();

            System.out.println("UTF-8 worked and first line of file is : " + firstLineOfFile);
        }
        catch(IOException io1) {
            // UTF-8 failed; try CP1252.
            try {
                System.out.println("UTF-8 failed. Attempting Windows-1252...(" + io1.getMessage() + ")");

                fis = new FileInputStream(fileName);
                // I've also tried variations "WINDOWS-1252", "Windows-1252", "CP1252", "Cp1252", "cp1252"
                isr = new InputStreamReader(fis, Charset.forName("windows-1252"));
                buffReader = new BufferedReader(isr);

                firstLineOfFile = buffReader.readLine();

                System.out.println("Windows-1252 worked and first line of file is : " + firstLineOfFile);
            }
            catch(IOException io2) {
                // Both UTF-8 and CP1252 failed...
                System.out.println("Both UTF-8 and Windows-1252 failed. Could not read file. (" + io2.getMessage() + ")");
            }
        }
    }
}

When I run this from the terminal (java -cp . com/Driver t*), I get the following output:

Processing test�.txt
Attempting UTF-8...
UTF-8 failed. Attempting Windows-1252...(test�.txt (No such file or directory))
Both UTF-8 and Windows-1252 failed. Could not read file.(test�.txt (No such file or directory))

test�.txt?!?! I did some research and found that the “�” is the Unicode replacement character \uFFFD. So I guess what’s happening is that the CentOS FTP server doesn’t know how to handle Alt-0163 (£) and so it replaces it with \uFFFD (�). But I don’t understand why ls -ltr displays a file called test?.txt…

In any event, it appears that the solution is to add some logic that searches for the existence of this character in the file name, and if found, renames the file to something else (like perhaps do a String-wise replaceAll("\uFFFD", "_") or something like that) that the system can read and process.

The problem is that Java doesn’t even see this file on the file system. CentOS knows the file is there (test?.txt), but when that file gets passed into Java, Java interprets it as test�.txt and for some reason No such file or directory…

How can I get Java to see this file so that I can perform a File::renameTo(File) on it? Sorry for the backstory here, but I feel it is relevant since every detail counts in this scenario. Thanks in advance!

1 Answer

  1. Editorial Team, added an answer on June 10, 2026 at 6:23 am

    Welcome to the wonderful world of text encodings. You have several levels of problems and you need to sort each of them out individually.

    First, what is the file name on disk? Does it contain valid UTF-8 byte sequences or is it something else?

    The problem here is that you need the correct file name or the file system simply won’t be able to find the file. On top of that, whatever hands the name to your program (the operating system or, on a Linux server like yours, the Java runtime decoding the name’s raw bytes) might convert the illegal characters in the file name to Unicode \uFFFD, so no matter what you try, you won’t be able to load the file (since there is no file with \uFFFD in its name on the disk).

    How can that be? This happens because the mapping isn’t two-way. When the file name is read from disk, the undecodable character in test£.txt (presumably stored as the single CP1252 byte 0xA3) is replaced, and you are handed test\uFFFD.txt. When you then ask to open test\uFFFD.txt, the file can’t be found because there is no file with such a name (there is only the one with the raw byte). From the decoded name alone, there is no way for you to find out what the real name of the file is.
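The one-way mapping can be reproduced in a few lines of Java. This is only a sketch: it assumes the name on disk holds the single CP1252 byte 0xA3 for £, and it stands in for the runtime's file-name decoding with an explicit UTF-8 decode. The class and method names (`LossyNameDemo`, `roundTripUtf8`, `hex`) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class LossyNameDemo {
    /** Decodes raw name bytes as UTF-8, then encodes the resulting String back to UTF-8. */
    static byte[] roundTripUtf8(byte[] rawName) {
        // Malformed input (such as a lone 0xA3) is replaced with U+FFFD while decoding.
        String decoded = new String(rawName, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    /** Formats bytes as space-separated upper-case hex pairs. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Assumed on-disk name: "test", the CP1252 byte for the pound sign, ".txt"
        byte[] onDisk = {'t', 'e', 's', 't', (byte) 0xA3, '.', 't', 'x', 't'};
        byte[] afterRoundTrip = roundTripUtf8(onDisk);

        System.out.println("on disk:    " + hex(onDisk));
        System.out.println("round trip: " + hex(afterRoundTrip));
        System.out.println("same name?  " + Arrays.equals(onDisk, afterRoundTrip));
    }
}
```

The round-tripped bytes differ from the original ones, which is why a lookup with the decoded name fails with "No such file or directory".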

    Solutions? You can open a command prompt and rename the file with a pattern: ren test*.txt test.txt (on a Linux server such as the CentOS machine in the question, mv test*.txt test.txt from a shell does the same). Since the pattern matches only a single file, that will work. But you won’t be able to do the same from, say, the Windows Explorer because it also can’t find the file.

    Next step: FTP. FTP is a protocol for humans – it’s not suitable for automatic data exchange. Get rid of FTP. I don’t know how much that will cost you but it’s always worth it. Use SFTP, scp or FTAPI.

    One source of the problems could be that FTP transfers file names as ASCII. No umlauts are allowed in the FTP protocol … or rather, FTP doesn’t expect any. If you’re lucky, your FTP client will refuse to transfer a file with such a name, but most simply misbehave. When such names do occur, FTP will just do … something. Whatever that might be. Usual effects are that file names with non-ASCII characters are encoded twice as UTF-8, or that those characters are replaced with ? (\u003f).
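To make the "encoded twice as UTF-8" effect concrete, here is a small sketch. It assumes the intermediate hop misreads the UTF-8 bytes as ISO-8859-1 (in which, as in CP1252, £ is the byte 0xA3); which charset a real FTP client or server would pick is not known from the question. `DoubleEncodingDemo` and `misdecode` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

final class DoubleEncodingDemo {
    /** Encodes text as UTF-8 and then wrongly decodes those bytes as ISO-8859-1. */
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String name = "test\u00A3.txt";           // test£.txt
        String garbled = misdecode(name);          // every UTF-8 byte becomes its own char
        byte[] twice = garbled.getBytes(StandardCharsets.UTF_8);

        System.out.println("original chars: " + name.length());
        System.out.println("garbled chars:  " + garbled.length());
        System.out.println("bytes after second UTF-8 encoding: " + twice.length);
    }
}
```

The pound sign, two bytes in UTF-8, turns into two separate characters after the wrong decode, and a second UTF-8 encoding stores each of them as two bytes again.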

    Or the Java FTP client could use new String( bytes ) to create a String from the FTP file name, which would mangle the bytes using your system’s default encoding – not pretty.
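For contrast, a sketch of decoding name bytes with an explicitly chosen single-byte charset instead of the platform default. ISO-8859-1 is used here as an assumption because it maps every byte value to exactly one character, so the original bytes can always be recovered. The names `ExplicitCharsetDemo`, `decodeLatin1` and `encodeLatin1` are illustrative and not part of any FTP library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ExplicitCharsetDemo {
    /** Decodes raw name bytes with a fixed charset rather than the platform default. */
    static String decodeLatin1(byte[] rawName) {
        return new String(rawName, StandardCharsets.ISO_8859_1);
    }

    /** Turns a name decoded by decodeLatin1 back into its original bytes. */
    static byte[] encodeLatin1(String name) {
        return name.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Assumed raw name as sent by a CP1252 client: "test", 0xA3, ".txt"
        byte[] raw = {'t', 'e', 's', 't', (byte) 0xA3, '.', 't', 'x', 't'};

        String name = decodeLatin1(raw);
        System.out.println("decoded name: " + name);
        System.out.println("bytes preserved: " + Arrays.equals(raw, encodeLatin1(name)));
    }
}
```

Whether a given FTP client exposes the raw bytes at all depends on the library, so treat this as a demonstration of the charset behaviour only.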

    Solutions:

    1. Use an FTP server which rejects files with illegal characters in their names or which replaces these characters with something that doesn’t confuse the file system / OS.
    2. Use a file system which properly handles files with strange names. That usually means getting rid of Windows on the server.
    3. Make sure users can only upload into a single directory and that this directory can only contain a single file. That way, you can use a small shell script and patterns to rename it to something that you can read.
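The replacement idea from solution 1 (and the asker's replaceAll("\uFFFD", "_") plan) could look roughly like the sketch below. It is a hypothetical helper that works on a name String only, so it helps wherever a name is available before the file reaches the problematic file system (for example in an upload hook); it does not by itself make Java find a file whose name it cannot decode. The whitelist of "safe" characters is an assumption, not a rule from the question.

```java
final class FileNameSanitizer {
    /** Replaces every character outside a conservative ASCII whitelist with '_'. */
    static String sanitize(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean safe = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')
                    || c == '.' || c == '-' || c == '_';
            // U+FFFD, '?', '£' and anything else outside the whitelist becomes '_'
            sb.append(safe ? c : '_');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(sanitize("test\uFFFD.txt"));
        System.out.println(sanitize("some-text-blah?blah.xml"));
    }
}
```

Note that two different original names can collapse to the same sanitized name, so a real deployment would likely need a uniqueness check before renaming.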