I’ve created a Java application that parses a text file to extract fields that are then loaded into a data table. We’re hitting exceptions during processing because the table can’t accept certain special characters, specifically  and the like.
These characters appear in the input file as spaces when I look at it, but Java interprets them differently. I suspect it’s a character code interpreted differently.
My question is this: in order to filter out these characters, is there any way I can generate a list of what Java is seeing? I’m thinking of printing the char and the character code, and if possible, the character set (ASCII, ANSI, UTF-8, etc). From that, I could substitute a space for the character in my output file and solve my problem.
Is there a simpler solution I’m not seeing?
It sounds like you are crossing character sets, or your input files have some kind of control character sequence in them. You should focus your efforts on that side of it and make sure you are reading the file in the proper character set. The only way I can think of to build a list of the characters in a file is to read it in and loop over it character by character, collecting what you find in an array.
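As a rough sketch of that loop, something like the following would list every character Java sees outside printable ASCII, with its code point and Unicode name. The class name CharReport, the method describeUnusual, and the choice of UTF-8 are my own assumptions for illustration; swap in whatever charset your file was actually written in. Note that Java can't tell you the character set from the text itself, since a file is just bytes and you pick the charset when you decode it.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class CharReport {

    // Hypothetical helper: returns one entry per code point outside
    // printable ASCII (0x20-0x7E), skipping tab, line feed and carriage return.
    static List<String> describeUnusual(String text) {
        List<String> report = new ArrayList<>();
        text.codePoints().forEach(cp -> {
            boolean ordinaryWhitespace = cp == '\t' || cp == '\n' || cp == '\r';
            boolean printableAscii = cp >= 0x20 && cp <= 0x7E;
            if (!printableAscii && !ordinaryWhitespace) {
                // Character.getName gives the Unicode name, e.g. NO-BREAK SPACE
                report.add(String.format("U+%04X %s", cp, Character.getName(cp)));
            }
        });
        return report;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the file is UTF-8. If it is not, the decoded
        // characters (and so this report) will be wrong.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        String text = new String(bytes, StandardCharsets.UTF_8);
        for (String line : describeUnusual(text)) {
            System.out.println(line);
        }
    }
}
```

If the report is full of U+FFFD (the replacement character), that usually means the file was decoded with the wrong charset, which points back to the character set mismatch rather than to the data itself.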
If you really want to strip all that stuff out, check out the thread "Regular expression for excluding special characters". It explains how to whitelist and blacklist characters with regex.
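For the whitelist approach, a minimal sketch might look like this. It keeps printable ASCII plus tab and line breaks and turns everything else into a space, which is what the question asked for. The class name Sanitizer and the method clean are made up for the example, and the allowed range is an assumption; widen it if your table can actually hold accented letters and the like.

```java
import java.util.regex.Pattern;

public class Sanitizer {

    // Whitelist: printable ASCII (0x20-0x7E) plus tab, CR and LF.
    // Anything NOT in this set matches and gets replaced.
    private static final Pattern NOT_ALLOWED =
            Pattern.compile("[^\\x20-\\x7E\\t\\r\\n]");

    // Hypothetical helper: replaces each disallowed character with a space.
    static String clean(String input) {
        return NOT_ALLOWED.matcher(input).replaceAll(" ");
    }

    public static void main(String[] args) {
        // A no-break space (U+00A0) looks like a space but is not one.
        System.out.println(clean("field\u00A0value"));
    }
}
```

You would call clean on each field just before loading it into the table. Replacing rather than deleting keeps the field boundaries where they were, so anything that looked like a space in the input file still separates words in the output.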