I’m trying to understand the basics of practical programming around character encodings.
A few things to consider:
- I know how to read a file whose encoding differs from the console’s, and convert it to the console’s encoding.
- But when I try to convert string literals that appear in source code, for some reason, it doesn’t always work:
  - In IntelliJ’s console for the Clojure language (its REPL, or interactive interpreter), it doesn’t work at all. I haven’t checked whether this particular console differs from IntelliJ’s standard Java console.
  - In Apple’s Terminal, it sometimes works fine, depending on the source file’s encoding.
  - In Eclipse and NetBeans, it always works fine.
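For reference, the file case from the first point can be sketched in Java roughly like this. It is a minimal sketch, assuming the file’s real encoding is known and passed on the command line (the class name `Transcode` and the argument layout are made up for illustration), and assuming the console expects UTF-8:

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class Transcode {
    // Decode raw file bytes with the encoding the file was actually written in.
    static String decode(byte[] raw, Charset sourceEncoding) {
        return new String(raw, sourceEncoding);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java Transcode input.txt ISO-8859-1
        Path input = Paths.get(args[0]);
        Charset sourceEncoding = Charset.forName(args[1]);
        String text = decode(Files.readAllBytes(input), sourceEncoding);

        // Assumption: the console expects UTF-8. Encode explicitly instead of
        // relying on whatever default System.out was created with.
        PrintStream out = new PrintStream(
                new FileOutputStream(FileDescriptor.out), true, StandardCharsets.UTF_8.name());
        out.println(text);
    }
}
```

The point is that both conversions are explicit: bytes to text with the file’s charset, and text to bytes with the console’s charset.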
There are lots of resources for learning about Unicode and character encodings but, AFAIK, few that teach practical usage guidelines. Some other questions here on StackOverflow have been useful, but none has been enough for what I’m trying to do.
UPDATE: I have greatly simplified this question after understanding how general the problems I was facing were. Originally, it was specifically targeted at the Java platform, with a code example in the Clojure language. To see these, have a look at the first version of this question.
Your problem is related to how your IDE tells the Java compiler to interpret the source file’s encoding. (Console output might be another problem; I don’t know.)
If you run the javac program with no arguments, you get a help printout whose `-encoding <encoding>` option, which specifies the character encoding used by source files, hints at how this works.
Javac thus decodes the source file, string literals and all, using that encoding, and stores the literals in the class file as (modified) UTF-8. I’m sure the Clojure compiler has a similar option.
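To make the mismatch concrete, here is a small Java sketch (the class `SourceEncodingMismatch` is hypothetical) that simulates what happens when a literal saved as UTF-8 is read back as CP1252, which is what a compiler does when told the wrong source encoding. It only simulates the decoding step; it does not invoke javac:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class SourceEncodingMismatch {
    // Encode a literal the way the editor saved it, then decode those bytes
    // the way the compiler was told to read them.
    static String misread(String literal, Charset writtenAs, Charset readAs) {
        byte[] onDisk = literal.getBytes(writtenAs);
        return new String(onDisk, readAs);
    }

    public static void main(String[] args) {
        String literal = "\u00e5"; // a-ring, written as an escape so this file stays ASCII
        Charset cp1252 = Charset.forName("windows-1252");

        // Saved as UTF-8 (two bytes) but read as CP1252: one character becomes two.
        String garbled = misread(literal, StandardCharsets.UTF_8, cp1252);
        System.out.println(garbled.length());

        // Same encoding on both sides: the literal survives unchanged.
        System.out.println(misread(literal, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(literal));
    }
}
```

In practice, the fix is to tell the compiler the real encoding, for example `javac -encoding UTF-8 Foo.java`, or to set the equivalent project setting in the IDE.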
In Eclipse, the option that decides which encoding source files use is under General > Workspace > Text file encoding. On my Swedish Windows machine, the default was CP1252. (I don’t care what’s there, since I avoid using characters outside ASCII for exactly this reason.)
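If you want to check what the JVM will assume, or keep sources ASCII-only as I do, a sketch like the following prints the default charset and converts non-ASCII text into Java `\uXXXX` escapes. javac translates those escapes before parsing, so such literals mean the same thing under any ASCII-compatible `-encoding`. `Charset.defaultCharset()` and the `file.encoding` property are standard (and since JDK 18 the default is UTF-8, per JEP 400); the class and method names are my own, not a library API:

```java
import java.nio.charset.Charset;
import java.util.Locale;

final class EncodingDefaults {
    // Replace every non-ASCII char with a Java Unicode escape so the result can be
    // pasted into an ASCII-only source file (similar in spirit to the old native2ascii tool).
    static String toJavaEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 128) {
                sb.append(c);
            } else {
                // Supplementary characters arrive as two surrogate chars and get one escape each.
                sb.append(String.format(Locale.ROOT, "\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // What the JVM falls back to when no charset is passed explicitly.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        // Written with escapes, so this literal does not depend on the source file's encoding.
        String swedish = "r\u00e4ksm\u00f6rg\u00e5s";
        System.out.println(toJavaEscapes(swedish));
    }
}
```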