I have been developing a parser that takes JavaScript as input and creates a compressed version of that JavaScript as output.
Initially, the parser failed when attempting to read the input JavaScript. I believe this is because Visual Studio 2008 saves its files as UTF-8 by default and, when doing so, includes a couple of hidden characters at the start of the file.
As a workaround, I used Visual Studio to save the file as code page 1252. After doing so, my parser was able to read the input JavaScript.
Note that I need to use special European characters that include accents.
So, here are my questions:
- Should I use code page 1252 or UTF-8?
- Why does Visual Studio save files as UTF-8 by default?
- If I choose to save files as 1252 will that lead to problems?
- It appears to me that Eclipse saves files as code page 1252 by default. Does that sound right?
UTF-8 is the better option, as it supports every Unicode character, while with 1252 you might find that characters you need are missing (even for European languages).
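To illustrate the point about missing characters, here is a minimal Python sketch. The helper name `encodable` is made up for this example, and Python is used only for illustration since the question does not say what language the parser is written in. It checks whether a string can be represented in a given encoding:

```python
def encodable(text: str, encoding: str) -> bool:
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# French e-acute is in code page 1252, but Hungarian o-double-acute is not.
print(encodable("é", "cp1252"))
print(encodable("ő", "cp1252"))
print(encodable("ő", "utf-8"))
```

Characters such as Hungarian ő or Polish ł are not part of code page 1252, so a file containing them cannot be saved in that encoding without loss.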
Apparently, VS2008 saves UTF-8 with a byte order mark (BOM), which is the three bytes EF BB BF at the start of the file. It should be possible to switch that off, have the parser recognize it, or strip the BOM somewhere in between.
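Stripping the BOM before parsing could look roughly like the sketch below. It is written in Python purely as an illustration; the function names `strip_utf8_bom` and `read_source` are hypothetical, and your parser would need the equivalent in whatever language it is written in.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def read_source(data: bytes) -> str:
    """Decode UTF-8 input; the 'utf-8-sig' codec drops a leading BOM if any."""
    return data.decode("utf-8-sig")
```

Either approach leaves files without a BOM untouched, so the parser can accept input from editors that write the BOM and from those that do not.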