I am using some Japanese/French words in some regular expressions inside my source code. I don’t want to convert these into \u notation, since the escaped strings would be difficult to keep track of and might introduce bugs that are hard to catch.
Is there a standard practice for dealing with non-ASCII characters in source code, or is it OK to use them as they are?
Thanks
It’s somewhat risky, since the program’s behaviour now depends on the encoding the compiler assumes for the source file: either the platform default encoding of the machine the program is compiled on, or whatever is passed in the compiler arguments. And that makes for hard-to-catch bugs, too.
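To make that concrete, here is a minimal sketch, assuming the code in question is Java (the \u notation and the compile-time encoding issue suggest so, but the question doesn't say). The class name, method names and sample words are made up for illustration. The file is pure ASCII, so the patterns compile to the same thing whatever encoding the compiler assumes:

```java
import java.util.regex.Pattern;

public class EscapedRegexDemo {
    // French "cafe" with e-acute (U+00E9), written as a Unicode escape.
    static final Pattern CAFE = Pattern.compile("caf\u00e9");

    // Japanese "nihongo" (U+65E5 U+672C U+8A9E), also written as escapes.
    static final Pattern NIHONGO = Pattern.compile("\u65e5\u672c\u8a9e");

    // True if the text contains the accented French word.
    static boolean containsCafe(String text) {
        return CAFE.matcher(text).find();
    }

    // True if the text contains the Japanese word.
    static boolean containsNihongo(String text) {
        return NIHONGO.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsCafe("un caf\u00e9 noir"));
        System.out.println(containsCafe("un cafe noir"));
        System.out.println(containsNihongo("\u65e5\u672c\u8a9e\u3092\u8a71\u3059"));
    }
}
```

Keeping each escaped pattern in a named constant with a comment saying what the word is takes away most of the "hard to track" problem. If you write the literal characters instead, the file has to be compiled with a matching setting, e.g. `javac -encoding UTF-8`, on every machine and in every build script.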
If there are just a handful of such regexes, I’d prefer using the Unicode escapes. If there are a lot, I’d bite the bullet and use UTF-8 for the source code, but only after I have