I am using some Japanese/French words in some regular expressions inside my source code.

Question

Asked: May 15, 2026

I am using some Japanese/French words in some regular expressions inside my source code. I don’t want to convert them to \u escape notation, since the escaped text would be hard to keep track of and could introduce hard-to-catch bugs.

Is there a standard practice for dealing with non-ASCII characters in source code, or is it OK to use them as they are?

Thanks

1 Answer

Editorial Team · Answer 1 · 2026-05-15T23:03:07+00:00

Using them as they are is somewhat risky, because the program’s behaviour then depends on the platform default encoding of the machine it is compiled on, or on the encoding given in the compiler arguments. That makes for hard-to-catch bugs, too.
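
To make the risk concrete, here is a minimal sketch, assuming the code in question is Java (the question does not say so, but the \u notation suggests it). It simulates a compiler reading UTF-8 source bytes with the wrong charset; the class and method names are made up for illustration, and the word itself is written with an escape so the example does not depend on how this snippet is saved.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of any library.
class EncodingMismatchDemo {
    // The French word "cafe" with an acute accent on the e, kept ASCII-only here.
    static final String WORD = "caf\u00e9";

    /** Simulates a compiler that reads UTF-8 source bytes using the given charset. */
    static String decodeAs(Charset assumed) {
        byte[] utf8Bytes = WORD.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, assumed);
    }

    /** True if a pattern built from the (possibly mis-decoded) literal still matches the real word. */
    static boolean patternStillMatches(Charset assumed) {
        Pattern p = Pattern.compile(decodeAs(assumed));
        return p.matcher(WORD).matches();
    }

    public static void main(String[] args) {
        System.out.println(patternStillMatches(StandardCharsets.UTF_8));      // true
        System.out.println(patternStillMatches(StandardCharsets.ISO_8859_1)); // false
    }
}
```

With the wrong charset the two bytes of the accented letter become two separate characters, so the regex compiles without any error and simply stops matching.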

If there are just a handful of such regexes, I’d prefer using the Unicode escapes. If there are a lot, I’d bite the bullet and use UTF-8 for the source code, but only after I have:

- A build script that uses UTF-8 for the compilation (and the app is built only with that script)
- Some unit tests that confirm that the regexes are working correctly
- An automated build server that runs the unit tests for every build
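
As a rough sketch of the second point, again assuming Java: a check like the one below compares the code points of a regex literal against an expected list and then tries the pattern on a sample. The class and method names are hypothetical, and plain boolean checks stand in for whatever test framework the project uses. In a real project the first argument would be the non-ASCII literal exactly as it appears in the source file; it is written with escapes here only so the sketch itself is encoding-independent.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative only.
class RegexEncodingCheck {
    /** Renders a string as "U+XXXX U+XXXX ..." so a mangled literal is easy to spot. */
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    /** True if the literal has the expected code points and the pattern is found in the sample. */
    static boolean matchesAsExpected(String regexFromSource, String expectedCodePoints, String sample) {
        boolean literalIntact = codePoints(regexFromSource).equals(expectedCodePoints);
        boolean found = Pattern.compile(regexFromSource).matcher(sample).find();
        return literalIntact && found;
    }

    public static void main(String[] args) {
        // The Japanese word for "Japanese language", three code points.
        String nihongo = "\u65e5\u672c\u8a9e";
        System.out.println(codePoints(nihongo)); // U+65E5 U+672C U+8A9E
        System.out.println(matchesAsExpected(nihongo, "U+65E5 U+672C U+8A9E", "abc " + nihongo + " xyz")); // true
    }
}
```

For the first point, if the compiler is javac, the source encoding can be set explicitly with `javac -encoding UTF-8`, so the build no longer depends on the platform default.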
