I need to handle strings in my php script using regular expressions. But there

Question


Asked: June 1, 2026 (2026-06-01T04:58:18+00:00)


I need to handle strings in my PHP script using regular expressions, but there is a problem: different strings have different encodings. If a string contains only ASCII characters, mb_detect_encoding returns ‘ASCII’. But if a string contains Russian characters, for example, mb_detect_encoding returns ‘UTF-8’. Checking the encoding of each string manually doesn’t seem like a good idea.
So the question is: is it correct to use preg_replace (with the Unicode modifier) on ASCII strings? Is it right to use code like preg_replace("/[^_a-z]/u", "", $string); for both ASCII and UTF-8 strings?


1 Answer

Editorial Team · Answer 1 · 2026-06-01T04:58:20+00:00

This would be no problem if the two choices were “UTF-8” or “ASCII”, but that’s not the case.

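If a string in PHP isn’t UTF-8, it is often ISO-8859-1, which is NOT ASCII. It’s a superset of ASCII: the first 128 characters are identical. Some characters, for example the Swedish å, ä and ö, can be represented in both ISO-8859-1 and UTF-8, but with different byte sequences (å is the single byte 0xE5 in ISO-8859-1 and the two bytes 0xC3 0xA5 in UTF-8). For a pure-ASCII string this makes no difference to the preg_* functions, because every ASCII string is also valid UTF-8, so your pattern with the u modifier is fine there. With the u modifier, however, a subject that is not valid UTF-8 (such as ISO-8859-1 text containing å) makes the call fail, so please keep this in mind when working with different encodings.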
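As a rough illustration, the sketch below runs the pattern from the question against an ASCII string, a UTF-8 string, and an ISO-8859-1 string. The helper name strip_non_identifier_chars and the sample strings are made up for this example; they are not from the original post.

```php
<?php
// Hypothetical helper (not from the original post): keeps only lowercase a-z
// and underscore, using the u modifier exactly as in the question.
function strip_non_identifier_chars(string $s): ?string
{
    // preg_replace returns null on error, e.g. when $s is not valid UTF-8.
    return preg_replace('/[^_a-z]/u', '', $s);
}

$ascii  = 'hello_world 123';
// "привет_abc", written with code point escapes so the file encoding does not matter.
$utf8   = "\u{43F}\u{440}\u{438}\u{432}\u{435}\u{442}_abc";
// "små_abc" as ISO-8859-1 bytes: å is the single byte 0xE5, which is invalid UTF-8 here.
$latin1 = "sm\xE5_abc";

var_dump(strip_non_identifier_chars($ascii));   // string "hello_world"
var_dump(strip_non_identifier_chars($utf8));    // string "_abc"
var_dump(strip_non_identifier_chars($latin1));  // NULL
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR); // bool(true)
```

The ASCII and UTF-8 inputs both work with the same pattern; only the ISO-8859-1 input fails, and it fails with a null result rather than an exception, so the return value is worth checking.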

You should really, really try to know which character set your strings are in, without relying on the magic of mb_detect_encoding (mb_detect_encoding is not a guarantee, just a good guess). For example, strings fetched through HTTP usually have a character set specified in the Content-Type HTTP header.
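One way to act on this, sketched under the assumption that the mbstring extension is loaded and that the charset is already known from a trusted source, is to normalize every string to UTF-8 before it reaches the regular expression. The helper name to_utf8 is hypothetical.

```php
<?php
// Hypothetical helper: convert a string to UTF-8 given a charset the caller
// already knows (for example, taken from an HTTP Content-Type header).
function to_utf8(string $s, string $knownCharset): string
{
    if (strcasecmp($knownCharset, 'UTF-8') === 0) {
        // Validate rather than trust: reject byte sequences that are not UTF-8.
        if (!mb_check_encoding($s, 'UTF-8')) {
            throw new InvalidArgumentException('String is not valid UTF-8');
        }
        return $s;
    }
    // Convert from the declared charset instead of guessing it.
    return mb_convert_encoding($s, 'UTF-8', $knownCharset);
}

$latin1 = "sm\xE5_abc";                   // "små_abc" as ISO-8859-1 bytes
$utf8   = to_utf8($latin1, 'ISO-8859-1'); // å becomes the bytes 0xC3 0xA5

var_dump(preg_replace('/[^_a-z]/u', '', $utf8)); // string "sm_abc"
```

After this step, the same pattern with the u modifier can be applied to every string, whatever its original encoding was.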
