I’m using the following regex to strip out non-printing control characters from user input

Asked: May 15, 2026

I’m using the following regex to strip out non-printing control characters from user input before inserting the values into the database.

 preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $value)

Is there a problem with using this on UTF-8 strings? It seems to remove all non-ASCII characters entirely.

1 Answer

Editorial Team · Answer 1 · 2026-05-15T20:53:04+00:00

Part of the problem is that you aren’t treating the target as a UTF-8 string; you need the /u modifier for that. Without it, the pattern is matched byte by byte, and in UTF-8 every non-ASCII character is encoded as two or more bytes, all of them in the range \x80..\xFF, so your character class strips every one of them. Try this:

preg_replace('/\p{Cc}+/u', '', $value)

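\p{Cc} is the Unicode property for control characters (U+0000..U+001F, U+007F, and U+0080..U+009F), and the /u modifier causes both the regex and the target string to be treated as UTF-8. Note that this class includes tab, line feed, and carriage return, so those are removed as well, just as they were by your original \x00-\x1F range.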
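
To see the difference between the two patterns side by side, here is a minimal sketch. The helper name strip_control_chars() and the sample input are made up for illustration and are not from the question; it also assumes PHP 7 or later for the "\u{...}" escapes and the return type.

```php
<?php
// Hypothetical helper; the name is illustrative, not from the original post.
function strip_control_chars(string $value): ?string
{
    // With /u, \p{Cc} matches whole code points in the Unicode "control"
    // category, so multibyte characters such as é are left intact.
    // preg_replace() returns null here if $value is not valid UTF-8.
    return preg_replace('/\p{Cc}+/u', '', $value);
}

// Sample input (an assumption): "café", a NUL byte, " naïve", a unit separator.
$input = "caf\u{00E9}\x00 na\u{00EF}ve\x1F";

// Byte-oriented pattern from the question: every byte of é and ï is removed.
$byteWise = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $input);

// UTF-8 aware pattern: only the two control characters are removed.
$utf8Aware = strip_control_chars($input);

var_dump($byteWise === "caf nave");
var_dump($utf8Aware === "caf\u{00E9} na\u{00EF}ve");
```

Because the /u version returns null for input that is not valid UTF-8, it is probably worth checking for null (or validating the encoding first, for example with mb_check_encoding()) before the value goes into the database.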
