I’m using the following regex to check an image filename only contains alphanumeric, underscore,

Question

Asked: May 20, 2026 (2026-05-20T17:48:57+00:00)

I’m using the following regex to check that an image filename contains only alphanumeric characters, underscores, hyphens and dots:

preg_match('!^[\w.-]*$!',$filename)

This works fine, but I am concerned about multibyte characters. Should I handle them explicitly to avoid unpredictable errors, or will this regex reliably reject multibyte filenames as it stands?

1 Answer

Editorial Team · Answer 1 · 2026-05-20T17:48:58+00:00

PHP does not have “native” support for multibyte characters; you need to use the “mbstring” extension (which may or may not be available). Furthermore, there appears to be no way to create a “multibyte-character string” as such. Instead, you choose to treat an ordinary string as a multibyte-character string by calling the special “mbstring” functions on it. In other words, a PHP string does not know its own character encoding; you have to keep track of it manually.

You may be able to get away with it so long as you use UTF-8 (or a similar encoding). UTF-8 always encodes multibyte characters using only “high” bytes, 0x80 and above (for instance, ß is encoded as 0xc3 0x9f), so PHP will probably treat them just like any other character. You would not be able to use an encoding that might encode part of a multibyte character as a “special” PHP byte, such as 0x22, the double-quote symbol.
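
The byte-level claim can be checked directly. The following is a minimal sketch, assuming the input is UTF-8; `allBytesHigh` is a hypothetical helper introduced only for this illustration.

```php
<?php
// "ß" (U+00DF) written as explicit UTF-8 bytes, so the result does not
// depend on how this source file is saved.
$eszett = "\xC3\x9F";

// Hypothetical helper: true when every byte is 0x80 or above (non-ASCII).
function allBytesHigh(string $s): bool
{
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        if (ord($s[$i]) < 0x80) {
            return false;
        }
    }
    return true; // also true for the empty string
}

echo bin2hex($eszett), "\n";            // c39f
echo strlen($eszett), "\n";             // 2: strlen() counts bytes, not characters
var_dump(allBytesHigh($eszett));        // bool(true)
var_dump(allBytesHigh($eszett . '"'));  // bool(false): 0x22 is a plain ASCII byte
```

Because no byte of a multibyte UTF-8 sequence falls in the ASCII range, byte-oriented string functions cannot mistake part of such a character for a quote, slash or dot.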

The only regular expression functions in PHP that know how to deal with multibyte characters across a range of different character sets are mb_ereg, mb_eregi, mb_ereg_replace and mb_eregi_replace.

PCRE-based regular expression functions like preg_match support UTF-8 through the u modifier (PCRE_UTF8). Without it, the pattern and the subject are treated as sequences of single bytes.
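
For the filename check in the question, one way to sidestep the encoding question is to spell out the allowed ASCII characters instead of relying on `\w`. This is a sketch under stated assumptions, not the only reasonable approach: `isSafeFilename` is a hypothetical name, and rejecting the empty string (`+` rather than `*`) is an assumption about the intent.

```php
<?php
// Hypothetical helper (not from the original post): accept only ASCII
// letters, digits, underscore, dot and hyphen.
function isSafeFilename(string $filename): bool
{
    // The explicit class avoids relying on what \w means under a given
    // locale or modifier. \A and \z anchor to the absolute start and end;
    // a bare $ would also accept a trailing newline. Using + instead of *
    // rejects the empty string, which is an assumption about the intent.
    return preg_match('/\A[A-Za-z0-9_.-]+\z/', $filename) === 1;
}

var_dump(isSafeFilename('photo_01-final.png')); // bool(true)
var_dump(isSafeFilename("\xC3\x9F.png"));       // bool(false): "ß.png" in UTF-8
var_dump(isSafeFilename("photo.png\n"));        // bool(false)

// With the u modifier the subject must be valid UTF-8; otherwise
// preg_match() returns false rather than 0.
var_dump(preg_match('/^[\w.-]*$/u', "\xFF.png")); // bool(false)
```

Note that adding the u modifier to the original pattern would likely loosen it rather than tighten it: on common PHP builds `\w` then also matches non-ASCII letters, so a name like ß.png would probably be accepted.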

But of course, as described above, PHP strings don’t know their own encoding, so before using the mb_ereg family you first need to tell the “mbstring” library which encoding to expect by calling the mb_regex_encoding function. Note that this function specifies the encoding of the string you’re matching, not the string containing the regular expression itself.
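
A rough sketch of that sequence, assuming the data is UTF-8 and that the mbstring regex functions may be missing from a given build. `mbSafeFilename` is a hypothetical wrapper, and the explicit ASCII character class is a choice made for this example rather than something mbstring requires.

```php
<?php
// Hypothetical wrapper: returns null when the mbstring regex functions
// are not available, otherwise whether the name is ASCII-safe.
function mbSafeFilename(string $filename): ?bool
{
    if (!function_exists('mb_regex_encoding') || !function_exists('mb_ereg')) {
        return null;
    }
    // Declare the encoding first; the string itself carries no such information.
    mb_regex_encoding('UTF-8');

    // The cast normalises the return value, which differs between PHP versions.
    return (bool) mb_ereg('\A[A-Za-z0-9_.-]+\z', $filename);
}

var_dump(mbSafeFilename('photo.png'));
var_dump(mbSafeFilename("\xC3\x9F.png"));
```

The availability check matters because, as noted earlier, the extension may or may not be installed.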
