I cannot pass a UTF-8 filename to move_uploaded_file() as it gets converted bytewise, resulting in a faulty name in the file system. For example:
move_uploaded_file($_FILES['userfile']['tmp_name'], '\upload\é.jpg');
creates xa9.jpg in the upload directory.
While the Windows API supports UTF-16, passing such a filename (e.g., iconv('UTF-8', 'UTF-16', 'é')) to move_uploaded_file() results in an error.
It would be reasonable to percent-encode all special characters, and I definitely should do the same with the URIs, according to RFC 3986. But when I use percent-encoded URIs, Apache gives a 404 error, as it decodes the URL and can’t find anything by that name.
For example: <img src="/upload/%C3%A9.jpg" /> gives the Apache error:
File does not exist: […]/upload/\xc3\xa9.jpg.
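To make the mismatch concrete, here is a small sketch of what the decoding step produces. It assumes Apache's URL decoding behaves like PHP's rawurldecode() for this input, which matches the bytes shown in the error message; the variable names are illustrative only.

```php
<?php
// The percent-encoded name as it appears in the HTML.
$encoded = '%C3%A9.jpg';

// Decoding yields the raw UTF-8 bytes C3 A9 followed by ".jpg".
$decoded = rawurldecode($encoded);

// Show the bytes that are then looked up on disk.
echo bin2hex($decoded), "\n"; // c3a92e6a7067
```

A file whose name was stored in a single-byte code page would have different bytes for é, so a lookup by these two bytes cannot match it.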
What would be the proper solution? If I rename the file in Windows (é.jpg), the encoded HTML URI (%C3%A9.jpg) works as expected.
Some info on the subject: http://www.rooftopsolutions.nl/blog/filesystem-encoding-and-php
Passing iconv('UTF-8', 'Windows-1250', $_FILES['userfile']['name']) to move_uploaded_file(), as opposed to using UTF-16, and saving the filename for HTML as rawurlencode($_FILES['userfile']['name']) works.
If this filename is stored in a database, any file request should refer to iconv('UTF-8', 'Windows-1250', rawurldecode($filename)).
I use the Windows-1250 character set, as this is the default on my system.
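The three conversions described above could be wrapped up roughly as follows. This is only a sketch: the helper names (fsName, urlName, fsNameFromStored) and the FS_CHARSET constant are made up for illustration, and Windows-1250 is assumed to be the system code page, as it is on my machine.

```php
<?php
// Assumption: the file system uses this single-byte code page.
const FS_CHARSET = 'Windows-1250';

// Hypothetical helper: UTF-8 name -> name to pass to file system functions.
function fsName(string $utf8Name): string
{
    $converted = iconv('UTF-8', FS_CHARSET, $utf8Name);
    if ($converted === false) {
        // iconv() fails when a character has no equivalent in the code page.
        throw new RuntimeException('Name cannot be represented in ' . FS_CHARSET);
    }
    return $converted;
}

// Hypothetical helper: UTF-8 name -> percent-encoded name for HTML or the database.
function urlName(string $utf8Name): string
{
    return rawurlencode($utf8Name);
}

// Hypothetical helper: stored percent-encoded name -> file system name.
function fsNameFromStored(string $stored): string
{
    return fsName(rawurldecode($stored));
}

// Usage during an upload (not executed here):
// move_uploaded_file($_FILES['userfile']['tmp_name'],
//                    '\upload\\' . fsName($_FILES['userfile']['name']));
```

Names containing characters outside the chosen code page cannot be converted this way, so the sketch throws instead of silently writing a damaged name.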
Additional info on MSDN:
Character Sets Used in File Names (See: Code Pages)
File and Directory Names (Naming Conventions)