Currently, users can upload files with any file name they like, so the uploaded file names contain spaces and characters like ß, ü and so on. Other users can then download these files (with the spaces and other characters in the URL). This works, but according to RFC 1738 (Uniform Resource Locators) only alphanumeric characters [a-zA-Z0-9] and a few special/reserved characters may appear unencoded in a URL. I also think spaces should be avoided.
Currently a ß ends up as ÃŸ in the file name on the server. A user who wants to download the file still gets the correct character (ß) from the MySQL database (collation utf8_unicode_ci), so the file can be found on the server.
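A ß turning into two characters is the typical sign that UTF-8 bytes are being read back in a single-byte charset somewhere in the chain. The Python snippet below reproduces this, assuming Windows-1252 (`cp1252`) is the wrong charset involved; the actual one depends on how the server is set up.

```python
# ß is U+00DF; in UTF-8 it is stored as the two bytes C3 9F.
original = "ß"
utf8_bytes = original.encode("utf-8")

# Reading those two bytes as Windows-1252 yields two characters: "ÃŸ".
misread = utf8_bytes.decode("cp1252")
print(utf8_bytes, misread)

# Undoing the wrong step (re-encode as cp1252, decode as UTF-8) recovers ß.
repaired = misread.encode("cp1252").decode("utf-8")
print(repaired)
```

If the database shows ß while a directory listing shows ÃŸ, the bytes on disk are most likely correct UTF-8 and only the tool displaying them assumes a different charset.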
- What is the correct way to handle file names?
- Should I check the file name and reject the upload?
- Should I rename the files on the server after the upload (e.g. with str_replace(), urlencode(), …)?
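The RFC 1738 restriction applies to the URL, not to the file name on disk: characters outside the allowed set can be percent-encoded when the link is built. A minimal Python sketch using the standard `urllib.parse.quote`; the helper name `url_path_segment` is hypothetical. In PHP, `rawurlencode()` does the comparable job (spaces become `%20`), whereas `urlencode()` turns spaces into `+`, which is form encoding rather than path encoding.

```python
from urllib.parse import quote, unquote


def url_path_segment(filename: str) -> str:
    """Percent-encode a file name for use as a single path segment of a URL.

    quote() encodes the string as UTF-8 and replaces every byte outside the
    unreserved set (letters, digits, '-', '.', '_', '~') with %XX.
    safe="" ensures that a '/' inside the name is encoded as well.
    """
    return quote(filename, safe="")


name = "Straße Plan ü.pdf"
encoded = url_path_segment(name)
print(encoded)

# Decoding the segment gives back the exact original name.
assert unquote(encoded) == name
```

The file can keep its real name on the server; only the `href` in the HTML needs the encoded form.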
As long as your web server takes care of handling the file downloads, make sure it knows the encoding used on the file system, and that the file system is compatible with the charset you use for the names of the uploaded files.
As long as everything is compatible here (it looks like you use UTF-8), you won't run into any problems. Just make sure the encoding is set correctly at every layer involved: file system, web server, database server, database client connection, browser, the upload POST request, the HTML response that offers the file links, etc.
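One cheap safeguard at the upload boundary is to refuse names whose bytes are not valid UTF-8 instead of storing them as they are. The Python sketch below assumes the raw name arrives as bytes; the function name `decode_upload_name` is hypothetical. In PHP, `mb_check_encoding($name, 'UTF-8')` performs a similar validity check on the string from `$_FILES`.

```python
def decode_upload_name(raw: bytes) -> str:
    """Decode an uploaded file name, rejecting anything that is not valid UTF-8."""
    try:
        # "strict" is the default error handler: invalid byte sequences raise.
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(f"file name is not valid UTF-8: {raw!r}") from exc


# A correctly encoded name passes through unchanged.
print(decode_upload_name("Straße.pdf".encode("utf-8")))

# The same name sent as Windows-1252 (ß as the single byte DF) is rejected.
try:
    decode_upload_name("Straße.pdf".encode("cp1252"))
except ValueError as err:
    print(err)
```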
If you intend to serve the files through PHP with the Content-Disposition header, you should only allow printable US-ASCII characters within file names, because that header has no working specification for characters outside of that range.
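Serving a download through PHP means sending this header yourself (e.g. via `header()`). The Python sketch below builds the header value and refuses names outside a whitelist. The whitelist `[A-Za-z0-9._-]` and the function name `content_disposition` are assumptions for illustration; the set is deliberately narrower than printable US-ASCII so that the double quote and backslash, which would need escaping inside the quoted `filename` parameter, can never occur.

```python
import re

# Assumed conservative whitelist: letters, digits, dot, underscore, hyphen.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def content_disposition(filename: str) -> str:
    """Build a Content-Disposition value, rejecting names outside the whitelist."""
    if not SAFE_NAME.fullmatch(filename):
        raise ValueError(f"unsafe file name for Content-Disposition: {filename!r}")
    return f'attachment; filename="{filename}"'


print(content_disposition("report_2011-05.pdf"))
```

Rejecting instead of silently rewriting keeps this function simple; the renaming step belongs at upload time.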
Normally, when a file is uploaded, its file name gets normalized. It's also wise to validate and sanitize the name at the point of upload.
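A possible sanitizing step, sketched in Python under a few assumptions: the German transliteration table `GERMAN`, the function name `sanitize_filename`, the length limit and the fallback name `"upload"` are all hypothetical choices. It normalizes to NFC (so `u` plus a combining diaeresis becomes one `ü`), spells out German umlauts and ß, strips remaining accents via NFKD, replaces whitespace with underscores and drops everything outside `[A-Za-z0-9._-]`.

```python
import re
import unicodedata

# Hypothetical table: ß has no ASCII decomposition, and umlauts are
# conventionally written as ae/oe/ue in German.
GERMAN = {"ß": "ss", "ä": "ae", "ö": "oe", "ü": "ue",
          "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"}


def sanitize_filename(name: str, max_length: int = 200) -> str:
    # Compose characters first so the table lookup sees "ü", not "u" + U+0308.
    name = unicodedata.normalize("NFC", name)
    name = "".join(GERMAN.get(ch, ch) for ch in name)
    # Split other accented letters (é -> e + U+0301), then drop non-ASCII.
    name = unicodedata.normalize("NFKD", name)
    name = name.encode("ascii", "ignore").decode("ascii")
    # Collapse whitespace runs into a single underscore.
    name = re.sub(r"\s+", "_", name.strip())
    # Keep only the conservative whitelist.
    name = re.sub(r"[^A-Za-z0-9._-]", "", name)
    # Avoid hidden files such as ".htaccess".
    name = name.lstrip(".")
    return name[:max_length] or "upload"


print(sanitize_filename("Größe der Straße.pdf"))
print(sanitize_filename("u\u0308ber café.txt"))
```

Two different originals can sanitize to the same name, so a check for an existing file (or a unique suffix) is still needed before saving.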