I have a simple site which allows users to upload files (among other things obviously). I am teaching myself php/html as I go along.
Currently the site has the following traits:
–When users register a folder is created in their name.
–All files the user uploads are placed in that folder (with a time stamp added to the name to avoid any issues with duplicates).
–When a file is uploaded information about it is stored in an SQL database.
simple stuff.
So, now my question is what steps do I need to take to:
- Prevent google from archiving the uploaded files.
- Prevent users from accessing the uploaded files unless they are logged in.
- Prevent users from uploading malicious files.
Notes:
I would assume that B, would automatically achieve A. I can restrict users to only uploading files with .doc and .docx extensions. Would this be enough to save against C? I would assume not.
There is a number of things you want to do, and your question is quite broad.
For the Google indexing, you can work with the /robots.txt. You did not specify if you also want to apply ACL (Access Control List) to the files, so that might or might not be enough. Serving the files through a script might work, but you have to be very careful not to use include, require or similar things that might be tricked into executing code. You instead want to open the file, read it and serve it through File operations primitives.
Read about “path traversal”. You want to avoid that, both in upload and in download (if you serve the file somehow).
The definition of “malicious files” is quite broad. Malicious for who? You could run an antivirus on the uplaod, for instance, if you are worried about your side being used to distribute malwares (you should). If you want to make sure that people can’t harm the server, you have at the very least make sure they can only upload a bunch of filetypes. Checking extensions and mimetype is a beginning, but don’t trust that (you can embed code in png and it’s valid if it’s included via include()).
Then there is the problem of XSS, if users can upload HTML contents or stuff that gets interpreted as such. Make sure to serve a content-disposition header and a non-html content type.
That’s a start, but as you said there is much more.