I’m creating a PHP website in which I have a document.php that takes a did (document ID) and displays the matching document. For example, mysite.com/document.php?did=1 grabs the document content from the database with a query like SELECT * FROM documents WHERE id=1. I know the security issues with this approach (such as SQL injection), and I do both validation and escaping, but someone told me that it would be pretty easy to create a crawler that does something like:
for (int i = 0; i < 3000; ++i)
    DownloadPage("mysite.com/document.php?did=" + i);
Now I have 2 questions.
- Is this really an issue with the code that I’ve written, or should it be handled some other way? For instance, I know that I can tell the Apache server to limit the bandwidth usage per IP. (Or maybe you can suggest a better alternative.)
- If this is an issue, I have a solution in mind: add another parameter to the page, something like a hash of the content, which would be checked against the DB to verify that this is the correct URL.
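To make the second idea concrete, here is a minimal sketch of a URL that carries an extra token. Note that it swaps in a slightly different technique from the one described: instead of a content hash stored in the DB, it derives the token from the document ID with an HMAC, so nothing extra has to be stored. The constant DOC_URL_SECRET, the parameter name token, and the three function names are all hypothetical.

```php
<?php
// Hypothetical secret; in practice it would live in configuration outside the web root.
const DOC_URL_SECRET = 'change-me';

// Derive a short token from the document ID and the secret.
function documentToken(int $did): string
{
    return substr(hash_hmac('sha256', (string) $did, DOC_URL_SECRET), 0, 16);
}

// Compare the token from the URL against the expected one in constant time.
function isValidDocumentToken(int $did, string $token): bool
{
    return hash_equals(documentToken($did), $token);
}

// Build a link of the form document.php?did=1&token=...
function documentUrl(int $did): string
{
    $query = http_build_query(['did' => $did, 'token' => documentToken($did)], '', '&');
    return 'document.php?' . $query;
}
```

With something like this, document.php would refuse to render unless isValidDocumentToken((int) $_GET['did'], $_GET['token'] ?? '') returns true. This only stops someone from counting through IDs; anyone who already has a full link can still open it and pass it on.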
One thing that I’ve seen a lot is that part of the title is sometimes appended to the URL, like this:
mysite.com/document/1/some_part_of_the_url
But I’ve checked, and if I remove the title and go to mysite.com/document/1 it still shows the same page. This made me think that it is not there for security reasons and is more a way to tell the user the title of the page they are about to visit.
The title is generally appended to the URL for search engine optimisation.
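As a rough illustration of why removing the title still shows the same page, here is a minimal sketch in which only the numeric ID is used for the lookup and the title part is purely decorative. The function names slugify, documentPath and documentIdFromPath are hypothetical, as is the /document/{id}/{slug} layout, which is modelled on the URL in the question.

```php
<?php
// Turn a title into a URL-friendly fragment, e.g. "Some Title!" becomes "some_title".
function slugify(string $title): string
{
    $slug = strtolower(trim($title));
    $slug = preg_replace('/[^a-z0-9]+/', '_', $slug);
    return trim($slug, '_');
}

// Build the public path for a document.
function documentPath(int $id, string $title): string
{
    return '/document/' . $id . '/' . slugify($title);
}

// Extract the ID from a path; the slug is optional and ignored.
function documentIdFromPath(string $path): ?int
{
    if (preg_match('#^/document/(\d+)(?:/[^/]*)?/?$#', $path, $m)) {
        return (int) $m[1];
    }
    return null;
}
```

Because documentIdFromPath() never looks at the slug, /document/1 and /document/1/anything resolve to the same document, which matches the behaviour described in the question.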
Are the documents supposed to be secure? If so, you need to implement some kind of authentication. Security by obscurity (e.g. assuming the user won’t guess the ID) is not a good way to do it. You could quite easily implement a username/password check, even if the username and password are baked into the code, and then use sessions to check that the user is authenticated.
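A minimal sketch of that approach could look like the following. Everything here is an assumption for illustration: the function names, the authenticated session key, and the choice to keep a password hash (produced once with password_hash()) in the code instead of the plain password.

```php
<?php
// Check submitted credentials against a baked-in username and password hash.
function credentialsMatch(string $user, string $pass, string $expectedUser, string $expectedHash): bool
{
    $userOk = hash_equals($expectedUser, $user);
    $passOk = password_verify($pass, $expectedHash);
    return $userOk && $passOk;
}

// Mark the given session array as logged in (pass $_SESSION in real use).
function logIn(array &$session): void
{
    $session['authenticated'] = true;
}

// Tell whether the given session array belongs to a logged-in user.
function isAuthenticated(array $session): bool
{
    return ($session['authenticated'] ?? false) === true;
}
```

In a login script you would call session_start(), then logIn($_SESSION) when credentialsMatch() succeeds. At the top of document.php you would call session_start() again and stop with a 403 or a redirect to the login page unless isAuthenticated($_SESSION) is true.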
If the documents are not meant to be secure, then I don’t really see any need to worry about authentication. Consider that on SO you can access questions by just going to stackoverflow.com/questions/#{id}.