I’ve been doing web development for a few months now and keep having this nagging problem. It is typical for pages to request content with a query string which usually contains meaningful data such as an id in the database. An example would be a link such as:
http://www.example.com/posts?id=5
I’ve been trying to think of a good strategy to prevent users from manually entering a value for the id without having reached it through a link; I’d only like to acknowledge requests made through links presented on my website. Also, the website may not have an authentication system and allows anonymous browsing. That said, the information isn’t particularly sensitive, but I still don’t like the idea of not being able to control access to certain information. One option, I suppose, would be to use HTTP POST requests for this kind of page; I don’t believe a user can simulate a POST request, but I may be wrong.
Furthermore, the user could place any arbitrary number for the id and end up requesting a record that doesn’t exist in the database. Of course, I could validate the requested id but then I would be wasting resources to accommodate this check.
Any thoughts? I’m working with Django, but a general strategy for any programming language would be good. Thanks.
First, choosing between GET and POST: a user can simulate any kind of request (with curl, a browser’s developer tools, or a few lines of script), so POST will not help you there. When choosing between the two, decide based on the action the user is taking or how they are interacting with your content. Are they getting a page, or sending you data (a form is the obvious example)? For your case of retrieving some sort of post, GET is appropriate.
Also worth noting, GET is the correct choice if the content is appropriate for bookmarking. Serving a page based solely on the Referer header (as you say, to “prevent users from manually entering a value for the id without having accessed it from a link”) is a terrible idea. The header is supplied by the client, so it can be forged, and browsers may omit or trim it for privacy reasons, which breaks the page for legitimate visitors. This will cause you innumerable headaches, and it is probably not a nice experience for the user either.
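A quick way to see that neither the HTTP method nor the Referer header proves a request came from one of your links: the sketch below uses Python’s standard urllib.request to build a POST for the example URL, with a Referer claiming the request came from the site’s home page. The URL, the id=5 form body and the Referer value are illustrative assumptions, and the request is only constructed here, not sent.

```python
import urllib.request

# Build a POST by hand: no browser and no link on the target site involved.
# Passing a body via `data` makes urllib treat the request as a POST.
req = urllib.request.Request(
    "http://www.example.com/posts",
    data=b"id=5",
    headers={
        # Referer is just another client-supplied header, so the client
        # can claim whatever origin it likes.
        "Referer": "http://www.example.com/",
        "Content-Type": "application/x-www-form-urlencoded",
    },
)

print(req.get_method())            # POST
print(req.get_header("Referer"))   # http://www.example.com/
# urllib.request.urlopen(req) would actually send it; it is left out so the
# sketch runs without network access.
```

The same thing is a one-liner with curl, which is why server-side logic should never trust either value as proof of where a request originated.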
As a general principle, avoid relying on the primary key of a database record. That key (id=5 in your case) should be treated purely as an auto-increment field that prevents record collisions, i.e. it guarantees that every record in the table has a unique value. That ID field is a backend utility. Don’t expose it to your users and don’t rely on it yourself.
If you can’t use the ID, what do you use? A common idiom is to use the date of the record, a slug, or both. If you are dealing with posts, use the published/created date. Then add a text field that holds URL-friendly, descriptive words. Call it a slug, and read about Django’s models.SlugField for more information. Also, look at the URL of an article on basically any news site. Your final URL will look something like this:
http://www.example.com/posts/2012/01/19/this-is-cool/
Now your URL is easy on the eyes, has Google-fu SEO benefits, is bookmarkable, and can’t be guessed by just incrementing a number. Because you aren’t relying on a fixed, arbitrary ID from the back-end database, you have the freedom to restore a backup database dump, move databases, change the auto-increment numeric ID to a UUID, whatever. Only your database will care, not you as a programmer and not your users.
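The date-plus-slug scheme can be sketched with the standard library alone. The slugify function below is a simplified version modeled on what Django’s django.utils.text.slugify does (normalize, drop punctuation, hyphenate), and post_path/parse_post_path are hypothetical helpers. In a real Django project you would use slugify itself and let the URL router do the parsing, e.g. with a pattern such as path("posts/<int:year>/<int:month>/<int:day>/<slug:slug>/", views.post_detail), where post_detail is a hypothetical view name.

```python
import re
import unicodedata
from datetime import date


def slugify(title: str) -> str:
    """Simplified slug generator, modeled on Django's slugify."""
    # Decompose accented characters, then drop anything that isn't ASCII.
    text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Lowercase and remove everything except word chars, whitespace and hyphens.
    text = re.sub(r"[^\w\s-]", "", text.lower())
    # Collapse runs of whitespace/hyphens into one hyphen and trim the ends.
    return re.sub(r"[-\s]+", "-", text).strip("-_")


def post_path(published: date, slug: str) -> str:
    """Build /posts/YYYY/MM/DD/slug/ from a publication date and a slug."""
    return f"/posts/{published:%Y/%m/%d}/{slug}/"


# Matches the same shape, capturing the pieces needed for a database lookup.
POST_PATH_RE = re.compile(
    r"^/posts/(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/(?P<slug>[-\w]+)/$"
)


def parse_post_path(path: str):
    """Return (date, slug) for a post URL path, or None if it doesn't match."""
    m = POST_PATH_RE.match(path)
    if m is None:
        return None
    try:
        published = date(int(m["year"]), int(m["month"]), int(m["day"]))
    except ValueError:  # impossible dates such as 2012/02/30
        return None
    return published, m["slug"]


path = post_path(date(2012, 1, 19), slugify("This is cool!"))
print(path)                   # /posts/2012/01/19/this-is-cool/
print(parse_post_path(path))  # (datetime.date(2012, 1, 19), 'this-is-cool')
```

Storing the slug in its own column (as SlugField does) means it is generated once when the post is saved rather than recomputed on every request.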
Oh, and don’t worry too much about a user “requesting a record that doesn’t exist” or “validating the requested id”; you have to do that anyway. It isn’t consuming unnecessary resources; it is how a database-backed website works. You have to connect the request to the data, and if the request is incorrect, you return a 404. Your web server does this for non-existent URLs, and you’ll need to do it for non-existent data. Check out Django’s get_object_or_404() for ideas and implementation.
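To make the lookup-or-404 pattern concrete without requiring a Django install, here is a stdlib-only sketch. The in-memory POSTS dict, the NotFound exception and the get_post_or_404/handle functions are hypothetical stand-ins for a Post model, Django’s Http404 and a view. In Django itself the equivalent line would be something like get_object_or_404(Post, pub_date=published, slug=slug), assuming a Post model with pub_date and slug fields.

```python
from __future__ import annotations

from datetime import date


class NotFound(Exception):
    """Hypothetical stand-in for Django's Http404."""


# Hypothetical in-memory "table" keyed by (published date, slug).
POSTS = {
    (date(2012, 1, 19), "this-is-cool"): {"title": "This is cool", "body": "Example body text."},
}


def get_post_or_404(published: date, slug: str) -> dict:
    """Return the matching post, or raise NotFound (like get_object_or_404)."""
    post = POSTS.get((published, slug))
    if post is None:
        raise NotFound(f"No post published {published} with slug {slug!r}")
    return post


def handle(published: date, slug: str) -> tuple[int, str]:
    """Turn the lookup into an HTTP status code and body, as a view would."""
    try:
        post = get_post_or_404(published, slug)
    except NotFound:
        return 404, "Not Found"
    return 200, post["title"]


print(handle(date(2012, 1, 19), "this-is-cool"))  # (200, 'This is cool')
print(handle(date(2012, 1, 19), "no-such-post"))  # (404, 'Not Found')
```

The “validation” costs nothing extra here: the single lookup needed to render the page is the same one that tells you whether to return a 404.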