At the moment I’m working on a system that caches dynamically created files, nothing too new there. Basically, it goes through the $_GET array and creates a directory structure based on the querystring variables. Of course it’s a little more secure than that, but you get the picture.
The problem with this is that anyone could surf to for example:
http://www.example.com/?page=foo&page2=bar (which would generate /cachefiles/foo/bar.html)
and change the page2 value randomly each time, thus creating a new cachefile.
So someone could just make 10000 requests with random querystrings and there would be new cachefiles generated for each and every single one of them.
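To make the problem concrete, here is a minimal sketch of that kind of mapping, written in Python purely for illustration. The name `cache_path`, the `/cachefiles` root and the sanitising rule are assumptions, not the actual implementation.

```python
import re
from pathlib import PurePosixPath

CACHE_ROOT = PurePosixPath("/cachefiles")  # assumed cache directory


def cache_path(query: dict) -> PurePosixPath:
    """Map query-string values, in order, to a cache file path."""
    # Hypothetical sanitising rule: keep only characters safe in a file name.
    parts = [re.sub(r"[^A-Za-z0-9_-]", "", value) for value in query.values()]
    parts = [part for part in parts if part]
    if not parts:
        return CACHE_ROOT / "index.html"
    # All values but the last become directories; the last becomes the file.
    return CACHE_ROOT.joinpath(*parts[:-1]) / f"{parts[-1]}.html"


# Every distinct page2 value yields a distinct file, so nothing bounds the cache.
paths = {cache_path({"page": "foo", "page2": f"x{i}"}) for i in range(10000)}
```

With this mapping, 10000 different `page2` values really do produce 10000 different cache paths.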
An extra problem is that I actually want to allow this in a way, as it will be part of a CMS which people should be able to write plugins for and thus use their own querystring variables. As the querystrings are usually vital to the content of the page, there should be a new cachefile for different querystrings.
So basically I want to allow the use of querystrings in this context, but prevent abuse of them.
To me it sounds rather impossible to have one but not the other, but as I’m no guru I’m hoping some of you could perhaps share your thoughts on this and give advice on best practices (and / or alternative methods).
I do know there are a few cache libraries out there, but I prefer to do things on my own so I understand what’s going on and how stuff works.
EDIT: Thanks for your replies everyone. I think I will try to combine some of your suggestions into one system. So I will add the functionality for plugin developers to register their querystring variables into the API, as well as do some more thorough checks. However, please consider the following scenario:
User requests a static page, but in the template of it a plugin-calendar is loaded: e.g. example.com/?page1=static&calendarStartMonth=5
Here’s the root of the problem: calendarStartMonth can be one of 12 values and does need to be able to change, so caching would need to re-occur every time calendarStartMonth changes. If the plugin developer does not check the input, and someone uses a random number for 10000 requests, that would overflow the cache, would it not? Of course I realise it’s not very effective to cache the page again every time calendarStartMonth changes, that’s why I asked for some best practice advice. Would I have to come up with a system which caches all but the plugins? Thanks again for your answers.
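One way to picture the "register variables plus checks" idea is a registry where each variable comes with a normaliser, and only registered, valid values contribute to the cache key. The sketch below is in Python for illustration only; `register_param`, `cache_key` and the allowed page names are hypothetical, not part of any existing API.

```python
from typing import Callable, Dict, Optional

# name -> normaliser returning a canonical value, or None if the input is invalid
REGISTERED: Dict[str, Callable[[str], Optional[str]]] = {}


def register_param(name: str, normalise: Callable[[str], Optional[str]]) -> None:
    """Called by the CMS or a plugin to declare a cache-relevant variable."""
    REGISTERED[name] = normalise


def cache_key(query: Dict[str, str]) -> Optional[str]:
    """Return a canonical cache key, or None if the request must not be cached."""
    parts = []
    for name in sorted(query):
        normalise = REGISTERED.get(name)
        if normalise is None:
            continue  # unregistered variables never influence the cache
        value = normalise(query[name])
        if value is None:
            return None  # invalid value: serve uncached or reject
        parts.append(f"{name}={value}")
    return "&".join(parts)


def month(value: str) -> Optional[str]:
    # Accept 1..12 only; "05" and "5" normalise to the same key.
    if len(value) <= 2 and value.isdecimal() and 1 <= int(value) <= 12:
        return str(int(value))
    return None


register_param("page1", lambda v: v if v in {"static", "about"} else None)
register_param("calendarStartMonth", month)
```

Under these assumptions the calendar can never produce more than 12 keys per page, however many random values are requested; anything outside 1 to 12 simply is not cached.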
I would recommend writing a small API that lets the CMS and the plugin developers register, in the database, the pages they are going to create (read: allow).
This way you also solve the problem of cache-flushing as you really know what is about to be deleted and when.
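A rough sketch of what such a registration API could look like, using Python and an in-memory SQLite database as a stand-in for the CMS database. The table layout and the function names (`register_page`, `is_cacheable`, `pages_to_flush`) are assumptions made up for this example.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the real CMS database
db.execute("CREATE TABLE cache_pages (owner TEXT, path TEXT PRIMARY KEY)")


def register_page(owner: str, path: str) -> None:
    """A plugin (owner) declares a page it is allowed to have cached."""
    db.execute("INSERT OR IGNORE INTO cache_pages VALUES (?, ?)", (owner, path))


def is_cacheable(path: str) -> bool:
    """Only registered pages may ever be written to the cache."""
    row = db.execute("SELECT 1 FROM cache_pages WHERE path = ?", (path,)).fetchone()
    return row is not None


def pages_to_flush(owner: str) -> list:
    """List exactly the cache files that belong to one plugin."""
    rows = db.execute(
        "SELECT path FROM cache_pages WHERE owner = ? ORDER BY path", (owner,)
    )
    return [row[0] for row in rows]


# The calendar plugin registers its 12 possible pages up front.
for m in range(1, 13):
    register_page("calendar", f"static/calendar/{m}.html")
```

A request for an unregistered path (say month 9999) is then never cached, and flushing the calendar means deleting only the files returned by `pages_to_flush("calendar")`.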