I often find myself building small web projects that serve aggregated content or do a ‘mashup’. Typically this involves periodically running a script to scrape, parse, and manipulate some data, then serving the result as ‘static’ content.
I run the ‘refresh’ script as a cron job that generates the HTML served to the end user. The content doesn’t change that often, so I can usually just run the cron job hourly.
Is there a better way to do this?
If you are happy with how it’s working now, I wouldn’t change anything. It is a kludge, but a functional one. But I’m guessing you’re not completely happy (otherwise you wouldn’t have asked) so a more substantial answer follows.
A basic upgrade would be to write a script that polls your mashup sources and generates the HTML on the fly. The mashup sources could be anything from remote web servers to local files to local databases: anything you can “connect to” in code. The basic steps would be:
Steps 1 and 2 sound like what you’re already doing; it’s step 3 that is the missing link. You want to generate the output at request time instead of pregenerating it and sending out static HTML.
Languages well-suited for this sort of thing include PHP, Perl, Ruby, Python, and others; take your pick.
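To make the idea concrete, here is a minimal sketch in Python using only the standard library. It assumes each mashup source can be wrapped in a function that returns a list of (title, url) pairs; `sample_source`, `SOURCES`, `collect_items`, `render_html`, and `app` are hypothetical names chosen for illustration, not part of any existing setup. The page is rebuilt every time it is requested rather than on a cron schedule.

```python
from html import escape


def sample_source():
    """Hypothetical source. A real one would query a remote web server,
    a local file, or a database and return (title, url) pairs."""
    return [("Example item", "http://example.com/")]


# Assumption: every source is a zero-argument callable returning a list
# of (title, url) tuples.
SOURCES = [sample_source]


def collect_items(sources):
    """Poll every source and merge the results into one list."""
    items = []
    for source in sources:
        try:
            items.extend(source())
        except Exception:
            # One failing source should not take the whole page down.
            continue
    return items


def render_html(items):
    """Turn (title, url) pairs into a small HTML page, escaping both."""
    rows = "\n".join(
        '<li><a href="{}">{}</a></li>'.format(escape(url, quote=True), escape(title))
        for title, url in items
    )
    return "<html><body><ul>\n{}\n</ul></body></html>".format(rows)


def app(environ, start_response):
    """WSGI entry point: regenerate the page on every request."""
    body = render_html(collect_items(SOURCES)).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, something like `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()` should be enough; any WSGI-capable server could host the same `app` function. Polling every source on every request is the simplest version of the idea and will be slow if the sources are slow.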
Further optimizations – in the order you’d probably want to do them – include: