I’ve got a Django-based website using a PostgreSQL database hosted on Webfaction. I normally collect the data for my database by hand (copy-pasting into a text file) from another website, which lists all of the data in an HTML table on a single web page.
As far as automatically gathering that data with Python, I’m guessing that I should use something like html5lib or Scrapy to write a script that loads the web page, finds the HTML table I want, extracts the data from it, formats it into JSON, and then uses
manage.py loaddata fixturename.json
to load my data into my database. My question, though, is how do I get this script to run automatically once a day on Webfaction’s server?
You can use cron to schedule tasks.
Your crontab file could look something like this:
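A minimal sketch of such an entry, assuming the project lives in a hypothetical $HOME/webapps/myproject directory and that 03:00 is an acceptable run time (both are placeholders, not details from the question). Cron jobs usually do not start in the project directory, so the entry changes into it before calling manage.py:

```
# minute hour day-of-month month day-of-week  command
# Hypothetical path: replace $HOME/webapps/myproject with the directory that contains manage.py.
0 3 * * * cd $HOME/webapps/myproject && /usr/bin/python manage.py loaddata fixturename.json
```

Running crontab -e opens the current user's crontab for editing, and crontab -l lists the installed entries.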
(Or you can use
@daily /usr/bin/python manage.py loaddata fixturename.json
to run at midnight every night.)
See the Webfaction documentation: http://docs.webfaction.com/software/general.html#scheduling-tasks-with-cron
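For the scraping script the question describes, here is a rough sketch of one way it might look. It is not a definitive implementation: it swaps in Python's standard-library html.parser for html5lib or Scrapy so that it has no dependencies, and it assumes the page contains one simple, well-formed table (no nesting, every cell closed) whose first row is a header. The script name scrape_to_fixture.py, the model label myapp.record, and the field names are all hypothetical placeholders.

```python
import json
import sys
from html.parser import HTMLParser
from urllib.request import urlopen

# Hypothetical values: replace with the real model label and field names.
MODEL_LABEL = "myapp.record"
FIELD_NAMES = ["name", "value"]


class TableParser(HTMLParser):
    """Collect the cell text of each row in the first <table> on the page."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_table = False
        self._done = False  # True once the first table has been closed
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table" and not self._done:
            self._in_table = True
        elif tag == "tr" and self._in_table:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "table" and self._in_table:
            self._in_table = False
            self._done = True

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def rows_to_fixture(rows, model_label, field_names):
    """Build Django fixture objects from table rows, skipping the header row."""
    return [
        {"model": model_label, "pk": pk, "fields": dict(zip(field_names, row))}
        for pk, row in enumerate(rows[1:], start=1)
    ]


def main(url, out_path="fixturename.json"):
    # Download the page and decode it with the charset the server reports.
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset)
    parser = TableParser()
    parser.feed(html)
    fixture = rows_to_fixture(parser.rows, MODEL_LABEL, FIELD_NAMES)
    with open(out_path, "w") as f:
        json.dump(fixture, f, indent=2)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Run as: python scrape_to_fixture.py <page-url>
    main(sys.argv[1])
```

The cron command could then run this script first and manage.py loaddata second, joined with && so the load only happens if the scrape succeeds. Because each fixture object carries an explicit pk, loading the file again overwrites the rows with those primary keys rather than adding duplicates.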