I am trying to scrape an html table and save its data in a database. What strategies/solutions have you found to be helpful in approaching this program.
I’m most comfortable with Java and PHP but really a solution in any language would be helpful.
EDIT: For more detail, the UTA (Salt Lake’s Bus system) provides bus schedules on its website. Each schedule appears in a table that has stations in the header and times of departure in the rows. I would like to go through the schedules and save the information in the table in a form that I can then query.
Here’s the starting point for the schedules
It all depends on how properly your HTML to scrape is? If it’s valid XHTML, you can simply use some XPath queries on it to get whatever you want.
Example of xpath in php: http://blogoscoped.com/archive/2004_06_23_index.html#108802750834787821
A helper class to scrape a table into an array: http://www.tgreer.com/class_http_php.html