After cleaning a folder full of HTML files with TIDY, how can the contents of the tables be extracted for further processing?

Editorial Team

Asked: May 10, 2026

After cleaning a folder full of HTML files with TIDY, how can the contents of the tables be extracted for further processing?

1 Answer

score 0 · Answer 1 · 2026-05-10T15:07:03+00:00

It depends on what sort of processing you want to do. You can tell Tidy to generate XHTML, which is a form of XML, so you can use all the usual XML tools, such as XSLT and XQuery, on the results.
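As an illustration, here is a hedged sketch in Python (standard library only) of loading Tidy's XHTML output and locating the tables. It assumes the files were produced with options along the lines of `tidy -asxhtml -numeric`; the numeric-entity option matters because ElementTree does not read the XHTML DTD, so named entities such as `&nbsp;` may otherwise be rejected as undefined. The helper names and the `*.html` file pattern are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Tidy's XHTML output declares this default namespace on <html>,
# so element names must be qualified with it when searching.
XHTML = "http://www.w3.org/1999/xhtml"
NS = {"h": XHTML}

def find_tables(root):
    """Return all <table> elements under an XHTML element tree root."""
    return root.findall(".//h:table", NS)

def tables_in_folder(folder, pattern="*.html"):
    """Map each cleaned file in `folder` (hypothetical pattern) to its list of tables."""
    return {
        path.name: find_tables(ET.parse(path).getroot())
        for path in sorted(Path(folder).glob(pattern))
    }

# Small inline document standing in for one of the cleaned files.
sample = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Report</title></head>
<body>
<table><tr><td>a</td><td>b</td></tr></table>
<p>Some text</p>
<table><tr><td>c</td></tr></table>
</body>
</html>"""

tables = find_tables(ET.fromstring(sample))
print(len(tables))  # 2
```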

If you want to process them in Microsoft Excel, you should be able to slice the table out of the HTML, put it in a file of its own, and open that file in Excel: it will happily convert an HTML table into a spreadsheet page. You could then save it as CSV, as an Excel workbook, etc. (You can even use this on a web server: return an HTML table but set the Content-Type header to application/vnd.ms-excel, and Excel will open and import the table and turn it into a spreadsheet.)
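A hedged sketch of the web-server trick using Python's built-in WSGI support: the handler builds an ordinary HTML table but labels the response `application/vnd.ms-excel` instead of `text/html`. The sample rows, the function names, and the port are hypothetical placeholders.

```python
import html
from wsgiref.simple_server import make_server

# Hypothetical sample data; in practice the rows would come from your own source.
ROWS = [("Name", "Count"), ("apples", "3"), ("pears", "5")]

def render_table(rows):
    """Build a minimal HTML page containing a single table, escaping cell text."""
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<html><body><table>{body}</table></body></html>"

def app(environ, start_response):
    """WSGI app that serves the HTML table with an Excel MIME type."""
    payload = render_table(ROWS).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/vnd.ms-excel"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]

def serve(port=8000):
    """Serve the app on the given port until interrupted (Ctrl+C)."""
    with make_server("", port, app) as server:
        server.serve_forever()
```

Running `serve()` and opening `http://localhost:8000/` should make the browser hand the response to Excel or offer to download it. How that is handled depends on the browser and the Excel version, and newer Excel versions may warn that the content does not match the expected file format before opening it.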

If you want CSV to feed into a database, you could go via Excel as before or, if you want to automate the process, write a program that uses the XML-navigating API of your choice to iterate over the table rows and save them as CSV. Python's ElementTree and csv modules would make this pretty easy.
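A minimal sketch of that approach with Python's standard library, assuming the files were cleaned into XHTML (for example with `tidy -asxhtml -numeric`, so that entities such as `&nbsp;` come out as numeric references the XML parser can read). The function names, the `*.html` pattern, and the output naming scheme are hypothetical choices, not anything Tidy or the modules prescribe.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path

XHTML = "http://www.w3.org/1999/xhtml"
NS = {"h": XHTML}
CELL_TAGS = {f"{{{XHTML}}}td", f"{{{XHTML}}}th"}

def table_rows(table):
    """Yield each row of an XHTML <table> as a list of cell strings."""
    # ".//h:tr" also finds rows wrapped in <thead>/<tbody>/<tfoot>
    # (and, as a caveat, rows belonging to any nested tables).
    for tr in table.iterfind(".//h:tr", NS):
        cells = [cell for cell in tr if cell.tag in CELL_TAGS]
        # itertext() gathers text from nested markup such as <b> or <a>;
        # split/join collapses whitespace Tidy may add when re-indenting.
        yield [" ".join("".join(cell.itertext()).split()) for cell in cells]

def tables_to_csv(xhtml_path, out_dir):
    """Write every table in one cleaned file to its own CSV file; return the paths."""
    xhtml_path, out_dir = Path(xhtml_path), Path(out_dir)
    root = ET.parse(xhtml_path).getroot()
    written = []
    for i, table in enumerate(root.iterfind(".//h:table", NS)):
        out_path = out_dir / f"{xhtml_path.stem}_table{i}.csv"
        # newline="" is what the csv module expects for files it writes.
        with open(out_path, "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(table_rows(table))
        written.append(out_path)
    return written

def convert_folder(in_dir, out_dir, pattern="*.html"):
    """Convert the tables in every matching file under in_dir to CSV files in out_dir."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(in_dir).glob(pattern)):
        written.extend(tables_to_csv(path, out_dir))
    return written
```

Calling `convert_folder("cleaned", "csv")` (both folder names hypothetical) would write one CSV per table, named after its source file. Cells that span several rows or columns (`rowspan`/`colspan`) are not expanded, so such tables may need extra handling before loading into a database.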
