The “twill” documentation page says:
By default, twill will run pages through tidy before processing them. This is on by default because the Python libraries that parse HTML are very bad at dealing with incorrect HTML, and will often return incorrect results on “real world” Web pages. To disable this feature, set config do_run_tidy 0
But where is this tidy program located inside twill? I downloaded “twill 0.9” and looked through the contents of the “twill” folder, but I can't find any file (or module) named “tidy”.
twill uses the command-line version of tidy if it is installed on your system, so tidy is an external program rather than a file inside the twill folder. The function that calls tidy to clean up the HTML is named ‘run_tidy’ and is located in utils.py. It is called by the command ‘tidy_ok’, which is defined in commands.py. If use_tidy is set to true (which it is by default), the _cleanup_html method in ConfigurableParsingFactory also calls the run_tidy function.
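
To show roughly what "calling the command-line tidy" means, here is a minimal sketch of piping HTML through an external tidy executable from Python. This is not twill's actual code: the name run_tidy_sketch, its tidy_cmd parameter, and the choice of the -q and -ashtml flags are my own assumptions for illustration, so check utils.py in your twill copy for the real implementation.

```python
import shutil
import subprocess


def run_tidy_sketch(html, tidy_cmd="tidy"):
    """Pipe an HTML string through an external tidy executable.

    Hypothetical helper for illustration only; not twill's run_tidy.
    Returns (cleaned_html, messages), or (None, None) when no
    executable named tidy_cmd can be found on PATH.
    """
    # tidy is a separate program, so first check that it is installed.
    if shutil.which(tidy_cmd) is None:
        return None, None

    # -q suppresses non-essential output, -ashtml asks for HTML output.
    proc = subprocess.run(
        [tidy_cmd, "-q", "-ashtml"],
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )

    # tidy exits with 0 (no problems), 1 (warnings) or 2 (errors), so a
    # non-zero status is not treated as a failure of the call itself.
    cleaned = proc.stdout.decode("utf-8", "replace")
    messages = proc.stderr.decode("utf-8", "replace")
    return cleaned, messages


if __name__ == "__main__":
    cleaned, messages = run_tidy_sketch("<p>unclosed paragraph")
    if cleaned is None:
        print("tidy is not installed or not on PATH")
    else:
        print(cleaned)
```

If the sketch reports that tidy is not installed, that explains why you could not find it: it has to be installed separately and be reachable on your PATH.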