I was trying to parse a page that contains scientific notation (Greek letters, etc.).
This is the page. Note that there are other pages with more notations to be parsed.
For example, it contains the following HTML:
<td> human Interleukin 1&beta; </td>
where &beta; encodes the Greek letter beta.
However, after parsing with HTML::TableExtract it became:
human Interleukin 1\x{3b2}
Is there a way to make the code below capture the original HTML as it is,
i.e. maintaining 1&beta;?
use HTML::TableExtract;
use Data::Dumper;
# Local file for http://www.violinet.org/vaxjo/vaxjo_detail.php?c_vaxjo_id=55
my $file = "vaxjo_detail.php\?c_vaxjo_id\=50.html";
my $te = HTML::TableExtract->new();
$te->parse_file($file);
my ($table) = $te->tables;
print Dumper $table ;
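It did not return the literal text \x{3b2}.
It returned the character β (U+03B2), which is what &beta; decodes to.
Dumper simply prints that character out as a Perl string literal escape, \x{3b2}.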
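To see the difference between what the string holds and how Dumper displays it, here is a small sketch using only core modules. The variable names are illustrative, and the string is built by hand rather than extracted from the page.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hand-built stand-in for the decoded cell text (U+03B2 is Greek small beta).
my $cell = "human Interleukin 1\x{3b2}";

# Dumper renders the non-ASCII character as a \x{...} escape in its output.
my $dumped = Dumper($cell);

# Encode output as UTF-8 so the wide character prints without a warning.
binmode(STDOUT, ':encoding(UTF-8)');
print "Dumper shows: $dumped";
print "Actual text:  $cell\n";
```

The escape appears only in Dumper's rendering; printing the string directly through a UTF-8 output layer should show the beta character itself.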
Anyway, if you want the raw HTML instead of the text it represents, I believe passing
keep_html => 1 to the constructor will do the trick.
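A minimal sketch of that suggestion follows. It assumes HTML::TableExtract is installed from CPAN, and it parses a one-cell inline table (a made-up stand-in for the saved page) so that it does not depend on the local file. The variable names $html and @cells are illustrative.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Inline stand-in for the saved page, so the sketch needs no local file.
my $html = '<table><tr><td> human Interleukin 1&beta; </td></tr></table>';

# keep_html => 1 asks for the raw cell HTML rather than the decoded text.
my $te = HTML::TableExtract->new( keep_html => 1 );
$te->parse($html);
$te->eof;

# Collect every defined cell from every table that was found.
my @cells;
foreach my $table ( $te->tables ) {
    foreach my $row ( $table->rows ) {
        push @cells, grep { defined } @$row;
    }
}
print "$_\n" for @cells;
```

With this option the cell should come back with &beta; left undecoded, though it is worth confirming against the installed version. The module's documentation also describes a decode option that controls entity decoding on its own, which may be enough if any markup inside the cells is not needed.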