How can I convert the following block of code from Perl to Python?
```perl
while ($content2 =~ m{services</B></TD>\s<TD>\s<TABLE>\s<TR>(.*?)</TABLE>}gs) {
    my $service = $1;
    print " service : $service\n";
}
```
The full code has more regexes than that, but with this example I'll be able to handle the rest of the conversion.
If I’m reading your regex right, you have a table with a single row (and no data cell) inside a table-data cell. What kind of abomination are you cooking up here?
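For comparison, a literal translation of the Perl loop with Python's built-in `re` module is a minimal sketch like the one here. `content2` is hypothetical sample input standing in for `$content2`, and `find_services` is an illustrative helper name. Perl's `/s` flag maps to `re.S`, the `while (m//g)` loop maps to `re.finditer`, and `$1` maps to `match.group(1)`.

```python
import re

# Hypothetical stand-in for $content2 from the question.
content2 = """<TABLE>
<TR><TD><B>services</B></TD>
<TD>
<TABLE>
<TR><TD>http</TD><TD>ssh</TD></TABLE></TD></TR>
</TABLE>"""

# Same pattern as the Perl version. re.S (DOTALL) plays the role of Perl's /s
# flag so that .*? may cross newlines; the while (m//g) loop becomes finditer.
SERVICES_RE = re.compile(r"services</B></TD>\s<TD>\s<TABLE>\s<TR>(.*?)</TABLE>", re.S)


def find_services(content):
    """Return each group-1 capture, i.e. the successive values of $1 in Perl."""
    return [match.group(1) for match in SERVICES_RE.finditer(content)]


for service in find_services(content2):
    print(f" service : {service}")
```

This inherits all of the regex's fragility: two spaces where `\s` expects one, or a lowercase `<td>`, is enough to stop it matching.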
In Python you’d use lxml for this. It’s a real HTML parser, so it won’t break when whitespace, tag casing, or other unrelated structure of the document changes. It’s not part of the standard library, but it’s one of the most-installed libraries on PyPI, if not *the* most-installed.
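A minimal sketch of the lxml approach, assuming the page looks roughly like what the Perl regex expects. `PAGE` is hypothetical, deliberately sloppy markup (mixed tag case, unclosed `<tr>` tags, arbitrary whitespace), and `service_cells` is an illustrative helper, not part of lxml. The XPath `//td/table//td` selects every cell of a table nested directly inside a table cell, which is one reading of the structure the regex targets.

```python
from lxml import html

# Hypothetical, deliberately sloppy markup: mixed tag case, unclosed <tr> tags
# and arbitrary whitespace.
PAGE = """
<table>
  <TR><td><b>services</B></TD>
      <TD>
        <TABLE>
          <tr><td>http</td><td>ssh</td>
        </table>
      </td>
</table>
"""


def service_cells(page_source):
    """Return the text of every cell in a table nested inside a table cell."""
    doc = html.fromstring(page_source)
    # lxml's HTML parser lowercases tag names and closes unclosed tags,
    # so one lowercase XPath works regardless of the source's casing.
    return [td.text_content().strip() for td in doc.xpath("//td/table//td")]


for service in service_cells(PAGE):
    print(" service :", service)
```

Because the parser builds a proper tree first, indentation and tag case in the source no longer matter to the query.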
As you can see, it handles questionable HTML admirably.
If you need to match the “services” text, you can add a condition to the XPath expression.
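One way to express that condition, again as a hedged sketch on hypothetical markup: match the cell whose whitespace-normalized text is exactly `services`, step to the next cell in the same row with `following-sibling::td[1]`, and collect the cells of the table inside it. The label is passed as the XPath variable `$label` through keyword arguments to lxml's `xpath()`; `services_for` and the sample `PAGE` are illustrative names.

```python
from lxml import html

# Hypothetical page with two labelled rows, so the condition has something to filter.
PAGE = """
<table>
  <tr><td><b>ports</b></td>
      <td><table><tr><td>22</td><td>80</td></table></td></tr>
  <TR><td><b>services</B></TD>
      <TD>
        <TABLE>
          <tr><td>http</td><td>ssh</td>
        </table>
      </td>
</table>
"""


def services_for(page_source, label="services"):
    """Return the nested-table cells that follow the cell whose text is `label`."""
    doc = html.fromstring(page_source)
    # Find the cell whose normalized text equals the label, move to the next
    # cell in the same row, and collect every cell of the table inside it.
    query = "//td[normalize-space()=$label]/following-sibling::td[1]/table//td"
    return [td.text_content().strip() for td in doc.xpath(query, label=label)]


print(services_for(PAGE))
```

Passing `"ports"` as the label returns the other row's cells instead, which shows the condition doing the filtering.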
Here are some good XPath references: