I have documents with HTML tables. Some of the cells contain only numbers. Other cells contain numbers and words.
Is there any way to keep just the contents of the cells that have words and not keep the contents of cells with only numbers?
Is there a module that anyone is aware of that I could use to do this? Alternatively, is there any way I could use a regular expression?
<table>
<tr>
<td>WORDS WORDS WORDS WORDS WORDS WORDS 123</td>
<td> 789</td>
</tr>
<tr>
<td> 123 </td>
<td>WORDS WORDS</td>
</tr>
</table>
I am still pretty new to Perl, so please excuse my question if it is very simple. Also, I have already been warned about the potential problems of parsing HTML text using a regular expression.
Thanks so much!
Eventually, I’ll use a module to kill all of the HTML code, by the way.
As you already stated, HTML should not be parsed with regular expressions. A specialised parsing module like HTML::Parser can be of help.
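The code originally posted with this answer is not preserved here, so what follows is only a minimal sketch of how HTML::Parser could be used for this, not the original solution. It assumes that "has words" means the cell text contains at least one ASCII letter, that only `<td>` cells matter, and that tables are not nested; the variable names (`@kept`, `$in_cell`, `$cell`) are made up for illustration.

```perl
use strict;
use warnings;
use HTML::Parser;

# Sample input taken from the question.
my $html = <<'HTML';
<table>
<tr>
<td>WORDS WORDS WORDS WORDS WORDS WORDS 123</td>
<td> 789</td>
</tr>
<tr>
<td> 123 </td>
<td>WORDS WORDS</td>
</tr>
</table>
HTML

my @kept;          # contents of the cells we decide to keep
my $in_cell = 0;   # true while the parser is inside a <td>
my $cell    = '';  # text collected for the current cell

my $parser = HTML::Parser->new(
    api_version => 3,
    # Opening <td>: start collecting text for a new cell.
    start_h => [ sub {
        my ($tag) = @_;
        if ($tag eq 'td') { $in_cell = 1; $cell = ''; }
    }, 'tagname' ],
    # Text may arrive in several chunks, so append rather than assign.
    text_h => [ sub {
        my ($text) = @_;
        $cell .= $text if $in_cell;
    }, 'dtext' ],
    # Closing </td>: trim the text and keep it only if it has a letter.
    end_h => [ sub {
        my ($tag) = @_;
        return unless $tag eq 'td' && $in_cell;
        $in_cell = 0;
        (my $trimmed = $cell) =~ s/^\s+|\s+$//g;
        # Assumption: a "word" is anything containing an ASCII letter.
        push @kept, $trimmed if $trimmed =~ /[A-Za-z]/;
    }, 'tagname' ],
);

$parser->parse($html);
$parser->eof;

print "$_\n" for @kept;
```

For the sample table in the question this should print:

```
WORDS WORDS WORDS WORDS WORDS WORDS 123
WORDS WORDS
```

If your data contains non-ASCII words, `\p{L}` may be a better test than `[A-Za-z]`. The regular expression here is applied only to the already extracted cell text, not to the HTML itself.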