I have documents with HTML Tables. Some of the cells have only numbers. Other

Question

Asked: June 9, 2026

I have documents with HTML Tables. Some of the cells have only numbers. Other cells have numbers and words.

Is there any way to keep just the contents of the cells that have words and not keep the contents of cells with only numbers?

Is there a module that anyone is aware of that I could use to do this? Alternatively, is there any way I could use a regular expression?

<table>
<tr>
<td>WORDS WORDS WORDS WORDS WORDS WORDS 123</td>
<td> 789</td>
</tr>
<tr>
<td> 123 </td>
<td>WORDS WORDS</td>
</tr>
</table>

I am still pretty new to Perl, so please excuse my question if it is very simple. Also, I have already been warned about the potential problems of parsing HTML with a regular expression.

Thanks so much!

Eventually, I’ll use a module to kill all of the HTML code, by the way.

1 Answer

Editorial Team · Answer 1 · 2026-06-09T21:32:05+00:00

As you already stated, HTML should not be parsed with regular expressions. A specialised parsing module such as HTML::Parser can help:

#!/usr/bin/env perl

use strict;
use warnings;

use HTML::Parser;

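# Call text_handler for every text node, passing the entity-decoded text
my $p = HTML::Parser->new( 'text_h' => [ \&text_handler, 'dtext' ] );
$p->unbroken_text(1);                # Report each run of text in one piece
$p->parse_file(\*DATA);

sub text_handler {
    my $text = shift;
    $text =~ s/^\s+|\s+$//g;         # Trim leading and trailing whitespace
    return if $text =~ /^[\d\s]*$/;  # Skip empty and digits-only text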

    print "$text\n";
}

__DATA__
<table>
<tr>
<td>WORDS WORDS WORDS WORDS WORDS WORDS 123</td>
<td> 789 558 </td>
</tr>
<tr>
<td> 123 </td>
<td>WORDS WORDS</td>
</tr>
</table>

Output:

WORDS WORDS WORDS WORDS WORDS WORDS 123
WORDS WORDS
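
The handler above filters individual text nodes, so a cell with inline markup such as <b> would be checked piece by piece rather than as a whole. If that matters for your documents, one possible variation is to collect the text per cell and decide when the cell closes. This is only a sketch: the helper name cells_with_words is made up for illustration, and it assumes tables are not nested inside cells.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use HTML::Parser;

# Hypothetical helper (not part of HTML::Parser): return the trimmed text
# of every <td>/<th> that contains something other than digits and spaces.
sub cells_with_words {
    my ($html) = @_;
    my @kept;
    my $cell;    # undef outside a cell, accumulated text inside one

    my $p = HTML::Parser->new(
        api_version => 3,
        # A cell opens: start collecting text
        start_h => [ sub { $cell = '' if $_[0] =~ /^t[dh]$/ }, 'tagname' ],
        # Text only counts while a cell is open
        text_h  => [ sub { $cell .= $_[0] if defined $cell }, 'dtext' ],
        # A cell closes: trim it and keep it unless it is digits-only
        end_h   => [
            sub {
                return unless $_[0] =~ /^t[dh]$/ && defined $cell;
                $cell =~ s/^\s+|\s+$//g;
                push @kept, $cell if $cell =~ /[^\d\s]/;
                undef $cell;
            },
            'tagname',
        ],
    );
    $p->parse($html);
    $p->eof;
    return @kept;
}

my $html = <<'HTML';
<table>
<tr><td><b>WORDS</b> WORDS 123</td><td> 789 </td></tr>
<tr><td> 123 </td><td>WORDS WORDS</td></tr>
</table>
HTML

print "$_\n" for cells_with_words($html);
```

For the sample table this prints "WORDS WORDS 123" and "WORDS WORDS". Because the markup is already dropped at this stage, it may also cover the later step of stripping the HTML for the table part of the document.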
