I am very lost with this regex. I have an HTML table with 3 fields: Date, Name, and Place. The first record of the table doesn't have the "Place" field (I cannot change the table format). At the moment I am using the pattern below:
^<tr><td.*>(.+)<\/td><td>(.+)<\/td><td><font.*>(.+)<\/font><\/td><\/tr> $\n<tr><td.*>(.+)<\/td><\/tr>
This pattern ignores the first record of the table (the record that doesn't have the "Place" field). I don't want to create 2 patterns for the same text. Can anyone help with this issue?
A sample of table:
<table border cellpadding=1 hspace=10>
<colgroup style='font:8pt Tahoma;color=Black' valign=top><colgroup style='font:8pt Tahoma; color=Navy'><colgroup style='font:8pt Tahoma;color=Maroon'>
<tr>
<td><font FACE=Tahoma color='#CC0000' size=2><b>Date</b></font></td>
<td><font FACE=Tahoma color='#CC0000' size=2><b>Name</b></font></td>
<td><font FACE=Tahoma color='#CC0000' size=2><b>Place</b></font></td>
</tr>
<tr><td rowspan=2>17/08/2011 10:28</td><td>Vivamus sed est ut lorem tempor cursus</td><td><FONT COLOR="000000">Curabitur egestas metus bibendum</font></td></tr>
<tr><td colspan=2>Curabitur id urna elit</td></tr>
<tr><td rowspan=2>17/08/2011 10:26</td><td>UDonec blandit nisl ut nisl elementum</td><td><FONT COLOR="000000"> hendrerit vel ante</font></td></tr>
<tr><td colspan=2>Etiam nec mollis</td></tr>
<tr><td rowspan=2>12/08/2011 09:46</td><td>Nulla et eros a massa</td><td><FONT COLOR="000000">Aenean in mauris eget tellus </font></td></tr>
<tr><td colspan=2>Nulla et eros a massa tristique blandit </td></tr>
<tr><td rowspan=2>12/08/2011 09:45</td><td>orta mi dapibus sit amet. Vestib</td><td><FONT COLOR="000000"> mollis erat consectetur.</font></td></tr>
<tr><td colspan=2>sodales tempor</td></tr>
<tr><td rowspan=1>11/08/2011 10:39</td><td>lorem ipsum</td><td><FONT COLOR="000000">dolor</font></td></tr>
</TABLE>
The current solution is to create 2 regexes.
The first regex captures the table without the first record:
^<tr><td.*>(.+)<\/td><td>(.+)<\/td><td><font.*>(.+)<\/font><\/td><\/tr> $\n<tr><td.*>(.+)<\/td><\/tr>
And the second regex captures the first record:
^<tr><td.*>(.+)<\/td><td>(.+)<\/td><td><font.*>(.+)<\/font><\/td><\/tr> $
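If a single pattern is still wanted, one option might be to wrap the second row in an optional non-capturing group, so that a record without it still matches. The sketch below assumes Python's `re` module; the names `ROW` and `parse_rows` are made up for illustration. It also deviates from the patterns above in a few places: it is case-insensitive (the sample mixes `<FONT` and `</font>`), it uses `[^>]*` and lazy groups instead of greedy `.*`, and it uses `[^<]+` for the second-row cell so that the optional group cannot swallow the next record.

```python
import re

# One record = a 3-cell row, optionally followed by a 1-cell row.
ROW = re.compile(
    r'^<tr><td[^>]*>(.+?)</td><td>(.+?)</td>'
    r'<td><font[^>]*>(.+?)</font></td></tr>[ \t]*'
    # Optional second row; [^<]+ keeps it from running into the next record.
    r'(?:\r?\n<tr><td[^>]*>([^<]+)</td></tr>)?',
    re.IGNORECASE | re.MULTILINE,
)


def parse_rows(html):
    """Return one 4-tuple per record; the last item is None when the
    record has no second row."""
    return [m.groups() for m in ROW.finditer(html)]


# Shortened rows in the same shape as the sample table.
sample = (
    '<tr><td rowspan=2>17/08/2011 10:28</td><td>Vivamus sed est</td>'
    '<td><FONT COLOR="000000">Curabitur egestas</font></td></tr>\n'
    '<tr><td colspan=2>Curabitur id urna elit</td></tr>\n'
    '<tr><td rowspan=1>11/08/2011 10:39</td><td>lorem ipsum</td>'
    '<td><FONT COLOR="000000">dolor</font></td></tr>\n'
    '</TABLE>'
)

for record in parse_rows(sample):
    print(record)
```

With this shape, the fourth group is simply `None` for a record that has only one row, so the calling code can branch on that instead of running a second pattern.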
More formally, XML and related markup languages such as HTML are not regular languages, which is why regular expressions are poorly suited to parsing them. Short of writing your own recursive descent parser, your best bet is to use an existing parser.
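As a rough illustration of the "existing solution" route, here is a minimal sketch using `html.parser` from the Python standard library. The class `TableRows` and the function `extract_records` are hypothetical names, and the grouping rule (a 3-cell row starts a record, a following 1-cell row is attached to it, the first row is a header) is an assumption read off the sample table, not something the parser knows about.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased, so <TR> and <tr> look the same.
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_records(html):
    """Return [date, name, place, detail] lists; detail is None when the
    record has no second row."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    records = []
    for cells in parser.rows[1:]:  # assumption: the first row is the header
        if len(cells) == 3:
            records.append(cells + [None])
        elif len(cells) == 1 and records and records[-1][3] is None:
            records[-1][3] = cells[0]
    return records


# Shortened version of the sample table.
sample = """<table border cellpadding=1 hspace=10>
<tr>
<td><font FACE=Tahoma color='#CC0000' size=2><b>Date</b></font></td>
<td><font FACE=Tahoma color='#CC0000' size=2><b>Name</b></font></td>
<td><font FACE=Tahoma color='#CC0000' size=2><b>Place</b></font></td>
</tr>
<tr><td rowspan=2>17/08/2011 10:28</td><td>Vivamus sed est</td><td><FONT COLOR="000000">Curabitur egestas</font></td></tr>
<tr><td colspan=2>Curabitur id urna elit</td></tr>
<tr><td rowspan=1>11/08/2011 10:39</td><td>lorem ipsum</td><td><FONT COLOR="000000">dolor</font></td></tr>
</TABLE>"""

for record in extract_records(sample):
    print(record)
```

Because the parser normalises tag case and ignores attributes the code does not ask for, differences such as `<FONT` versus `<font` or `rowspan=1` versus `rowspan=2` need no special handling here.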