I am trying to determine in which column the name “Phone” appears, by checking the HTML of a web page.
The string I am searching looks like this:
<tr class="C1">
<td>Name</td>
<td>Address</td>
...
... < some more columns, but their number is not fixed >
...
<td>Phone</td>
...
... <more columns>
...
</tr>
Is it possible to determine this using regular expressions?
From the viewpoint of theoretical computer science, it is not possible: tables can be nested, and regular expressions generally cannot cope with nested structures. Analysing the structure of an HTML text requires a Type-2 (context-free) grammar in the Chomsky hierarchy, i.e. a parser; HTML is not Type 3, i.e. not regular.
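To illustrate the parser route, here is a minimal sketch using Python's standard-library html.parser.HTMLParser. It assumes well-formed markup with explicit </tr> and </td> closing tags. The class name ColumnFinder and the sample page are made up for illustration. A stack of per-row cell counters keeps a nested table's columns separate from the outer table's, which is exactly what a single regular expression cannot track.

```python
from html.parser import HTMLParser


class ColumnFinder(HTMLParser):
    """Hypothetical helper: records (table depth, 0-based column) for every
    <td> whose own text contains `label`. Assumes explicit closing tags."""

    def __init__(self, label):
        super().__init__()
        self.label = label
        self.rows = []   # stack: number of cells seen so far in each open <tr>
        self.cells = []  # stack: (depth, column, text parts) for each open <td>
        self.hits = []   # (depth, column) of matching cells, in closing order

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append(0)
        elif tag == "td" and self.rows:
            # Depth = how many rows are open, so nested tables get depth >= 2.
            self.cells.append((len(self.rows), self.rows[-1], []))
            self.rows[-1] += 1

    def handle_endtag(self, tag):
        if tag == "tr" and self.rows:
            self.rows.pop()
        elif tag == "td" and self.cells:
            depth, column, parts = self.cells.pop()
            if self.label in "".join(parts):
                self.hits.append((depth, column))

    def handle_data(self, data):
        # Text belongs only to the innermost open cell.
        if self.cells:
            self.cells[-1][2].append(data)


# Made-up page: the second outer cell contains a nested table.
page = """
<table><tr class="C1">
<td>Name</td>
<td>Address<table><tr><td>Phone</td></tr></table></td>
<td>Phone</td>
</tr></table>
"""

finder = ColumnFinder("Phone")
finder.feed(page)
finder.close()
print(finder.hits)  # [(2, 0), (1, 2)]
```

The outer "Phone" is reported in column 2 at depth 1, while the nested one is kept apart at depth 2.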
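From a practical viewpoint, however, if you assume that the tables are not nested, you could use a regex to extract the table rows (something like <tr\b(?:(?!</tr>).)*</tr>), then match the cells within each row (something like <td\b(?:(?!</td>).)*</td>) to produce a list of columns, and search that list for an entry containing the string "Phone". Note that the lookahead has to be combined with a consuming "." inside the repeated group; a bare (?!</tr>)* matches nothing. Since rows usually span several lines, "." must also be allowed to match newlines (a DOTALL/single-line flag).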
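A minimal Python sketch of this practical approach follows. It assumes non-nested tables. The sample row (modelled on the one in the question) and the function name find_column are made up for illustration. The cell pattern also captures the cell's text so the list of columns can be searched directly.

```python
import re

# Made-up sample row modelled on the one in the question.
html = """
<tr class="C1">
<td>Name</td>
<td>Address</td>
<td>Email</td>
<td>Phone</td>
<td>Fax</td>
</tr>
"""

# Tempered token: consume one character at a time as long as the closing tag
# does not start here. DOTALL lets '.' cross newlines.
ROW_RE = re.compile(r"<tr\b(?:(?!</tr>).)*</tr>", re.DOTALL | re.IGNORECASE)
# Same idea for cells; the group captures the text between <td ...> and </td>.
CELL_RE = re.compile(r"<td\b[^>]*>((?:(?!</td>).)*)</td>", re.DOTALL | re.IGNORECASE)


def find_column(html_text, label):
    """Hypothetical helper: return the 0-based column index of the first cell
    containing `label`, or None. Only valid if tables are NOT nested."""
    for row in ROW_RE.finditer(html_text):
        cells = CELL_RE.findall(row.group(0))  # list of column texts
        for index, cell in enumerate(cells):
            if label in cell:
                return index
    return None


print(find_column(html, "Phone"))  # 3
```

With a nested table, ROW_RE would stop at the first (inner) </tr>, which is the failure the theoretical argument predicts.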