Sometime during the dark ages, a script was built that outputs the following HTML:
...
<TABLE BORDER=0 FRAME=ALL_FRAMES RULES=ALL_RULES ALIGN=CENTER BGCOLOR="ffffe5">
<CAPTION ALIGN=TOP>
<FONT COLOR=009594 SIZE=-1><B>Access Information</B></FONT>
</CAPTION>
<TR>
<TD ALIGN=RIGHT VALIGN=MIDDLE>
<FONT COLOR=black SIZE=-1><B>Access Circuit(s):</B></FONT>
</TD>
<TD ALIGN=LEFT VALIGN=MIDDLE>
**DATA TO COLLECT 111**
</TD>
<TD ALIGN=RIGHT VALIGN=MIDDLE>
<FONT COLOR=black SIZE=-1><B>Other Circuit(s):</B></FONT>
</TD>
<TD ALIGN=LEFT VALIGN=MIDDLE>
 
</TD>
</TR>
<TR>
<TD ALIGN=RIGHT VALIGN=MIDDLE>
 
</TD>
<TD ALIGN=LEFT VALIGN=MIDDLE>
**DATA TO COLLECT AAA**
</TD>
<TD ALIGN=RIGHT VALIGN=MIDDLE>
 
</TD>
<TD ALIGN=LEFT VALIGN=MIDDLE>
 
</TD>
</TR>
<TR>
<TD ALIGN=RIGHT VALIGN=MIDDLE>
 
</TD>
<TD ALIGN=LEFT VALIGN=MIDDLE>
**DATA TO COLLECT BBB**
</TD>
<TD ALIGN=RIGHT VALIGN=MIDDLE>
 
</TD>
<TD ALIGN=LEFT VALIGN=MIDDLE>
 
</TD>
</TR>
<TR>
<TD ALIGN=RIGHT VALIGN=MIDDLE>
 
</TD>
<TD ALIGN=LEFT VALIGN=MIDDLE>
**DATA TO COLLECT CCC**
</TD>
<TD ALIGN=RIGHT VALIGN=MIDDLE>
 
</TD>
<TD ALIGN=LEFT VALIGN=MIDDLE>
 
</TD>
</TR>
<TR>
<TD ALIGN=RIGHT VALIGN=MIDDLE>
<FONT COLOR=black SIZE=-1><B>Customer:</B></FONT>
</TD>
...
Sorry, I would show you the table layout, but I don’t know how without <table> on SO.
How can I use XPath (in PHP) to collect only the DATA TO COLLECT cells? So far I’ve been able to retrieve the one in the first row with //*[*='Access Circuit(s):']/following-sibling::td[1].
Things to note:
- This is only a small section of a large document.
- I cannot change the script’s output.
- I won’t know how many rows there will be (figure 0 to 6).
- The data should be expected to always be in the same “column”.
- I may only have XPath 1.0 available, but XPath 2.0 answers are still welcome.
The expression I came up with is this:
returns
It uses the knowledge that the first row contains Access Circuit(s): and the first uncollected row contains Customer:. If you can’t be sure of either one of those, then I think it can’t be done with a single XPath expression.
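As a rough illustration of the idea (a sketch under stated assumptions, not necessarily the expression referred to above), the following PHP snippet builds one XPath 1.0 query from those two landmarks: it takes the second cell of the Access Circuit(s): row, plus the second cell of every sibling row that comes after it and before the Customer: row. The sample markup, the variable names, and the placeholder cell values are made up for the demo. It also assumes the document is loaded with DOMDocument::loadHTML, which lowercases element names, so the query uses tr and td.

```php
<?php
// Hypothetical, shortened sample that mirrors the structure of the markup in the question.
$html = <<<'HTML'
<TABLE>
<TR><TD><FONT SIZE=-1><B>Access Circuit(s):</B></FONT></TD><TD>DATA 111</TD><TD><B>Other Circuit(s):</B></TD><TD>&nbsp;</TD></TR>
<TR><TD>&nbsp;</TD><TD>DATA AAA</TD><TD>&nbsp;</TD><TD>&nbsp;</TD></TR>
<TR><TD>&nbsp;</TD><TD>DATA BBB</TD><TD>&nbsp;</TD><TD>&nbsp;</TD></TR>
<TR><TD><B>Customer:</B></TD><TD>ACME</TD></TR>
</TABLE>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // legacy markup tends to trigger parser warnings
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Predicates for the two landmark rows (assumed labels, taken from the question).
$first = "normalize-space(td[1])='Access Circuit(s):'";
$stop  = "normalize-space(td[1])='Customer:'";

// Second cell of the first row, united with the second cell of every row
// that sits between the first row and the stop row in the same table.
$query = "//tr[{$first}]/td[2]"
       . " | //tr[preceding-sibling::tr[{$first}]"
       . " and following-sibling::tr[{$stop}]]/td[2]";

$values = [];
foreach ($xpath->query($query) as $cell) {
    $values[] = trim($cell->textContent);
}

print_r($values);
```

Because preceding-sibling and following-sibling only look at rows with the same parent, rows from other tables in the larger document should not leak into the result, provided the two labels appear only in this table. With zero continuation rows, the query would simply return the single cell from the first row.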