I have some loosely structured XHTML data that I need to convert to better-structured XML.
Here’s an example:
<tbody>
<tr>
<td class="header"><img src="http://www.abc.com/images/icon_apples.gif"/><img src="http://www.abc.com/images/flag/portugal.gif" alt="Portugal"/> First Grade</td>
</tr>
<tr>
<td>Green</td>
<td>Round shaped</td>
<td>Tasty</td>
</tr>
<tr>
<td>Red</td>
<td>Round shaped</td>
<td>Bitter</td>
</tr>
<tr>
<td>Pink</td>
<td>Round shaped</td>
<td>Tasty</td>
</tr>
<tr>
<td class="header"><img src="http://www.abc.com/images/icon_strawberries.gif"/><img src="http://www.abc.com/images/flag/usa.gif" alt="USA"/> Fifth Grade</td>
</tr>
<tr>
<td>Red</td>
<td>Heart shaped</td>
<td>Super tasty</td>
</tr>
<tr>
<td class="header"><img src="http://www.abc.com/images/icon_bananas.gif"/><img src="http://www.abc.com/images/flag/congo.gif" alt="Congo"/> Third Grade</td>
</tr>
<tr>
<td>Yellow</td>
<td>Smile shaped</td>
<td>Fairly tasty</td>
</tr>
<tr>
<td>Brown</td>
<td>Smile shaped</td>
<td>Too sweet</td>
</tr>
</tbody>
I am trying to achieve the following structure:
<data>
<entry>
<type>Apples</type>
<country>Portugal</country>
<rank>First Grade</rank>
<color>Green</color>
<shape>Round shaped</shape>
<taste>Tasty</taste>
</entry>
<entry>
<type>Apples</type>
<country>Portugal</country>
<rank>First Grade</rank>
<color>Red</color>
<shape>Round shaped</shape>
<taste>Bitter</taste>
</entry>
<entry>
<type>Apples</type>
<country>Portugal</country>
<rank>First Grade</rank>
<color>Pink</color>
<shape>Round shaped</shape>
<taste>Tasty</taste>
</entry>
<entry>
<type>Strawberries</type>
<country>USA</country>
<rank>Fifth Grade</rank>
<color>Red</color>
<shape>Heart shaped</shape>
<taste>Super tasty</taste>
</entry>
<entry>
<type>Bananas</type>
<country>Congo</country>
<rank>Third Grade</rank>
<color>Yellow</color>
<shape>Smile shaped</shape>
<taste>Fairly tasty</taste>
</entry>
<entry>
<type>Bananas</type>
<country>Congo</country>
<rank>Third Grade</rank>
<color>Brown</color>
<shape>Smile shaped</shape>
<taste>Too sweet</taste>
</entry>
</data>
First, I need to extract the fruit type from tbody/tr/td/img[1]/@src, then the country from the tbody/tr/td/img[2]/@alt attribute, and finally the grade from the text of tbody/tr/td itself.
Next, I need to output every entry under each category, repeating those three values in each entry (as shown above).
But, as you can see, the data I was given is very loosely structured. A category is simply a td in a row of its own, and the rows that follow it are the items in that category. To make things worse, in my datasets the number of items under each category varies between 1 and 100.
I’ve tried a few approaches but just can’t seem to get it. Any help is greatly appreciated. I know that XSLT 2.0 introduces xsl:for-each-group, but I am limited to XSLT 1.0.
In this case, you are not actually grouping elements; it is more like ungrouping them.
One way to do this is to use an xsl:key to look up the “header” row for each of the detail rows.
That is, for each detail row, get the nearest preceding header row.
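A minimal sketch of such a key, assuming header rows are the ones whose td carries class="header" (as in your sample) and that the elements are in no namespace. The key name "rows" is only an illustrative choice:

```xml
<!-- Top-level declaration (a direct child of xsl:stylesheet).
     Indexes every detail row by the generated id of the nearest
     preceding header row. -->
<xsl:key name="rows"
         match="tr[not(td/@class='header')]"
         use="generate-id(preceding-sibling::tr[td/@class='header'][1])"/>
```

Because preceding-sibling is a reverse axis, the [1] picks the closest header row before the detail row, not the first one in the document.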
Next, you can match all your header rows like so:
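Something along these lines should do, assuming tbody is the element that holds the rows and data is the wrapper element you want in the output:

```xml
<xsl:template match="tbody">
  <data>
    <!-- Visit only the header cells, in document order;
         detail rows are reached later through the key. -->
    <xsl:apply-templates select="tr/td[@class='header']"/>
  </data>
</xsl:template>
```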
Within the matching template, you can then extract the type, country and rank. To get the associated detail rows, it is then a simple case of looking up the key using the parent row:
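A sketch of that template, assuming a key named "rows" keyed on the generate-id() of the header row, and assuming the image file names always follow the icon_NAME.gif pattern from your sample. The parameter names are illustrative:

```xml
<xsl:template match="td[@class='header']">
  <!-- '..' is the header tr; the key returns its detail rows -->
  <xsl:apply-templates select="key('rows', generate-id(..))">
    <!-- 'apples' from .../icon_apples.gif (still lower-case here) -->
    <xsl:with-param name="type"
        select="substring-before(substring-after(img[1]/@src, 'icon_'), '.gif')"/>
    <xsl:with-param name="country" select="img[2]/@alt"/>
    <!-- the img elements are empty, so only the grade text remains -->
    <xsl:with-param name="rank" select="normalize-space(.)"/>
  </xsl:apply-templates>
</xsl:template>
```

Note that XSLT 1.0 has no upper-case() function, so turning "apples" into "Apples" needs an extra translate() on the first character.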
Here is the overall XSLT:
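The following is a sketch of a complete XSLT 1.0 stylesheet along those lines, not a definitive version. It assumes the input is well-formed (the tbody is closed), the elements are in no namespace, and the icon file names follow the icon_NAME.gif pattern. The key name, variable names and parameter names are my own choices:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>

  <!-- Index each detail row by the id of its nearest preceding header row -->
  <xsl:key name="rows"
           match="tr[not(td/@class='header')]"
           use="generate-id(preceding-sibling::tr[td/@class='header'][1])"/>

  <!-- Alphabets for capitalising with translate() -->
  <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

  <xsl:template match="tbody">
    <data>
      <xsl:apply-templates select="tr/td[@class='header']"/>
    </data>
  </xsl:template>

  <!-- One header cell: work out type/country/rank, then emit its detail rows -->
  <xsl:template match="td[@class='header']">
    <xsl:variable name="name"
        select="substring-before(substring-after(img[1]/@src, 'icon_'), '.gif')"/>
    <xsl:apply-templates select="key('rows', generate-id(..))">
      <!-- Upper-case the first letter: apples becomes Apples -->
      <xsl:with-param name="type"
          select="concat(translate(substring($name, 1, 1), $lower, $upper),
                         substring($name, 2))"/>
      <xsl:with-param name="country" select="img[2]/@alt"/>
      <xsl:with-param name="rank" select="normalize-space(.)"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- One detail row becomes one entry -->
  <xsl:template match="tr">
    <xsl:param name="type"/>
    <xsl:param name="country"/>
    <xsl:param name="rank"/>
    <entry>
      <type><xsl:value-of select="$type"/></type>
      <country><xsl:value-of select="$country"/></country>
      <rank><xsl:value-of select="$rank"/></rank>
      <color><xsl:value-of select="td[1]"/></color>
      <shape><xsl:value-of select="td[2]"/></shape>
      <taste><xsl:value-of select="td[3]"/></taste>
    </entry>
  </xsl:template>
</xsl:stylesheet>
```

Since the key does the lookup, it makes no difference whether a category has 1 detail row or 100.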
When applied to your input document, the following output is generated:
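For a stylesheet built as described (a key on the nearest preceding header row, one entry per detail row, XML declaration omitted), the result for your sample should be equivalent to the following. Exact indentation depends on the processor, and the strawberry taste comes out as "Super tasty" because that is what the input cell contains:

```xml
<data>
  <entry>
    <type>Apples</type>
    <country>Portugal</country>
    <rank>First Grade</rank>
    <color>Green</color>
    <shape>Round shaped</shape>
    <taste>Tasty</taste>
  </entry>
  <entry>
    <type>Apples</type>
    <country>Portugal</country>
    <rank>First Grade</rank>
    <color>Red</color>
    <shape>Round shaped</shape>
    <taste>Bitter</taste>
  </entry>
  <entry>
    <type>Apples</type>
    <country>Portugal</country>
    <rank>First Grade</rank>
    <color>Pink</color>
    <shape>Round shaped</shape>
    <taste>Tasty</taste>
  </entry>
  <entry>
    <type>Strawberries</type>
    <country>USA</country>
    <rank>Fifth Grade</rank>
    <color>Red</color>
    <shape>Heart shaped</shape>
    <taste>Super tasty</taste>
  </entry>
  <entry>
    <type>Bananas</type>
    <country>Congo</country>
    <rank>Third Grade</rank>
    <color>Yellow</color>
    <shape>Smile shaped</shape>
    <taste>Fairly tasty</taste>
  </entry>
  <entry>
    <type>Bananas</type>
    <country>Congo</country>
    <rank>Third Grade</rank>
    <color>Brown</color>
    <shape>Smile shaped</shape>
    <taste>Too sweet</taste>
  </entry>
</data>
```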