I’m processing a source HTML file that holds tabular data in an unstructured way: it’s basically a bunch of absolutely positioned divs. My goal is to rebuild structured XML data from it. So far, using XSLT 2.0, I was able to produce XML like this:
<data>
  <line top="44">
    <item left="294">Some heading text</item>
  </line>
  <line top="47">
    <item left="718">A</item> <!-- this item is a section-start -->
    <item left="764">Section heading</item>
  </line>
  <line top="78">
    <item left="92">Data</item>
    <item left="144">Data</item>
    <item left="540">Data</item>
    <item left="588">Data</item>
  </line>
  <line top="101">
    <item left="61">B</item> <!-- this item is a section-start -->
    <item left="144">Section heading</item>
  </line>
  <line top="123">
    <item left="92">Data</item>
    <item left="144">Data</item>
  </line>
</data>
Next, I need to group the lines into sections. Each section starts with a line whose first item’s value is a single uppercase letter, A–Z. My approach is to hold all the <line> elements in a $lines variable and then use xsl:for-each-group with the group-starting-with attribute to identify the element that starts a new section.
The relevant XSLT fragment looks like this:
<xsl:for-each-group select="$lines/line" group-starting-with="...pattern here...">
  <section>
    <xsl:copy-of select="current-group()"/>
  </section>
</xsl:for-each-group>
The problem is that I can’t figure out a working pattern to identify the section starts. The best I could do was confirm that the expression //line/item[1]/text()[matches(., '^[A-Z]$')] selects the right nodes when evaluated on its own in an XPath evaluator. However, I can’t derive a version of it that works with group-starting-with.
Update: the wanted result should look like this:
<data>
  <section> <!-- this section started automatically because of being at the beginning -->
    <line top="44">
      <item left="294">Some heading text</item>
    </line>
  </section>
  <section>
    <line top="47">
      <item left="718">A</item> <!-- this item is a section-start -->
      <item left="764">Section heading</item>
    </line>
    <line top="78">
      <item left="92">Data</item>
      <item left="144">Data</item>
      <item left="540">Data</item>
      <item left="588">Data</item>
    </line>
  </section>
  <section>
    <line top="101">
      <item left="61">B</item> <!-- this item is a section-start -->
      <item left="144">Section heading</item>
    </line>
    <line top="123">
      <item left="92">Data</item>
      <item left="144">Data</item>
    </line>
  </section>
</data>
The solution:
The trick is really understanding that the value of group-starting-with must be a pattern, not a condition: it has to match the <line> elements being grouped, not the text nodes inside them.
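A minimal sketch of a complete XSLT 2.0 stylesheet applying that idea, assuming the intermediate XML shown above is the input document (rather than a $lines variable) and that each <item> holds its text without surrounding whitespace, as in the sample. The condition from the question moves into a predicate on line, so the pattern matches the <line> elements themselves:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/data">
    <data>
      <!-- The population is the <line> children; a new group starts at every
           line whose first <item> is a single uppercase letter. The first
           line of the population always starts a group, which gives the
           leading "heading" section. -->
      <xsl:for-each-group select="line"
          group-starting-with="line[item[1][matches(., '^[A-Z]$')]]">
        <section>
          <!-- Copy the lines (including their comments) unchanged -->
          <xsl:copy-of select="current-group()"/>
        </section>
      </xsl:for-each-group>
    </data>
  </xsl:template>
</xsl:stylesheet>
```

With the sample input this produces the three sections of the wanted result. If the lines are held in $lines as in the question, the same pattern can be used unchanged with select="$lines/line". By contrast, //line/item[1]/text()[matches(., '^[A-Z]$')] used as a pattern only matches text nodes, so it never matches any of the <line> elements in the population.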