I am trying to convert an HTML table into LaTeX code (using longtabu, as it supports custom column widths) in a Java program I am writing. My code was running stably and the output looked fine until just now. I have to support the table's colspan feature (I am skipping rowspan for now), and that is where the problem lies. The table that is causing problems looks something like this:
<table>
<tr>
<td width="385" colspan="3">
Content
</td>
<td width="359" colspan="3">
Content
</td>
<td width="151">
Content
</td>
</tr>
<tr>
<td width="24">
Content
</td>
<td width="361" colspan="2">
Content
</td>
<td width="359" colspan="3">
Content
</td>
<td width="151">
Content
</td>
</tr>
<tr>
<td width="24">
Content
</td>
<td width="276">
Content
</td>
<td width="85">
Content
</td>
<td width="198" colspan="2">
Content
</td>
<td width="161">
Content
</td>
<td width="151">
Content
</td>
</tr>
</table>
I traced the problem to the fact that no single table row defines all of the column widths.
As I understand it, I would need a system of linear equations to calculate the widths of the individual columns. Am I right, or have I missed something?
What would be the best approach to solving such a system of equations in Java?
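Assuming that the source table is neither over-constrained, under-constrained, nor inconsistently constrained, I would recommend a brute-force approach: keep a fact table of the known column widths and make repeated passes over the cell constraints, filling in a column width whenever a constraint leaves exactly one column uncalculated.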
This is an n-squared (or worse) algorithm, but should be fine as long as the table does not have ten thousand rows or columns. If the table is correctly constrained you will reach a point where all the column widths are defined. The advantage of a brute-force algorithm such as this is that it is relatively easy to debug and should be stable.
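A brute-force pass of this kind might look like the following minimal sketch. The class name `ColumnWidthSolver`, the `Cell` helper and the use of `NaN` for "not yet calculated" are my own assumptions for illustration, not anything from your program, and the sketch does no consistency checking.

```java
import java.util.Arrays;
import java.util.List;

final class ColumnWidthSolver {

    /** One table cell: covers columns start .. start+span-1 with a combined width. */
    static final class Cell {
        final int start;
        final int span;
        final double width;

        Cell(int start, int span, double width) {
            this.start = start;
            this.span = span;
            this.width = width;
        }
    }

    /**
     * Makes repeated passes over all cells. Whenever a cell covers exactly one
     * column whose width is still unknown (NaN), that width is the cell width
     * minus the widths already known. Stops when a pass changes nothing.
     */
    static double[] solve(int columnCount, List<Cell> cells) {
        double[] widths = new double[columnCount];
        Arrays.fill(widths, Double.NaN);
        boolean progress = true;
        while (progress) {
            progress = false;
            for (Cell cell : cells) {
                int unknownCount = 0;
                int unknownIndex = -1;
                double knownSum = 0;
                for (int col = cell.start; col < cell.start + cell.span; col++) {
                    if (Double.isNaN(widths[col])) {
                        unknownCount++;
                        unknownIndex = col;
                    } else {
                        knownSum += widths[col];
                    }
                }
                if (unknownCount == 1) {
                    widths[unknownIndex] = cell.width - knownSum;
                    progress = true;
                }
            }
        }
        return widths;
    }

    /** The cells of the table from the question, with zero-based column indices. */
    static List<Cell> exampleCells() {
        return Arrays.asList(
            new Cell(0, 3, 385), new Cell(3, 3, 359), new Cell(6, 1, 151),
            new Cell(0, 1, 24), new Cell(1, 2, 361), new Cell(3, 3, 359), new Cell(6, 1, 151),
            new Cell(0, 1, 24), new Cell(1, 1, 276), new Cell(2, 1, 85),
            new Cell(3, 2, 198), new Cell(5, 1, 161), new Cell(6, 1, 151));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(solve(7, exampleCells())));
    }
}
```

For the table in the question this resolves five of the seven columns. The fourth and fifth stay `NaN`, because only their sum (198) is ever given.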
If the table is under-constrained, you will reach a point where a pass makes no progress and some column widths remain uncalculated. To handle this, add another pass: pick an arbitrary constraint that involves an uncalculated column together with one or more other uncalculated columns, and allocate the remaining space equally across all the uncalculated columns in that constraint. Because the choice of constraint is arbitrary, you may get a different answer on different runs, but since the table is under-constrained, does it matter?
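The extra pass could be sketched roughly like this. Again, `UnderConstrainedFill`, `Span` and the `NaN` convention are hypothetical names of mine, and "arbitrary" is implemented here simply as "the first matching constraint in the list".

```java
import java.util.List;

final class UnderConstrainedFill {

    /** A constraint: columns start .. start+span-1 together have the given width. */
    static final class Span {
        final int start;
        final int span;
        final double width;

        Span(int start, int span, double width) {
            this.start = start;
            this.span = span;
            this.width = width;
        }
    }

    /**
     * Finds the first constraint that covers two or more unknown (NaN) columns
     * and splits the space not used by its known columns equally among them.
     * Returns true if some widths were filled in, false if nothing matched.
     */
    static boolean fillOne(double[] widths, List<Span> spans) {
        for (Span s : spans) {
            int unknownCount = 0;
            double knownSum = 0;
            for (int col = s.start; col < s.start + s.span; col++) {
                if (Double.isNaN(widths[col])) {
                    unknownCount++;
                } else {
                    knownSum += widths[col];
                }
            }
            if (unknownCount >= 2) {
                double share = (s.width - knownSum) / unknownCount;
                for (int col = s.start; col < s.start + s.span; col++) {
                    if (Double.isNaN(widths[col])) {
                        widths[col] = share;
                    }
                }
                return true;
            }
        }
        return false;
    }
}
```

After a successful call you would go back to the normal passes, since the newly filled columns may unlock other constraints. In the table from the question, the 359-wide cell covers two uncalculated columns plus the known 161, so each of the two would get (359 - 161) / 2 = 99.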
When done, you have a complete fact table with all the column widths, and you can then generate your LaTeX code with all table columns specified.
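As a rough illustration of that last step, the sketch below turns the finished widths into a column specification of `p{...}` columns, each a fraction of `\linewidth`. The class name `ColumnSpec` is made up, and using `p` columns is my assumption. You may prefer whatever width mechanism of longtabu you already use. Note that the fractions ignore the padding between columns, so the table will come out slightly wider than `\linewidth` unless you compensate.

```java
import java.util.Locale;

final class ColumnSpec {

    /**
     * Builds one p{...} column per width, scaled so that the fractions sum to 1.
     * Locale.ROOT keeps the decimal separator a dot regardless of system locale.
     */
    static String build(double[] widths) {
        double total = 0;
        for (double w : widths) {
            total += w;
        }
        StringBuilder spec = new StringBuilder();
        for (double w : widths) {
            spec.append(String.format(Locale.ROOT, "p{%.4f\\linewidth}", w / total));
        }
        return spec.toString();
    }
}
```

The resulting string would go where the column specification of the table environment is expected.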