Looking at HTML source code of
I see that Google never closes its td and tr tags: there is no </tr> and no </td> anywhere in the source.
Why?
<tr class=bb>
<th class="bb lm">Date
<th class="rgt bb">Open
<th class="rgt bb">High
<th class="rgt bb">Low
<th class="rgt bb">Close
<th class="rgt bb rm">Volume
<tr>
<td class="lm">Nov 26, 2010
<td class="rgt">11,183.50
<td class="rgt">11,183.50
<td class="rgt">11,067.17
<td class="rgt">11,092.00
<td class="rgt rm">68,396,121
<tr>
Is it meant to make the page harder to parse, since an XML parser won't be able to read it? I have noticed that &output=csv is not available for indices (this URL won't work: http://www.google.com/finance?q=INDEXDJX:.DJI&output=csv) whereas it is available for stocks (http://www.google.com/finance/historical?q=NASDAQ:GOOG&output=csv will work), so to get historical data in CSV for indices you have to do the parsing yourself!
This is HTML4 (and not XML). As pointed out in the W3 specs:
Ditto for tr:
I believe the intent is to minimize page size by omitting the end tags. They do various additional optimizations which may actually result in invalid HTML, but are handled by browsers in tagsoup mode.
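Since the omitted end tags are implied, the table can still be read without an XML parser. Below is a minimal sketch using Python's standard `html.parser` module. It assumes that every new `<tr>` implicitly ends the previous row and every new `<td>` or `<th>` implicitly ends the previous cell. The class name `TableRows` and the shortened sample string are made up for illustration, and this is not a full implementation of the HTML4 tag-omission rules (nested tables, for example, are not handled).

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect table rows from HTML where </tr>, </td> and </th> are omitted."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def _close_cell(self):
        # Finish the open cell, if any, and store its trimmed text in the row.
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _close_row(self):
        # Finish the open row, if any; rows without cells are dropped.
        self._close_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # A new row implicitly ends the previous one.
            self._close_row()
            self._row = []
        elif tag in ("td", "th"):
            # A new cell implicitly ends the previous one.
            self._close_cell()
            if self._row is None:
                self._row = []
            self._cell = []

    def handle_endtag(self, tag):
        # Explicit end tags still work when they are present.
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        # Flush any buffered text first, then finish whatever is still open.
        super().close()
        self._close_row()


# Shortened version of the markup from the question (assumed sample).
sample = (
    '<table><tr class=bb>\n'
    '<th class="bb lm">Date\n'
    '<th class="rgt bb">Open\n'
    '<tr>\n'
    '<td class="lm">Nov 26, 2010\n'
    '<td class="rgt">11,183.50\n'
    '<tr>\n'
    '</table>'
)

parser = TableRows()
parser.feed(sample)
parser.close()
for row in parser.rows:
    print(row)
```

The trailing empty `<tr>` in the sample produces no row, because rows without any cells are discarded.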