Let's say I want to extract data from a web page with the following markup:
<table>
  <tr>
    <td><a href="Link 1">Column 1 Text</a></td>
    <td>Column 2 Text</td>
    <td>Column 3 Text</td>
  </tr>
  <tr>
    <td><a href="Link 2">Column 1 Text</a></td>
    <td>Column 2 Text</td>
    <td>Column 3 Text</td>
  </tr>
  ...
</table>
and convert it to the following JSON format:
[
  {
    "link": "Link 1",
    "text": "Column 1 Text",
    "data": "Column 3 Text"
  },
  {
    "link": "Link 2",
    "text": "Column 1 Text",
    "data": "Column 3 Text"
  }
]
Can this be done with YQL? If so, please give me an example query.
Any help would be appreciated!
Here’s a query that’s a good starting point. It uses YQL’s html table together with an XPath expression (see Extracting HTML Content With XPath for more details on this technique):
select * from html where url="http://cantoni.org/test/table.html" and xpath='//table/tr'
This produces JSON results like this: