I have an HTML page with many tables.
<html>
<table>
POINTER_TEXT
some other stuff
<table that i want START>
</table that i want END>
some other stuff
<table bad>
</table bad>
</table>
</html>
I want to grab the first table that comes after a specific piece of text. I can get this far:
curl --silent http://xyz.com/1.htm | sed -n '/POINTER_TEXT/,$p'
This gives me
POINTER_TEXT
some other stuff
<table that i want START>
</table that i want END>
some other stuff
<table bad>
</table bad>
</table>
</html>
Then I add this:
curl --silent http://xyz.com/1.htm | sed -n '/POINTER_TEXT/,$p' | sed -n '/<table/,/<\/table>/p'
which gives me this:
<table that i want START>
</table that i want END>
<table bad>
</table bad>
My problem is I just need this:
<table that i want START>
</table that i want END>
Help me please guys!
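Add one more sed stage at the end of the pipeline that quits as soon as it has printed the first closing table tag. This should throw away everything after the first table end.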
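The exact command to add is not shown in the text above. One way to get that effect is sed's q command, which prints the current line and then stops reading input. The sketch below wraps the whole pipeline in a hypothetical helper function, first_table_after_pointer (the name is an assumption, not from the question), and assumes every tag sits on its own line and that closing tags are written as </table>.

```shell
# Hypothetical helper: print only the first table that follows POINTER_TEXT.
# Assumes each <table ...> and </table> tag is on its own line.
# Stage 1: keep everything from the marker line to the end of input.
# Stage 2: keep only the lines that fall inside table blocks.
# Stage 3: print lines up to the first closing tag, then quit (sed's q command).
first_table_after_pointer() {
  sed -n '/POINTER_TEXT/,$p' |
    sed -n '/<table/,/<\/table>/p' |
    sed '/<\/table>/q'
}

# Usage, with the URL from the question:
# curl --silent http://xyz.com/1.htm | first_table_after_pointer
```

Because the last stage exits early, the earlier stages may be cut off mid-stream. That is harmless here, since nothing after the first table is wanted.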
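But what will your script do if there are no newlines in the HTML? sed works line by line, so if the whole page arrives as a single line, these patterns cannot isolate one table. It is far more robust to use a real parser to process HTML.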