EDIT: The key point is getting the first 200 characters and closing all tags that are left open.
I’m currently loading articles from another website via a DOM object (the other website has no RSS feed). I want to build a “preview” of each article, but I have a few problems:
- I do not control how the articles are written. The content always seems to be inside a table, in the second TR (they use a CMS and the markup is messy; see the example below).
- The articles contain A LOT of HTML tags, and I don’t want to leave any of them open.
- I need to keep the HTML formatting. I know it’s ugly, but it fits perfectly in my page.
Here is a sample of their markup (not my code, and the content is in French, sorry):
<table>
<TR >
<TD class='Normal' valign="top" colspan="2" style="padding-bottom:15px;">13-01-2012 <b>Water-polo – Championnat pan-pacifique<b ></TD>
</TR><TR >
<TD class='Normal' valign="top"><span class="HeadTitleNews"> Les Canadiennes disputeront le bronze aux Chinoises</span> <img src='http://www.sportcom.qc.ca/Portals/0/2011WaterpoloF.jpg' width='165' align='right' class='imgAnnouncementCss'><div style="margin: 0in 0in 0pt"><span style="font-family: Tahoma; font-size: 10pt">Montréal, 13 janvier 2012 (Sportcom) – L’équipe féminine canadienne de water-polo a remporté une victoire écrasante de 19-3 face au Brésil, vendredi, au Championnat pan-pacifique de Melbourne, en Australie. Les Canadiennes se sont ainsi assurées de participer au match pour la médaille de bronze contre les Chinoises. </span></div>
<div style="margin: 0in 0in 0pt"> </div>
<div style="margin: 0in 0in 0pt"><span style="font-family: Tahoma; font-size: 10pt">La Montréalaise <strong>Sophie</strong></span><strong><span style="font-family: Tahoma; font-size: 10pt"> Baron-La Salle</span></strong><span style="font-family: Tahoma; font-size: 10pt"> a marqué quatre buts dans la victoire. </span></div>
<div style="margin: 0in 0in 0pt"> </div>
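
To make the goal concrete, here is a minimal sketch of the behaviour I am after, written in Python with only the standard library. The language is an assumption, since nothing above ties the problem to one, and the names `PreviewBuilder` and `html_preview` are hypothetical. It copies the markup through until a given number of text characters has been emitted (tags and attributes do not count toward the limit, whitespace does), then closes every element that is still open, innermost first.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag (HTML void elements).
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PreviewBuilder(HTMLParser):
    """Copy markup through until `limit` text characters have been emitted."""

    def __init__(self, limit):
        super().__init__(convert_charrefs=True)
        self.remaining = limit   # text characters still allowed
        self.out = []            # output fragments, joined at the end
        self.open_tags = []      # stack of elements opened but not yet closed

    def handle_starttag(self, tag, attrs):
        if self.remaining <= 0:
            return  # limit reached: ignore everything that follows
        rendered = "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs
        )
        self.out.append(f"<{tag}{rendered}>")
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.remaining <= 0 or tag not in self.open_tags:
            return  # past the limit, or a stray closing tag
        # Close anything left open inside this element before closing it.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if self.remaining <= 0:
            return
        piece = data[: self.remaining]
        self.remaining -= len(piece)
        self.out.append(escape(piece, quote=False))


def html_preview(html, limit=200):
    """Return the first `limit` text characters of `html` with all tags closed."""
    builder = PreviewBuilder(limit)
    builder.feed(html)
    builder.close()  # flush any buffered trailing text
    # Close whatever is still open, innermost first.
    closing = "".join(f"</{tag}>" for tag in reversed(builder.open_tags))
    return "".join(builder.out) + closing


print(html_preview('<div><span style="x">Hello <b>world</b> and more</span></div>', 8))
```

The parser lowercases tag names and re-quotes attributes, so the output is not byte-for-byte identical to the input, but the formatting is preserved. For the sample above, the string passed in would be the contents of the second TR only.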
Thanks.
Close open HTML tags in a string