I am trying to use JS (preferred) or PHP to access APIs like StackOverflow, Tumblr & Forrst to get my latest posts and display them on my blog. So I need a way to truncate the returned HTML so that it fits into a “widget”-sized space.
How might I do it with JS or PHP? It should:
- not produce invalid HTML when truncating
- not truncate words (leaving half a word, for example)
I am also considering stripping out code blocks or images that otherwise may not fit well, but this is secondary.
Well, my guess is that when you truncate a piece of code, you should be careful not to break it [in the case of HTML, make sure every opening tag still has its closing tag], assuming, of course, that you want to keep those code blocks. This will require a good piece of code heavily loaded with regex, and I doubt it would be a good idea to do this in JavaScript; in my opinion PHP would be a much faster and safer way.
On the other hand, if you are considering getting rid of all code blocks, first use PHP’s strip_tags() function [you can pass <img> as its second parameter to keep IMG tags], like strip_tags($html, '<img>'), where $html holds the returned markup. Then truncate the result, making sure you do not cut through the closing “>” character of any remaining tag. Again, regex will do the job: use conditionals, lookaheads and lookbehinds to achieve that.
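Since the question prefers JS, here is a rough JavaScript sketch of the same strip-then-truncate idea. The names stripTags and truncateWords are hypothetical helpers, not library functions, and the tag regex is deliberately naive: it assumes reasonably well-formed markup with no “>” inside attribute values.

```javascript
// Hypothetical helper: drop all tags except those on an allow-list,
// loosely mirroring what PHP's strip_tags() does with its second parameter.
function stripTags(html, allowed = []) {
  const keep = new Set(allowed.map((t) => t.toLowerCase()));
  // Naive matcher for opening and closing tags (assumes no ">" in attributes).
  return html.replace(/<\/?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g, (tag, name) =>
    keep.has(name.toLowerCase()) ? tag : ""
  );
}

// Hypothetical helper: cut plain text at the last space that fits,
// so no word is left half-finished.
function truncateWords(text, maxLength, ellipsis = "…") {
  if (text.length <= maxLength) return text;
  // Look one character past the limit: if it is a space, the last word is whole.
  const slice = text.slice(0, maxLength + 1);
  const lastSpace = slice.lastIndexOf(" ");
  // Fall back to a hard cut only when the first word alone exceeds the limit.
  const cut = lastSpace > 0 ? slice.slice(0, lastSpace) : text.slice(0, maxLength);
  return cut.trimEnd() + ellipsis;
}

const plain = stripTags("<p>Hello <b>brave</b> new world</p>");
console.log(truncateWords(plain, 12)); // Hello brave…
```

Note that truncateWords is only meant for fully stripped text; if you keep tags such as <img>, the cut could still land inside a tag.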
Once you’re done with tags, make sure you are not damaging multi-byte characters: truncating by raw byte count [for example with substr()] can split a multi-byte character in the middle and corrupt it. To avoid this, use PHP’s mb_substr() function. While truncating, you might also want your code not to count the remaining HTML tags as characters: using regex, you can temporarily replace them with placeholders and, once truncation is done, put the original tags back.
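If you keep the tags, a JavaScript sketch along these lines might work. Instead of the placeholder swap, it splits the HTML into tag and text tokens, counts only text characters against the limit, cuts at whitespace, and closes whatever tags are still open. truncateHtml and VOID_TAGS are hypothetical names; Array.from is used so that characters outside the Basic Multilingual Plane (emoji, for instance) are not split in half, which is the JS counterpart of the multi-byte problem. It assumes well-formed input and counts an entity such as &amp; as several characters.

```javascript
// Elements that never take a closing tag (partial list, enough for a sketch).
const VOID_TAGS = new Set(["br", "hr", "img", "input", "meta", "link", "wbr"]);

function truncateHtml(html, maxChars, ellipsis = "…") {
  // Split into tags and text runs; assumes no ">" inside attribute values.
  const tokens = html.match(/<[^>]+>|[^<]+/g) || [];
  const open = []; // stack of tag names that are currently open
  let out = "";
  let remaining = maxChars; // budget of visible characters

  for (const token of tokens) {
    if (token[0] === "<") {
      const m = /^<(\/?)([a-zA-Z][a-zA-Z0-9]*)/.exec(token);
      if (m) {
        const name = m[2].toLowerCase();
        if (m[1]) {
          // Closing tag: unwind the stack down to the matching opener.
          const idx = open.lastIndexOf(name);
          if (idx !== -1) open.length = idx;
        } else if (!VOID_TAGS.has(name) && !token.endsWith("/>")) {
          open.push(name);
        }
      }
      out += token; // tags are copied through and never counted
      continue;
    }
    const chars = Array.from(token); // code points, not UTF-16 units
    if (chars.length <= remaining) {
      out += token;
      remaining -= chars.length;
      continue;
    }
    // Over budget: keep what fits, then drop a trailing partial word.
    let head = chars.slice(0, remaining).join("");
    if (!/\s/.test(chars[remaining])) head = head.replace(/\S+$/, "");
    const closers = open.reverse().map((n) => `</${n}>`).join("");
    return out + head.trimEnd() + ellipsis + closers;
  }
  return out; // everything fit, nothing to cut
}

console.log(truncateHtml("<p>Hello <b>brave new</b> world</p>", 12));
// <p>Hello <b>brave…</b></p>
```

For anything beyond simple widget snippets, parsing the markup with a real HTML parser and walking the resulting tree would be more robust than regex tokenizing.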
So, “simply” put: it requires a good command of PHP and a fair amount of coding, which is hard to post here in full, I am afraid.