If you are accepting user-submitted content that contains HTML, how would you generate an automatic excerpt (using PHP) while keeping the HTML valid?
If you simply take the first 200 characters, for example, you may cut off a closing tag, and counting tags isn’t very straightforward.
I have seen a few libraries, but they are massive because they deal with a multitude of things. I only need something that generates the excerpts.
If you want to ensure validity, you will have to count tags, I guess.
Restricting the input to a small set of allowed tags (the second argument of strip_tags) would make that much less complicated to check.
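As a rough sketch of that first step, something like the following could work. The whitelist and the function name limit_tags are only assumptions for illustration; pick whatever tags your site really allows. Keep in mind that strip_tags leaves the attributes of allowed tags untouched, so it is not a complete sanitizer on its own.

```php
<?php
// Hypothetical helper: keep only a small whitelist of tags so that
// there are fewer tag types to track when cutting the excerpt.
function limit_tags(string $html): string
{
    // Assumed whitelist, not taken from the question.
    $allowed = '<p><a><b><i><em><strong>';

    // strip_tags() removes every tag not listed in $allowed,
    // but keeps the text that was inside the removed tags.
    return strip_tags($html, $allowed);
}

echo limit_tags('<div><p>Hello <b>world</b> and <span>more</span></p></div>');
```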
First you should check whether the character at the cut position (200) is part of a tag.
I think the easiest way to do that is to look to the left of the position: if the nearest bracket you find is a tag opener (<) rather than a tag closer (>), you are inside a tag.
If you are inside a tag, you have to determine whether it is a closing tag. If so, extend your limit to just past the next “>”. If not, reduce the limit to the position of that last tag opener, so the incomplete opening tag is dropped.
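A minimal sketch of that position check might look like this. The name adjust_cut is made up for the example, and it assumes single-byte text and that "<" and ">" only ever appear as tag delimiters (no comments, no brackets inside attribute values).

```php
<?php
// Hypothetical helper: returns a cut position that never falls inside a tag.
function adjust_cut(string $html, int $limit): int
{
    if ($limit >= strlen($html)) {
        return strlen($html); // nothing to cut
    }

    $head      = substr($html, 0, $limit);
    $lastOpen  = strrpos($head, '<');
    $lastClose = strrpos($head, '>');

    // Not inside a tag: no '<' to the left, or the nearest '<' is already closed.
    if ($lastOpen === false || ($lastClose !== false && $lastClose > $lastOpen)) {
        return $limit;
    }

    // Inside a closing tag: extend the limit to just past its '>'.
    if (substr($html, $lastOpen, 2) === '</') {
        $end = strpos($html, '>', $lastOpen);
        return $end === false ? $lastOpen : $end + 1;
    }

    // Inside an opening tag: cut back to just before its '<'.
    return $lastOpen;
}

$html = '<p>Hello <b>world</b></p>';
echo substr($html, 0, adjust_cut($html, 19));
```

With a limit of 19 the cut would land inside the closing b tag, so the sketch moves it forward to the end of that tag instead of slicing it in half.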
Now your only remaining problem is to check whether closing tags are missing at the end of your string.
Counting the allowed tags (opening and closing) gives you a hint as to which closing tags you have to add at the end, and how many of them.
That leaves you with the problem of determining the order of these “correctional” tags.
With a little logic you should be able to do that as well: tags have to be closed in the reverse order in which they were opened, so remembering the still-open tags in order tells you what to append.
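One way to get both the count and the order is to keep the open tags on a stack. The following is only a sketch under some assumptions: close_open_tags is a hypothetical name, the fragment must not end in the middle of a tag, and the list of void elements is deliberately short.

```php
<?php
// Hypothetical helper: appends closing tags for elements still open in $fragment.
function close_open_tags(string $fragment): string
{
    // Void elements never take a closing tag (assumed, incomplete list).
    $void  = ['br', 'hr', 'img', 'input', 'meta', 'link'];
    $stack = [];

    // Find every opening or closing tag, in document order.
    preg_match_all('#<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>#', $fragment, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        $name = strtolower($m[2]);
        if (in_array($name, $void, true) || substr($m[0], -2) === '/>') {
            continue; // void or self-closing: nothing to track
        }
        if ($m[1] === '/') {
            // Closing tag: pop it if it matches the most recently opened element.
            if (end($stack) === $name) {
                array_pop($stack);
            }
        } else {
            $stack[] = $name; // opening tag: remember it
        }
    }

    // Close whatever is still open, innermost element first.
    while ($stack) {
        $fragment .= '</' . array_pop($stack) . '>';
    }
    return $fragment;
}

echo close_open_tags('<p>Hello <b>wor');
```

A regular expression is not a real HTML parser, so this is only reasonable once the input has been reduced to a few simple tags; for arbitrary markup, PHP's DOMDocument would be the sturdier choice.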
have a nice one
stefan