Given an HTML string, I would like to return a modified string with the following properties:
- The first n characters of the text contents (HTML tags aside) should remain.
- Elements that start after the nth character should be removed entirely.
- If the nth character does not fall at the end of an element, the text after it in the same element should be removed.
- Tags on elements at and before n characters should remain.
Basically, I just want to return a shortened version of the HTML, without the DOM structure being interrupted, and based on the length of the text contents only.
Using PHP’s DOM implementation, it seems this will be overly complex. Using a pattern match isn’t ideal as the conditions of the modified string might change over time, and it would require rewriting each time.
Am I missing an easier way of doing this? Thanks in advance.
Really?
Here’s a very simple DOM implementation if you want the first 100 characters from inside the `<body>` tag and its child nodes. You could further massage this to remove newline characters and superfluous space/tab characters, or check the length of the `$content` string inside the `foreach` to break the loop and stop concatenation once you’ve reached a certain number of characters.

UPDATE
As per your comment, here’s a simple way to count the characters inside HTML nodes and delete all the tags after the specified character limit is reached. Note that you can’t perform the delete operation inside the original `foreach` because it causes `DOM` to reindex the nodes and you won’t get the results you expect. Instead, we store the nodes we want to delete in an array and delete them after the initial iteration.
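A minimal sketch of that two-pass idea, not the exact code from this answer: walk the nodes under `<body>` in document order, count text characters, trim the text node where the limit falls, and queue everything after it for deletion once the loop is done. The function name `truncateHtml` and its parameters are made up for illustration. It assumes the `dom` and `mbstring` extensions are loaded, that whitespace-only text nodes count toward the limit, and that the input is ASCII or declares its own charset (`loadHTML` otherwise guesses the encoding).

```php
<?php
// Hypothetical helper: keep the first $limit text characters, drop the rest.
function truncateHtml(string $html, int $limit): string
{
    if ($html === '') {
        return '';                        // loadHTML() rejects empty input
    }

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);  // tolerate imperfect markup
    $dom->loadHTML($html);                // fragments get wrapped in <html><body>
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $xpath     = new DOMXPath($dom);
    $remaining = $limit;
    $toDelete  = [];

    // First pass: every descendant of <body>, in document order.
    foreach ($xpath->query('.//node()', $body) as $node) {
        if ($remaining <= 0) {
            $toDelete[] = $node;          // starts after the limit: remove later
            continue;
        }
        if ($node->nodeType === XML_TEXT_NODE) {
            $length = mb_strlen($node->nodeValue, 'UTF-8');
            if ($length > $remaining) {
                // The limit falls inside this text node: cut it here.
                $node->nodeValue = mb_substr($node->nodeValue, 0, $remaining, 'UTF-8');
                $remaining = 0;
            } else {
                $remaining -= $length;
            }
        }
    }

    // Second pass: delete the queued nodes outside the counting loop.
    foreach ($toDelete as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    // Serialize only the children of <body>, not the wrapper tags.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

For example, `truncateHtml('<p>Hello <b>world</b></p><p>More text</p>', 8)` keeps `Hello ` plus the first two characters of `world` inside their original `<p>` and `<b>` tags, and drops the second paragraph entirely. Ancestors of the cut point are never queued because they come earlier in document order, so the remaining markup stays well-formed.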