Suppose I have a string of HTML, such as this…
<h2>Header</h2><p>all the <span class="bright">content</span> here</p>
And I want to manipulate the string so that, for example, all words are reversed…
<h2>redaeH</h2><p>lla eht <span class="bright">tnetnoc</span> ereh</p>
I know how to extract the text from the HTML and manipulate it by passing it to a function that returns a modified result, but how would I do so whilst retaining the HTML?
I would prefer a language-agnostic solution, but if it must be language specific, a PHP or JavaScript answer would be most useful.
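For the simple case in the first example, where each run of text between tags can be manipulated on its own, a minimal JavaScript sketch is to split the string into tags and text and transform only the text parts. The names `mapTextSegments` and `reverseWords` are hypothetical, and the regex tokenizer assumes well-behaved markup (no `>` inside attribute values, no comments or `<script>` blocks), so treat it as an illustration rather than a robust parser.

```javascript
// Naive tag/text tokenizer: splitting on a *capturing* group keeps the
// matched tags in the result array, always at the odd indices.
// Assumes no ">" inside attribute values, comments or scripts.
function mapTextSegments(html, fn) {
  return html
    .split(/(<[^>]*>)/)
    .map((part, i) => (i % 2 === 1 ? part : fn(part))) // tags pass through untouched
    .join('');
}

// Example manipulation: reverse every run of non-whitespace characters.
function reverseWords(text) {
  return text.replace(/\S+/g, (word) => [...word].reverse().join(''));
}

const input = '<h2>Header</h2><p>all the <span class="bright">content</span> here</p>';
console.log(mapTextSegments(input, reverseWords));
// <h2>redaeH</h2><p>lla eht <span class="bright">tnetnoc</span> ereh</p>
```

Because every text segment is handled separately, this cannot treat a word that is split across elements as a single word.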
Edit
I also want to be able to manipulate text that spans several DOM elements…
Quick<em>Draw</em>McGraw
warGcM<em>warD</em>kciuQ
Another Edit
Currently, I am thinking of replacing every HTML tag with a unique token whilst storing the originals in an array, then performing a manipulation that ignores the tokens, and finally replacing the tokens with the values from the array.
This approach seems overly complicated, and I am not sure how to find and replace all the HTML tags without using regex, which I have learned can get you sent to the Stack Overflow prison island.
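The token idea can be sketched as follows, assuming a single Unicode private-use character (U+E000) never appears in the real text. `manipulateWithTokens` is a hypothetical name, and finding the tags still relies on a simple regex, so the regex concern is not solved here; the sketch only shows that the bookkeeping part of the idea is fairly small. The manipulation function is assumed to return text of the same length.

```javascript
// Hypothetical sentinel: one private-use character stands in for each tag.
const TOKEN = '\uE000';

function manipulateWithTokens(html, fn) {
  const tags = [];
  // 1. Replace every tag with the token, remembering the originals in order.
  const tokenized = html.replace(/<[^>]*>/g, (tag) => {
    tags.push(tag);
    return TOKEN;
  });

  // 2. Manipulate only the non-token characters, then write them back into
  //    the non-token positions so each token stays exactly where it was.
  const chars = [...tokenized];
  const textChars = chars.filter((c) => c !== TOKEN);
  const changed = [...fn(textChars.join(''))];
  if (changed.length !== textChars.length) {
    throw new Error('manipulation must preserve the text length');
  }
  let k = 0;
  const rebuilt = chars.map((c) => (c === TOKEN ? c : changed[k++])).join('');

  // 3. Swap the tokens back for the stored tags, in order.
  let t = 0;
  return rebuilt.replace(/\uE000/g, () => tags[t++]);
}

console.log(manipulateWithTokens('<em>going</em><i>home</i>', (s) => [...s].reverse().join('')));
// <em>emohg</em><i>niog</i>
```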
Yet Another Edit
I want to clarify an issue here. I want the text manipulation to work across any number of DOM elements. For example, if my formula randomly shuffles the letters in the middle of a word, leaving the first and last letters in place, I want to be able to do this…
<em>going</em><i>home</i>
Converts to
<em>goonh</em><i>gmie</i>
So the HTML elements remain untouched, but the text inside them is manipulated as a whole (in this example, goinghome is passed to the manipulation formula) in whatever way the formula chooses.
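Because the formula only ever receives plain text such as goinghome, it can be written without any knowledge of the HTML. A possible sketch of the middle-letter shuffle described above keeps the first and last letters of each word and shuffles the rest with a Fisher-Yates shuffle. `shuffleMiddles` and its optional `random` parameter are hypothetical; the parameter exists only so the output can be made reproducible.

```javascript
// Keep the first and last letters of each word and shuffle the letters in
// between (Fisher-Yates restricted to indices 1 .. length-2).
// `random` defaults to Math.random and must return a number in [0, 1).
function shuffleMiddles(text, random = Math.random) {
  return text.replace(/\S+/g, (word) => {
    const letters = [...word];
    if (letters.length <= 3) return word; // nothing to shuffle
    for (let i = letters.length - 2; i > 1; i--) {
      const j = 1 + Math.floor(random() * i); // j is uniform in [1, i]
      [letters[i], letters[j]] = [letters[j], letters[i]];
    }
    return letters.join('');
  });
}

// Output varies between runs; the first and last letters always stay put.
console.log(shuffleMiddles('goinghome'));
```

The result always has the same length as the input, which is what allows the shuffled text to be poured back into the original elements.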
I implemented a version that seems to work quite well, although I still use a rather general and shoddy regex to extract the HTML tags from the text. Here it is now, in commented JavaScript:
Method
The function defined above accepts a string of HTML and a manipulation function, and applies the manipulation to the words in the string regardless of whether they are split across HTML elements.
It works by first removing all HTML tags while storing each tag along with the index it was taken from, then manipulating the remaining text, and finally re-inserting the tags at their original positions in reverse order.
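A minimal sketch of this strip, manipulate, re-insert method, assuming a manipulation function that preserves the text length. This is a reconstruction from the description, not the code from the gist; `manipulateHtml` is a hypothetical name and the tag regex has the usual limitations.

```javascript
function manipulateHtml(html, fn) {
  const tags = [];
  let removed = 0; // total length of the tags stripped so far

  // 1. Strip every tag, recording its index in the plain (tag-free) text.
  const plain = html.replace(/<[^>]*>/g, (tag, offset) => {
    tags.push({ tag, index: offset - removed });
    removed += tag.length;
    return '';
  });

  // 2. Manipulate the whole text at once, so words split across
  //    elements are treated as single words.
  let text = fn(plain);
  if (text.length !== plain.length) {
    throw new Error('manipulation must preserve the text length');
  }

  // 3. Re-insert the tags in reverse order: inserting later tags first means
  //    no insertion shifts a position that is still waiting to be used.
  for (let i = tags.length - 1; i >= 0; i--) {
    const { tag, index } = tags[i];
    text = text.slice(0, index) + tag + text.slice(index);
  }
  return text;
}

console.log(manipulateHtml('<em>going</em><i>home</i>', (s) => [...s].reverse().join('')));
// <em>emohg</em><i>niog</i>
```

Because the text is manipulated as one string, running a word reverser over the first example treats `Header` and `all` as the single word `Headerall`, which yields `<h2>llared</h2><p>aeH eht <span class="bright">tnetnoc</span> ereh</p>`.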
Test
There are still a few quirks, such as the heading and paragraph text not being recognized as separate words (because they sit in separate block-level tags rather than inline tags), but this is basically a proof of concept of what I was trying to do.
I would also like it to handle manipulation formulas that actually add and remove text, rather than just replacing or moving it (so the string length can change after manipulation), but that opens up a whole new can of worms I am not yet ready for.
Now that I have added some comments to the code and put it up as a JavaScript gist, I hope someone will improve it, especially by replacing the regex part with something better!
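One possible workaround for the block-level quirk, sketched under the assumption that a hand-picked set of tag names should act as word boundaries: cut the tag-free text at the positions of those tags and manipulate each piece separately. `BLOCK_TAGS`, `tagName` and `manipulateHtmlByBlock` are hypothetical, and the tag list is deliberately incomplete. Since each piece keeps its length, the recorded tag positions remain valid.

```javascript
// Hypothetical, incomplete set of tags treated as word boundaries.
const BLOCK_TAGS = new Set(['p', 'div', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'li', 'br']);

// Extract the lower-cased tag name from "<tag ...>", "</tag>" or "<tag/>".
function tagName(tag) {
  const m = /^<\/?\s*([a-zA-Z0-9]+)/.exec(tag);
  return m ? m[1].toLowerCase() : '';
}

function manipulateHtmlByBlock(html, fn) {
  const tags = [];
  let removed = 0;
  const plain = html.replace(/<[^>]*>/g, (tag, offset) => {
    tags.push({ tag, index: offset - removed });
    removed += tag.length;
    return '';
  });

  // Cut the plain text wherever a block-level tag was removed and
  // manipulate each piece on its own.
  const cuts = tags.filter((t) => BLOCK_TAGS.has(tagName(t.tag))).map((t) => t.index);
  const bounds = [0, ...cuts, plain.length];
  let text = '';
  for (let i = 0; i + 1 < bounds.length; i++) {
    const piece = plain.slice(bounds[i], bounds[i + 1]);
    const changed = fn(piece);
    if (changed.length !== piece.length) {
      throw new Error('manipulation must preserve the text length');
    }
    text += changed;
  }

  // Re-insert the tags in reverse order, exactly as before.
  for (let i = tags.length - 1; i >= 0; i--) {
    const { tag, index } = tags[i];
    text = text.slice(0, index) + tag + text.slice(index);
  }
  return text;
}
```

With a word reverser, the first example now comes out as `<h2>redaeH</h2><p>lla eht <span class="bright">tnetnoc</span> ereh</p>`, while inline tags such as `<em>` still let a word span several elements.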
Gist: https://gist.github.com/3309906
Demo: http://jsfiddle.net/gh/gist/underscore/1/3309906/
(outputs to the console)
And finally, a version using an HTML parser
(http://ejohn.org/files/htmlparser.js)
Demo: http://jsfiddle.net/EDJyU/
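For comparison, here is a parser-based sketch that swaps in a different technique from the linked htmlparser.js: it assumes the HTML has already been parsed into a DOM tree (in a browser, for example with `DOMParser`), gathers the text nodes in document order, manipulates their concatenated text, and gives each node back a slice of the same length. `collectTextNodes` and `manipulateTextNodes` are hypothetical names; the functions only rely on `nodeType`, `childNodes` and `nodeValue`, so they also work on a plain mock tree.

```javascript
// Node type constants as defined by the DOM standard.
const TEXT_NODE = 3;

// Collect text nodes depth-first, left to right (document order).
function collectTextNodes(node, out = []) {
  if (node.nodeType === TEXT_NODE) {
    out.push(node);
  } else {
    for (const child of node.childNodes || []) collectTextNodes(child, out);
  }
  return out;
}

// Manipulate the text of all nodes as one string, then redistribute it
// so that every text node keeps its original length.
function manipulateTextNodes(root, fn) {
  const nodes = collectTextNodes(root);
  const whole = nodes.map((n) => n.nodeValue).join('');
  const changed = fn(whole);
  if (changed.length !== whole.length) {
    throw new Error('manipulation must preserve the text length');
  }
  let pos = 0;
  for (const n of nodes) {
    const len = n.nodeValue.length;
    n.nodeValue = changed.slice(pos, pos + len);
    pos += len;
  }
  return root;
}

// In a browser (not runnable under plain Node):
//   const doc = new DOMParser().parseFromString(html, 'text/html');
//   manipulateTextNodes(doc.body, fn);
//   const result = doc.body.innerHTML;
```

Working on text nodes sidesteps the tag regex entirely, and entities such as `&amp;` arrive already decoded as text.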