I have an HTML source in a String variable,
and a word in another variable that should be highlighted in that HTML source.
I need a Regular Expression that does not highlight the tags, only the text within the tags.
For example, I have an HTML source like:
<cfset html = "<span>Text goes here, forr example it container also **span** </span>" />
<cfset wordToReplace = "span" />
<cfset html = ReReplace(html ,"[^(<#wordToReplace#\b[^>]*>)]","replaced","ALL")>
and what I want to get is
<span>Text goes here, forr example it container also **replaced** </span>
But I get an error. Any tips?
You won't find one. Not one that is fully reliable against all legal/wild HTML.
The simple reason is that Regular Expressions match Regular languages, and HTML is not even remotely a Regular language.
Even if you’re very careful, you run the risk of replacing stuff you didn’t want to, and not replacing stuff you did want to, simply due to how complicated HTML syntax can be.
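To make that risk concrete, here is a small sketch in plain Java (CF runs on the JVM, so the same regex behaviour applies). The class and method names are made up for illustration. It compares a naive word replace with a commonly suggested lookahead workaround, and shows one input where each goes wrong.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: names are illustrative, not from any library.
final class RegexPitfalls {

    // Naive approach: replaces the word everywhere, tag names included.
    static String naive(String html, String word, String replacement) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b")
                .matcher(html)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    // Common workaround: skip a match when a '>' follows before any '<',
    // on the guess that such a match sits inside a tag.
    static String lookahead(String html, String word, String replacement) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b(?![^<]*>)")
                .matcher(html)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // Tag names get rewritten too.
        System.out.println(naive("<span>a span</span>", "span", "X"));
        // Works on the simple case...
        System.out.println(lookahead("<span>a span</span>", "span", "X"));
        // ...but a literal '>' in the text makes it skip a word it should replace,
        System.out.println(lookahead("<p>span > 3</p>", "span", "X"));
        // and a '<' inside an attribute value makes it replace inside the tag.
        System.out.println(lookahead("<a title=\"span < 3\">link</a>", "span", "X"));
    }
}
```

The lookahead version is fine for markup you fully control, but comments, scripts, and attribute values are further cases it knows nothing about.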
The correct way to parse HTML is using a purpose-built HTML DOM parser.
Annoyingly, CF doesn't have one built in, though if your HTML is XHTML, you can use XmlParse and XmlSearch to run an XPath search for only the text (not tags) that matches your word. Something like
//*[contains(text(), 'span')]
should do (more details here).
If you've not got XHTML then you'll need to look at using an HTML DOM parser for Java. Google turns up plenty (I've not tried any yet so can't give any specific recommendations).
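For the XHTML case, the same idea can be sketched with the XML classes that ship with Java. This is an illustration under assumptions, not a tested CF recipe: the class and method names are made up, the input must be well-formed XML with a single root element, and it uses the XPath `//text()` to walk every text node rather than the expression above, which selects elements. Only text nodes are modified, so tag names and attributes are never touched.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: names are illustrative, not from any library.
final class TextNodeReplacer {

    // Replaces whole-word occurrences of `word` in text nodes only.
    // Assumes `xhtml` is well-formed XML; anything else is rejected.
    static String replaceInText(String xhtml, String word, String replacement) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            // Select every text node in the document.
            NodeList textNodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//text()", doc, XPathConstants.NODESET);

            Pattern pattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
            for (int i = 0; i < textNodes.getLength(); i++) {
                Node node = textNodes.item(i);
                node.setNodeValue(pattern.matcher(node.getNodeValue())
                        .replaceAll(Matcher.quoteReplacement(replacement)));
            }

            // Serialize the modified DOM back to a string.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }
    }

    public static void main(String[] args) {
        String html = "<span>Text goes here, forr example it container also **span** </span>";
        System.out.println(replaceInText(html, "span", "replaced"));
    }
}
```

To highlight rather than replace, you would split each matching text node and insert a wrapper element around the match, since markup placed in a text node's value gets escaped on output.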