I’ve got a function that helps to interlink pages within my site by scanning blog entries, news, and other items for certain core keywords. It then replaces those keywords with a link to the corresponding page.
I’m running into a problem where some words that should not be replaced with links are being replaced anyway. For example, a few of my HTML tables have a summary attribute that contains a short description of the table content. Such a tag might look like this:
<table width="500" cellspacing="0" cellpadding="4" border="0" summary="This table contains a list of all car parts in inventory along with their corresponding prices">
...
</table>
My function incorrectly replaces a keyword or phrase like “car parts” inside that attribute with a link. How can I structure my replacement regular expression so that it does NOT replace the keyword in cases like this, but DOES replace it when it appears within a paragraph or even within a cell in an HTML table?
Thanks in advance for any help and guidance!
EDIT: Just to clarify, I’m using PHP to render my pages. I’m using a str_replace() before the content is output as HTML to the page. I want to be able to replace that with an ereg_replace() so that I replace the content only if it meets certain conditions (i.e. as explained above). Sorry if this caused any confusion!
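Don’t use regexes to parse HTML. Use the PHP DOM extension instead: load the content into a DOMDocument and replace keywords only inside text nodes, so attribute values such as summary are never touched.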
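Here is a minimal sketch of that approach, not a drop-in implementation. The function name linkKeywords(), the $links map (keyword => URL) and the linker-wrap wrapper id are made up for illustration, and it assumes PHP 7 or later with the DOM extension enabled and UTF-8 content. It walks only the text nodes, so the summary attribute is left alone, and it skips text that is already inside a link, script or style element.

```php
<?php
// Hypothetical helper: turn keywords into links, but only in text nodes.
// $links maps keyword => URL, e.g. ['car parts' => '/car-parts'] (assumed shape).
function linkKeywords(string $html, array $links): string
{
    if (!$links) {
        return $html;
    }

    libxml_use_internal_errors(true); // tolerate sloppy real-world markup
    $doc = new DOMDocument();
    // The XML encoding hint makes libxml read the input as UTF-8; the wrapper
    // div (made-up id) lets us find the fragment again after parsing.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="linker-wrap">' . $html . '</div>');
    libxml_clear_errors();

    // One pattern for all keywords; the capture group keeps matches in the split result.
    $quoted = [];
    foreach (array_keys($links) as $keyword) {
        $quoted[] = preg_quote($keyword, '/');
    }
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/u';

    $xpath = new DOMXPath($doc);
    // Text nodes only: attributes are never visited. Skip existing links, scripts, styles.
    $nodes = $xpath->query(
        '//text()[not(ancestor::a) and not(ancestor::script) and not(ancestor::style)]'
    );

    // Copy to an array first, since the loop modifies the tree.
    foreach (iterator_to_array($nodes) as $node) {
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if ($parts === false || count($parts) < 2) {
            continue; // no keyword in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                // Odd indexes hold the captured keyword: wrap it in a link.
                $a = $doc->createElement('a');
                $a->setAttribute('href', $links[$part]);
                $a->appendChild($doc->createTextNode($part));
                $fragment->appendChild($a);
            } elseif ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of the wrapper, not the html/body shell.
    $wrap = $xpath->query('//div[@id="linker-wrap"]')->item(0);
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

You would call it where you currently call str_replace(), for example linkKeywords($content, array('car parts' => '/car-parts')). Matching here is case-sensitive, like str_replace(); if you need case-insensitive matching you would have to add the i flag and normalize the lookup into $links yourself. Note also that the parser may tidy up invalid markup when it serializes the result.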