I need to extract all links from a database so I can create a URL rewrite. Using a SQL statement, how can I extract just the links from a table? Sometimes there is more than one link within a cell, which further complicates it. Any ideas on how to achieve this?
EDIT
An example of this would be: SELECT myval FROM htmlcontrols.
“myval” contains an HTML string such as “<div>Hi this is a test. <a href="somewhere.htm">Click here</a> or <a href="http://somewhereelse.com/testarea">here</a></div>”. I want to extract output like this:
LINKS
-----
somewhere.htm
http://somewhereelse.com/testarea
You are probably better off handling this on the client side: iterate through the fields, parse the HTML, and insert the extracted links into whatever table/columns you need. Failing that, at least create a UDF that can do the parsing efficiently.
Note that the link I posted above is an implementation of a UDF RegEx function, but I am not necessarily suggesting that you use a RegEx to parse HTML, as this is almost always a bad idea.
If you go with the CLR function, take a look at HTMLAgilityPack.
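For the client-side route, here is a minimal sketch in Python using only the standard library's html.parser. It is an illustration, not a finished solution: the names LinkExtractor and extract_links are made up for this example, and the rows list is a stand-in for whatever your database driver returns from SELECT myval FROM htmlcontrols.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all href values found in one HTML fragment, in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Assumption: stand-in for the rows returned by
#   SELECT myval FROM htmlcontrols
rows = [
    '<div>Hi this is a test. <a href="somewhere.htm">Click here</a> or '
    '<a href="http://somewhereelse.com/testarea">here</a></div>',
]

for myval in rows:
    for link in extract_links(myval):
        print(link)
```

Because a real HTML parser is doing the work, a cell with several links simply yields several values, and you can write each one back to a separate row for the URL rewrite.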