NOTE: My problem is NOT that my links are not being replaced; it's that they are being NESTED.
E.g., this is the comment:
some string with www.google.com/blah/blah also something else www.google.com
By the time the second string replace runs, the already converted first link (http://www.google.com/blah/blah) also contains a match, so that link gets replaced twice.
I have a web app which lets users comment.
I process the input string and convert all links to HTML links when I display it on the page. The original user input stays in the DB untouched, so it is never corrupted by the processing; I only run my function on it at display time.
Here is the logic I use to replace all links with their HTML versions:
- Find all links with a regex.
- For each match, replace the link with its HTML version in the original string.
- Finally, display the string.
For example, www.google.com becomes <a href="http://www.google.com" target="_blank">www.google.com</a> just before it is displayed on the page.
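The post does not show its formatting code, so here is a minimal sketch of the per-link conversion described above. `LinkFormat` and `ToAnchor` are hypothetical names, and the rule "prepend http:// only when no scheme was typed" is an assumption based on the example.

```csharp
using System;

// Hypothetical helper (not the poster's code): turns one matched link
// into the anchor format shown in the example above.
public static class LinkFormat
{
    public static string ToAnchor(string link)
    {
        // Only add a scheme when the user typed none, e.g. "www.google.com".
        bool hasScheme =
            link.StartsWith("http://", StringComparison.OrdinalIgnoreCase) ||
            link.StartsWith("https://", StringComparison.OrdinalIgnoreCase) ||
            link.StartsWith("ftp://", StringComparison.OrdinalIgnoreCase);
        string href = hasScheme ? link : "http://" + link;

        // The visible text stays exactly what the user typed.
        return "<a href=\"" + href + "\" target=\"_blank\">" + link + "</a>";
    }
}
```

For instance, `LinkFormat.ToAnchor("www.google.com")` yields the anchor from the example, while a link that already starts with `https://` is used as its own `href`.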
This worked great until recently, when one of my customers posted content with two links from the same domain.
The links were, say,
My problem is that when the second string replace runs (I am using `StringBuilder.Replace`), the first link gets replaced as well!
So, first,
www.google.com/images/blahblah
becomes
<a href="http://www.google.com/images/blahblah" target="_blank">www.google.com/images/blahblah</a>
which is fine. But the problem arises on the second string replace: since the replace is global, it also runs over the already processed link, and the original (above) link gets mangled because it contains http://www.google.com as well.
This makes such a mess that I end up with a mutilated abomination of a string.
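The nesting can be reproduced in isolation. This sketch assumes a comment containing `www.google.com/images/blahblah` and `www.google.com` and mimics the two global `StringBuilder.Replace` calls described above; `NaiveReplace` is a hypothetical name.

```csharp
using System.Text;

// Hypothetical repro of the naive approach: one global Replace per matched link.
public static class NaiveReplace
{
    public static string Run()
    {
        var sb = new StringBuilder("see www.google.com/images/blahblah and www.google.com");

        // First link: correct on its own.
        sb.Replace("www.google.com/images/blahblah",
            "<a href=\"http://www.google.com/images/blahblah\" target=\"_blank\">www.google.com/images/blahblah</a>");

        // Second link: Replace is global, so it also rewrites the "www.google.com"
        // text that now sits inside the first anchor's href and link text.
        sb.Replace("www.google.com",
            "<a href=\"http://www.google.com\" target=\"_blank\">www.google.com</a>");

        return sb.ToString();
    }
}
```

After the second call the first anchor's `href` starts with `http://<a href=...`, and the string holds four opening `<a href` tags for only two links, which is the mangled output described above.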
How do I avoid this?
Does `Regex.Matches` provide the index of each matched element for me to work with? I couldn't find it anywhere.
What's the best way to deal with this? Any suggestions?
Sorry for the lengthy question.
I could probably do this by manually traversing the string, but that is long and painful; there's got to be a better way to do it…
EDIT: adding extra info, as someone asked:
My regex:

```csharp
string rPattern = @"(((http|ftp|https):\/\/)|www\.)[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#!]*[\w\-\@?^=%&/~\+#])?";
Regex rLinks = new Regex(rPattern, RegexOptions.IgnoreCase);
MatchCollection matches = rLinks.Matches(inputString);
```

Then I am using:

```csharp
foreach (Match match in matches)
{
    if (match.Value.StartsWith("www.youtube.com/watch"))
    {
        // logic to embed youtube video - this works fine.
    }
}
// Here I replace all hyperlinks with their <a href> versions
```
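Instead of a global replace at that last step, each `Match` carries `Index` and `Length`, so the output can be built in one left-to-right pass over the original string. This is a sketch using the question's pattern unchanged; `IndexRebuild` and `Linkify` are hypothetical names, and the "prepend http:// for www. links" rule is an assumption.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Sketch (not the poster's code): rebuild the output using Match.Index/Length,
// so text that has already been converted is never searched again.
public static class IndexRebuild
{
    // The pattern from the question, unchanged.
    private static readonly Regex Links = new Regex(
        @"(((http|ftp|https):\/\/)|www\.)[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#!]*[\w\-\@?^=%&/~\+#])?",
        RegexOptions.IgnoreCase);

    public static string Linkify(string input)
    {
        var sb = new StringBuilder();
        int last = 0; // end of the previous match, as an offset into the ORIGINAL input

        foreach (Match m in Links.Matches(input))
        {
            // Copy the plain text between the previous link and this one.
            sb.Append(input, last, m.Index - last);

            // The pattern only matches "scheme://..." or "www....",
            // so a "www." prefix means the user typed no scheme.
            string href = m.Value.StartsWith("www.", StringComparison.OrdinalIgnoreCase)
                ? "http://" + m.Value
                : m.Value;
            sb.Append("<a href=\"").Append(href).Append("\" target=\"_blank\">")
              .Append(m.Value).Append("</a>");

            last = m.Index + m.Length;
        }

        // Copy whatever follows the last link.
        sb.Append(input, last, input.Length - last);
        return sb.ToString();
    }
}
```

Because every offset refers to the untouched input, the two google.com links in the example comment each become exactly one anchor.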
`Regex.Matches` returns a `MatchCollection`. `Match.Index` is what you're looking for. But really, you're probably looking for something more like this:
Or you can use a `MatchEvaluator` to do more advanced work (like making sure we don't add a double http://).
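A minimal sketch of the `MatchEvaluator` route, assuming the question's pattern: `Regex.Replace` calls the lambda once per match and assembles the result itself, so replaced text is never re-matched. `EvaluatorLinkify` is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

// Sketch: one Regex.Replace call with a MatchEvaluator lambda.
public static class EvaluatorLinkify
{
    // The pattern from the question, unchanged.
    private static readonly Regex Links = new Regex(
        @"(((http|ftp|https):\/\/)|www\.)[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#!]*[\w\-\@?^=%&/~\+#])?",
        RegexOptions.IgnoreCase);

    public static string Linkify(string input)
    {
        return Links.Replace(input, m =>
        {
            string url = m.Value;
            // Prepend "http://" only when the user typed no scheme,
            // so links like "http://www.google.com" never become "http://http://...".
            string href = url.StartsWith("www.", StringComparison.OrdinalIgnoreCase)
                ? "http://" + url
                : url;
            return "<a href=\"" + href + "\" target=\"_blank\">" + url + "</a>";
        });
    }
}
```

The YouTube check from the question could plausibly live in the same lambda, returning embed markup instead of an anchor for matching links.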