Here’s my situation. I want to recognize Markdown for a link (in this case just one particular style of link is fine; it’s this format: [link text](url "optional title")), and what I’m trying to do is put this Markdown text into a <pre> tag with the url appropriately wrapped in an <a> tag.
A pseudoexample:
Convert
[link text](url "optional title")
to
[link text](<a href='url'>url</a> "optional title")
So I’ve dug up the very regex used by the Markdown parser which is this:
/*
text = text.replace(/
( // wrap whole match in $1
\[
(
(?:
\[[^\]]*\] // allow brackets nested one level
|
[^\[\]] // or anything else
)*
)
\]
\( // literal paren
[ \t]*
() // no id, so leave $3 empty
<?( // href = $4
(?:
\([^)]*\) // allow one level of (correctly nested) parens (think MSDN)
|
[^()\s]
)*?
)>?
[ \t]*
( // $5
(['"]) // quote char = $6
(.*?) // Title = $7
\6 // matching quote
[ \t]* // ignore any spaces/tabs between closing quote and )
)? // title is optional
\)
)
/g, writeAnchorTag);
*/
text = text.replace(/(\[((?:\[[^\]]*\]|[^\[\]])*)\]\([ \t]*()<?((?:\([^)]*\)|[^()\s])*?)>?[ \t]*((['"])(.*?)\6[ \t]*)?\))/g, writeAnchorTag);
The breakdown in the nice comment helps a lot to see what’s going on and clearly all I need to do is replace $4 submatch with <a href='$4'>$4</a>.
But of course I can’t just do str.replace(re,"<a href='$4'>$4</a>"); because that would replace my entire Markdown link markup (including the link text and optional title) with a plain link. I want the plain link to show up in the original Markdown so that it still looks just like the original Markdown in the <pre> (but now with a clickable link in it).
So, let’s see…
Extract $4:
var group_4 = str.replace(re, "$4"); // Does anybody know a more efficient way to do this? I'm not trying to replace I just need to get the 4th group
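On the side question of getting at the 4th group without replacing anything, one option is `String.prototype.matchAll`, which yields one match array per link and leaves the input untouched. This is only a sketch: `extractUrls` and `sample` are names made up for illustration, and it assumes an environment where `matchAll` is available.

```javascript
// The inline-link regex quoted above, unchanged.
const linkRe = /(\[((?:\[[^\]]*\]|[^\[\]])*)\]\([ \t]*()<?((?:\([^)]*\)|[^()\s])*?)>?[ \t]*((['"])(.*?)\6[ \t]*)?\))/g;

// Hypothetical helper: collect capture group 4 (the href) of every match,
// without performing any replacement.
function extractUrls(str) {
  return Array.from(str.matchAll(linkRe), (m) => m[4]);
}

const sample = 'See [link text](http://example.com "optional title") for details.';
console.log(extractUrls(sample)); // logs an array holding 'http://example.com'
```

`matchAll` works on a copy of the regex, so the `lastIndex` bookkeeping that comes with calling `exec` on a global regex in a loop is not a concern here.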
Well here I’m stuck because I want to stick "<a href='"+group_4+"'>"+group_4+"</a>" in as a replacement for $4.
Anybody have tips for me? I’m pretty sure this can be done, and I suspect it can be done elegantly as well.
I’ve already found one potential solution (which is wrong) which is to strip out the sections of the regex which are outside of group $4. I don’t think this will be sufficient because it does not do any actual link-detection based on the link content (i.e. you could define a Markdown-link using something that is not a real link at all). So I should use the original regex so as to be sure that what I am converting into an <a> is actually part of a (Markdown inline-style) link.
I think I have a way to attack the problem using what I already know. Simply replace with the original parts. This means there must be other sub-matches that cover the entirety of the expression before and after $4. Supposing there is a group $x that contains the match from the beginning up to $4 and another group $y that contains the match from the end of $4 to the end of the string, all I have to do is str.replace(re,"$x<a href='$4'>$4</a>$y") and be done with it.
Now to see if it is possible to modify our regex to not change its accepted language while providing me these groups.
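To make the prefix-group/suffix-group idea concrete, here is a minimal sketch on a deliberately simplified pattern. `simpleLink` is a stand-in I made up, not the Markdown parser's regex: it only handles double-quoted titles and no nested brackets or parens. In it, $1 plays the role of $x, $2 is the URL, and $3 plays the role of $y.

```javascript
// Simplified stand-in pattern (NOT the full Markdown regex):
//   $1 = everything before the URL  (the "$x" group)
//   $2 = the URL itself
//   $3 = everything after the URL   (the "$y" group)
const simpleLink = /(\[[^\[\]]*\]\([ \t]*)([^()\s]*)([ \t]*(?:"[^"]*"[ \t]*)?\))/g;

// Hypothetical helper: re-emit prefix and suffix verbatim, wrapping only the URL.
function linkifyUrl(str) {
  return str.replace(simpleLink, "$1<a href='$2'>$2</a>$3");
}

console.log(linkifyUrl('[link text](url "optional title")'));
// prints: [link text](<a href='url'>url</a> "optional title")
```

Because the three groups together cover the whole match, everything except the URL is copied back unchanged.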
Update: Looking at it for a bit longer it’s actually quite basic:
gets me 99% of the way there to fully replicating the original input, and the only part where this is incorrect is in the space between
$4 and $5, which in the input is [ \t]*, so all I have to do is wrap that into a new group in the original regex. I believe it will become $5, so it will be: Carets on the line below indicate where parens were added.
should yield the exact original, so
ought to do it.
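For reference, here is my reading of the update as a runnable sketch. The exact replacement template is an assumption on my part. The only change to the regex is the extra pair of parens around the [ \t]* between the href and the title; since that shifts the later groups up by one, the backreference for the closing quote becomes \7 instead of \6. `linkReWithGap` and `linkifyMarkdownLinks` are made-up names.

```javascript
// Original regex with ONE added group: ([ \t]*) between the href and the title.
// Group numbers afterwards: $2 = link text, $4 = href, $5 = whitespace gap,
// $6 = quoted title (with quotes), $7 = quote char, $8 = bare title.
const linkReWithGap = /(\[((?:\[[^\]]*\]|[^\[\]])*)\]\([ \t]*()<?((?:\([^)]*\)|[^()\s])*?)>?([ \t]*)((['"])(.*?)\7[ \t]*)?\))/g;

// Hypothetical helper: rebuild the Markdown around the match, wrapping only $4.
// An absent title leaves $6 unmatched, which replace() substitutes as "".
function linkifyMarkdownLinks(str) {
  return str.replace(linkReWithGap, "[$2](<a href='$4'>$4</a>$5$6)");
}

console.log(linkifyMarkdownLinks('[link text](url "optional title")'));
// prints: [link text](<a href='url'>url</a> "optional title")
```

One caveat with this template: whitespace directly after the opening paren and any optional <...> around the url are matched outside the groups used here, so they are not reproduced in the output.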
Now what’s left is devising a way to only escape the HTML outside of these link constructs because I don’t want to escape the anchor tag. Hmmm.
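One possible way to approach the escaping, sketched under the same assumptions as the modified regex with the extra whitespace group: walk the matches with `matchAll`, escape the text between them, and escape each captured piece separately before assembling the anchor, so the <a> tag itself is never escaped. `escapeHtml` and `renderForPre` are hypothetical helpers, not part of any library.

```javascript
// Modified regex (extra group around the whitespace between href and title).
const mdLink = /(\[((?:\[[^\]]*\]|[^\[\]])*)\]\([ \t]*()<?((?:\([^)]*\)|[^()\s])*?)>?([ \t]*)((['"])(.*?)\7[ \t]*)?\))/g;

// Hypothetical helper: minimal HTML escaping for text and attribute values.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper: escape everything except the anchor tags we generate.
function renderForPre(str) {
  let out = "";
  let last = 0; // index just past the previous match
  for (const m of str.matchAll(mdLink)) {
    out += escapeHtml(str.slice(last, m.index)); // text between links
    const href = escapeHtml(m[4]);
    const title = escapeHtml(m[6] || "");         // title part is optional
    out += "[" + escapeHtml(m[2]) + "](<a href='" + href + "'>" + href + "</a>" + m[5] + title + ")";
    last = m.index + m[0].length;
  }
  return out + escapeHtml(str.slice(last));       // trailing text
}

console.log(renderForPre('<b> [t](u "x") & y'));
```

The result is meant to be assigned as the HTML content of the <pre>; the escaped quotes around the title then display as ordinary quote characters.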