The Issue
I’m migrating wiki pages from the FlexWiki engine to the FOSwiki engine using Python regular expressions to handle the differences between the two engines’ markup languages.
The FlexWiki markup and the FOSwiki markup, for reference.
Most of the conversion works very well, except when I try to convert the renamed links.
Both wikis support renamed links in their markup.
For example, FlexWiki uses:
"Link To Wikipedia":[http://www.wikipedia.org/]
FOSwiki uses:
[[http://www.wikipedia.org/][Link To Wikipedia]]
Both produce a renamed hyperlink.
I’m using the regular expression
renameLink = re.compile ("\"(?P<linkText>[^\"]+)\":\[(?P<linkTarget>[^\[\]]+)\]")
to parse the link elements out of the FlexWiki markup. Run against something like
"Link Text":[LinkTarget]
it reliably produces the groups
<linkText> = Link Text
<linkTarget> = LinkTarget
My issue occurs when I try to use re.sub to insert the parsed content into the FOSwiki markup.
My experience with regular expressions isn’t anything to write home about, but I’m under the impression that, given the groups
<linkText> = Link Text
<linkTarget> = LinkTarget
a line like
line = renameLink.sub ( "[[\g<linkTarget>][\g<linkText>]]" , line )
should produce
[[LinkTarget][Link Text]]
However, in the output to the text files I’m getting
[[LinkTarget [[Link Text]]
which breaks the renamed links.
After a little bit of fiddling I managed a workaround, where
line = renameLink.sub ( "[[\g<linkTarget>][ [\g<linkText>]]" , line )
produces
[[LinkTarget][ [[Link Text]]
which, when displayed in FOSwiki looks like
[[Link Text
Which WORKS, but isn’t very pretty.
There are probably thousands of instances of these renamed links in the pages I’m trying to convert, so fixing it by hand isn’t any good.
For the record I’ve run the script under Python 2.5.4 and Python 2.7.3, and gotten the same results.
Am I missing something really obvious with the syntax? Or is there an easy workaround?
Solution
There wasn’t anything wrong with the original expression.
I started running through the other regexes in my script and commented out lines I thought might be overlapping with the renamed-link expression. That appears to have done the trick, and as a semi-permanent fix I’ve separated the link-focused expressions and the other expressions into separate scripts, which I run one after another.
I guess the moral here is to double-check that you don’t have overlapping expressions.
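The sketch below is only an illustration of that kind of overlap, not the actual expressions from the script. free_link is a hypothetical second rule (assumed here to turn single-bracketed text into a double-bracketed link), and the function names are made up for the example. Run after the renamed-link rule, it matches the brackets the first rule just wrote and mangles the result. One possible alternative to splitting the rules into separate scripts is to join them into a single pattern with a callback, so each piece of text is rewritten at most once.

```python
import re

# Renamed-link pattern from the question, written as a raw string.
rename_link = re.compile(r'"(?P<linkText>[^"]+)":\[(?P<linkTarget>[^\[\]]+)\]')

# Hypothetical second rule (an assumption, not from the original script):
# wrap any single-bracketed text in double brackets.
free_link = re.compile(r'\[(?P<page>[^\[\]]+)\]')


def convert_sequentially(line):
    # The second substitution re-scans the output of the first one,
    # so it also matches the brackets the first rule just produced.
    line = rename_link.sub(r'[[\g<linkTarget>][\g<linkText>]]', line)
    return free_link.sub(r'[[\g<page>]]', line)


# Both rules joined into one alternation; the renamed-link branch comes first.
combined = re.compile(rename_link.pattern + '|' + free_link.pattern)


def convert_single_pass(line):
    def replace(match):
        # Only one branch of the alternation matched; pick the right template.
        if match.group('linkText') is not None:
            return '[[%s][%s]]' % (match.group('linkTarget'),
                                   match.group('linkText'))
        return '[[%s]]' % match.group('page')
    # Replacement text returned by the callback is not scanned again.
    return combined.sub(replace, line)


sample = '"Link Text":[LinkTarget]'
print(convert_sequentially(sample))
print(convert_single_pass(sample))
```

With the sequential version the sample line no longer comes out as [[LinkTarget][Link Text]], while the single-pass version leaves the converted link alone.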
Attempted Solutions (Just see Solution above)
String addition
line = renameLink.sub ( "[[\g<linkTarget>]" + "[\g<linkText>]]" , line )
produces
[[LinkTarget [[Link Text]]
It doesn’t matter how you slice the concatenation; the result is the same.
Escaping the square brackets, e.g.
line = renameLink.sub ( "\[\[\g<linkTarget>\]\[\g<linkText>\]\]" , line )
produces
\[ [[LinkTarget\]] [Link Text\]\]
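For what it's worth, the stray backslashes are expected on their own: square brackets have no special meaning in an re.sub replacement template, and an escape such as \[ that the template doesn't recognise is passed through with its backslash. A small check, using the pattern from the question (the variable names are just for this sketch):

```python
import re

rename_link = re.compile(r'"(?P<linkText>[^"]+)":\[(?P<linkTarget>[^\[\]]+)\]')
sample = '"Link Text":[LinkTarget]'

# Brackets are literal in a replacement template, so no escaping is needed.
plain = rename_link.sub(r'[[\g<linkTarget>][\g<linkText>]]', sample)

# "\[" and "\]" are not template escapes, so the backslashes are kept.
escaped = rename_link.sub(r'\[\[\g<linkTarget>\]\[\g<linkText>\]\]', sample)

print(plain)
print(escaped)
```

So escaping the brackets in the replacement can only add backslashes to the output; it can't fix a bracket that goes missing later.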
And it does: on its own, the original expression and substitution produce [[LinkTarget][Link Text]].
You probably have issues somewhere other than in your expression.
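A quick way to check the expression in isolation is something like the following. It uses the pattern and the Wikipedia link from the question; the variable names are only for this sketch, and the strings are written as raw strings so the backslashes reach the re module unchanged.

```python
import re

# Same pattern as in the question, as a raw string.
rename_link = re.compile(r'"(?P<linkText>[^"]+)":\[(?P<linkTarget>[^\[\]]+)\]')

line = '"Link To Wikipedia":[http://www.wikipedia.org/]'

# Inspect the named groups captured from the FlexWiki markup.
groups = rename_link.search(line).groupdict()
print(groups['linkText'])    # Link To Wikipedia
print(groups['linkTarget'])  # http://www.wikipedia.org/

# Rewrite into the FOSwiki form.
converted = rename_link.sub(r'[[\g<linkTarget>][\g<linkText>]]', line)
print(converted)  # [[http://www.wikipedia.org/][Link To Wikipedia]]
```

If this gives the right result but the full script doesn't, the damage is happening in one of the other substitutions applied to the same line.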