I’m trying to make a Markdown parser in Python, not because it’s useful but because it’s fun and because I’m trying to learn regular expressions.
#! /usr/bin/env python
#-*- coding: utf-8 -*-
import re

class Converter:
    def markdown2html(self, string):
        # ***text*** -> <strong>text</strong>
        string = re.sub(r'\*{3}(.+)\*{3}', r'<strong>\1</strong>', string)
        # **text** -> <i>text</i>
        string = re.sub(r'\*{2}(.+)\*{2}', r'<i>\1</i>', string)
        # a line starting with one hash sign -> <h1>
        string = re.sub(r'^#{1}(.+)$', r'<h1>\1</h1>', string, flags=re.MULTILINE)
        # a line starting with two hash signs -> <h2>
        string = re.sub(r'^#{2}(.+)$', r'<h2>\1</h2>', string, flags=re.MULTILINE)
        return string

markdown_string = """
##h2 heading
#H1 heading
This should be a ***bold*** char
#another h1
another ***bold***
this is a **italic** string
"""
converter = Converter()
print(converter.markdown2html(markdown_string))
It prints:
<h1>#h2 heading</h1>
<h1>H1 heading</h1>
This should be a <strong>bold</strong> char
<h1>another h1</h1>
another <strong>bold</strong>
this is a <i>italic</i> string
As you can see, it does not parse the h2 heading. Where did I go wrong?
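The h1 rule runs first, and ^#{1}(.+)$ also matches the line ##h2 heading: #{1} matches the first hash sign and (.+) captures everything after it, including the second hash sign. That line becomes <h1>#h2 heading</h1>, so the h2 rule never sees a line starting with ##.
You could make sure to only match the wanted number of hash signs by making sure that the first character of the heading text isn’t a hash sign. This can be done by starting the captured group with the character class [^#], which matches any single character except a hash sign, for example ^#{1}([^#].*)$ for h1 and ^#{2}([^#].*)$ for h2. This way the order of the rules won’t matter, making the rules more robust.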
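A minimal sketch of the converter with the [^#] idea applied, written so it runs on both Python 2.7 and Python 3. The bold and italic rules are kept exactly as in the question. One assumption beyond the answer: the class is tightened to [^#\n], because plain [^#] can also match a newline, which would let a line containing only # pull the following line into its heading.

```python
import re


class Converter:
    def markdown2html(self, string):
        # Bold/italic rules unchanged from the question.
        string = re.sub(r'\*{3}(.+)\*{3}', r'<strong>\1</strong>', string)
        string = re.sub(r'\*{2}(.+)\*{2}', r'<i>\1</i>', string)
        # The heading text must start with a character that is neither '#'
        # nor a newline, so '^#{1}' cannot match the first hash of '##...'
        # and a lone '#' line cannot swallow the next line.
        string = re.sub(r'^#{1}([^#\n].*)$', r'<h1>\1</h1>', string, flags=re.MULTILINE)
        string = re.sub(r'^#{2}([^#\n].*)$', r'<h2>\1</h2>', string, flags=re.MULTILINE)
        return string


markdown_string = """
##h2 heading
#H1 heading
This should be a ***bold*** char
#another h1
another ***bold***
this is a **italic** string
"""

result = Converter().markdown2html(markdown_string)
print(result)
```

With this input the first heading now comes out as <h2>h2 heading</h2>. Because neither heading pattern can match a line that starts with more hash signs than it asks for, swapping the two heading rules produces the same output.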