I need to transform some text files into HTML. I’m stuck on transforming a list into an HTML unordered list. Example source:
some text in the document
* item 1
* item 2
* item 3
some other text
The output should be:
some text in the document
<ul>
<li>item 1</li>
<li>item 2</li>
<li>item 3</li>
</ul>
some other text
Currently, I have this:
import re
r = re.compile(r'\* (.*)\n')
r.sub(r'<li>\1</li>\n', the_text_document)
which creates an HTML list without <ul> tags.
How can I identify the first and last items and surround them with <ul> tags?
After playing with some ideas, I’ve decided to go with a second regex.
So basically, after running the first regex (from my original post, which creates the <li> tags), I run a second one. It finds the first <li> tag and the last </li>\n combination that is not followed by another <li> tag (which essentially means the entire list) and wraps that span in <ul> tags.
EDIT:
I modified the regex a bit so it won’t be greedy. This way it can handle multiple lists in the same document. The only requirement is that there are no spaces between list items, as @Aprillion mentioned below.
EDIT 2:
I modified the negative lookahead to handle spaces between list items as well, so all cases are covered.
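For reference, here is a minimal sketch of the two-pass approach described above. The patterns are a reconstruction from the description (lazy match, negative lookahead that tolerates whitespace), not necessarily the exact regex used in the answer. The names `ITEM`, `LIST` and `lists_to_html` are made up for illustration, and the sketch assumes the document ends with a newline.

```python
import re

# First pass: turn each "* item" line into "<li>item</li>".
# MULTILINE lets ^ and $ match at every line boundary.
ITEM = re.compile(r'^\*[ \t]*(.*)$', re.MULTILINE)

# Second pass: find a run of <li> lines. The lazy (.*?) stops at the
# first "</li>\n" that is NOT followed (after optional whitespace) by
# another <li>, so each list in the document is matched on its own.
# DOTALL lets "." cross line breaks inside a list.
LIST = re.compile(r'(<li>.*?</li>\n)(?!\s*<li>)', re.DOTALL)


def lists_to_html(text):
    """Convert '* item' lines to <li> elements wrapped in <ul> tags."""
    with_items = ITEM.sub(r'<li>\1</li>', text)
    return LIST.sub(r'<ul>\n\1</ul>\n', with_items)


if __name__ == '__main__':
    source = (
        'some text in the document\n'
        '* item 1\n'
        '* item 2\n'
        '* item 3\n'
        'some other text\n'
    )
    print(lists_to_html(source))
```

Because the lookahead skips whitespace, two lists separated only by blank lines would be merged into a single <ul>. Any non-list line between them keeps them separate.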