I’m writing a script to go through a product database with poorly, inconsistently formatted product descriptions to make its HTML uniform. One problem I’m having is capturing and replacing lines of code formatted the same way. For example, I’d like to replace all their
• item 1
• item 2
• item 3
with
<ul>
<li>item 1</li>
<li>item 3</li>
<li>item 2</li>
</ul>
Replacing each • line with a <li>content</li> line is easy enough, but I can’t for the life of me figure out the regex to get before and after the list. My though is to capture everything starting with • until there is a newline that does not start with •. Here’s my latest try (python):
In : p = re.compile(
r'•.*(?!^•)'
)
In : p.findall(text, re.MULTILINE, re.DOTALL)
Out : []
In : p.findall(text, re.MULTILINE)
Out : ['• item 1', '• item 2', '• item 3']
In : p.findall(text, re.DOTALL)
Out : ['• item 1', '• item 2', '• item 3']
In : p.findall(text)
Out : ['• item 1', '• item 2', '• item 3']
Any ideas on how to capture something like ['• item 1\n• item 2\n• item 3']?
Here’s a non-regex based solution:
Test run: