I’m writing a Python snippet to fix the casing of titles in HTML code. So far, I’ve come up with this code:
    import re

    pattern = re.compile("<h1>(.*)</h1>|<h2>(.*)</h2>|<h3>(.*)</h3>|<h4>(.*)</h4>|<h5>(.*)</h5>|<h6>(.*)</h6>")

    def replace(m):
        # group(1) belongs to the <h1> alternative only
        contents = m.group(1)
        replacement = contents[0] + contents[1:].lower()
        return replacement
Then, given a line, the transformation I use is line = pattern.sub(replace, line).
This doesn’t work, because m.group(1) is None unless the <h1> alternative is the one that matched, whereas I’d like it to be the match corresponding to whichever clause of my regex matched. Since groups can’t share a name in Python, I’m somewhat at a loss.
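To make the failure concrete, here is a small reproduction. The sample line is made up for illustration; the pattern is the one above, only split across lines for readability. Each alternative gets its own group number, so a match on the <h2> clause leaves the <h1> group empty:

```python
import re

pattern = re.compile(
    "<h1>(.*)</h1>|<h2>(.*)</h2>|<h3>(.*)</h3>|"
    "<h4>(.*)</h4>|<h5>(.*)</h5>|<h6>(.*)</h6>"
)

# Made-up sample line, not taken from real input.
m = pattern.search("<h2>SOME Title</h2>")

# One slot per alternative; only the slot of the clause that matched is filled.
print(m.groups())

# The <h1> slot did not take part in this match, so it is None,
# and contents[0] in replace() would then raise a TypeError.
print(m.group(1))
```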
An obvious solution is to collapse all the alternatives into a single generic pattern, but then <h1>bla</h2> would be recognized. That’s not good, since <h1><a href="...">Bla</a></h1> <h2>Bla</h2> should yield two matches (<a href="...">Bla</a>, and Bla).
Ideas?
From what I understand, you just want to capitalize all of the headings. You can use lxml, which would make this fairly painless: