Simple one here but I’m fairly new to Python.
I have a string like this:
this is page one of an article
<!--pagebreak page two --> this is page two
<!--pagebreak--> this is the third page
<!--pagebreak page four --> last page
// newlines added for readability
I need to split the string using this regex: <!--pagebreak(*.?)--> – the idea is that sometimes the <!--pagebreak--> comments have a ‘title’ (which I use in my templates), other times they don’t.
I tried this:
re.split("<!--pagebreak*.?-->", str)
which returned only the items with ‘titles’ in the pagebreak (and didn’t split them correctly either). What am I doing wrong here?
Change *.? into .*? in your pattern. Your current regex accepts any number of literal k's, optionally followed by a single arbitrary character, so it never consumes the title text.
Also, I would recommend using raw strings (r"...") for your regular expressions. It's not necessary in this case, but it's an easy way to spare yourself a few headaches.
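To make the difference concrete, here is a minimal sketch that runs the original and the corrected pattern on the sample from the question. The variable names (text, broken, pages, parts) are just illustrative, and the sample is assumed to be one string with no newlines, as the question says.

```python
import re

# Sample from the question, joined into a single string (no real newlines).
text = (
    "this is page one of an article "
    "<!--pagebreak page two --> this is page two "
    "<!--pagebreak--> this is the third page "
    "<!--pagebreak page four --> last page"
)

# Original pattern: "k*" is zero or more k's and ".?" is at most one character,
# so only the untitled <!--pagebreak--> comment matches.
broken = re.split(r"<!--pagebreak*.?-->", text)

# Corrected pattern: ".*?" lazily matches everything up to the first "-->".
pages = [p.strip() for p in re.split(r"<!--pagebreak.*?-->", text)]

# With a capturing group, re.split also returns the captured titles,
# interleaved with the page bodies (an empty string when there is no title).
parts = [p.strip() for p in re.split(r"<!--pagebreak(.*?)-->", text)]

print(len(broken))  # 2
print(pages)
print(parts)
```

If you need the titles for your templates, the capturing version is probably the one you want: the even indexes of parts are page bodies and the odd indexes are the titles. If a comment could ever span several lines, you would likely also need to pass flags=re.DOTALL, since . does not match a newline by default.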