I'm having trouble finding some strings in a large input that contains newline characters, using regular expressions in Python 2.7.3. The input looks something like this:
type="thing" blahblahblah
something id="123456"
...
type="disabled thing" blahblahblah
somethingelse id="123457"
...
I want to get all the ids where type="thing". Because the regex engine is greedy, I have to write a regex like this:
r'type="thing"(?!type).+id="[0-9]{6,7}"', re.S
However, this doesn't work. How do I write a regex that excludes a string when the input looks like this?
If I understand your question (before it was edited) correctly, you want to match both lines associated with each id. In that case, you will need something along these lines (assuming lines are separated by '\n'):
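A minimal sketch of such a pattern, assuming each id always appears on the line directly after its type="thing" line. The sample text and the names SAMPLE, THING_ID and thing_ids are hypothetical, modeled on the data in the question.

```python
import re

# Hypothetical sample input modeled on the question's data.
SAMPLE = (
    'type="thing" blahblahblah\n'
    'something id="123456"\n'
    '...\n'
    'type="disabled thing" blahblahblah\n'
    'somethingelse id="123457"\n'
    '...\n'
)

# No re.S here: "." does not match "\n", so ".*" cannot run past the end of
# a line, and the literal "\n" steps exactly one line forward.
# Assumption: the id sits on the line right after its type="thing" line.
THING_ID = re.compile(r'type="thing".*\n.*id="([0-9]{6,7})"')


def thing_ids(text):
    """Return the ids of records whose type line is type="thing"."""
    return THING_ID.findall(text)


print(thing_ids(SAMPLE))
```

Because the pattern contains exactly one group, findall returns just the captured digits. If other lines can appear between the type line and the id line, this pattern will not match that record, so adjust the part between the two lines to fit your real data.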
If you don't use re.S, you can control the greediness of your expression much more easily, because without it . does not match a newline and a match cannot run past the end of a line. Your .+ combined with re.S lets the match span any number of lines, so the greedy .+ runs all the way to the last id in the input, which you would otherwise have to account for. You could also use .+? instead. The question mark after the plus sign makes the quantifier non-greedy (lazy), but I would opt for a more concise expression.
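To see the difference, here is a minimal sketch that runs the question's pattern next to a lazy .+? variant, both with re.S. The sample string and the names SAMPLE, GREEDY and LAZY are hypothetical, modeled on the question's data.

```python
import re

# Hypothetical sample input modeled on the question's data.
SAMPLE = (
    'type="thing" blahblahblah\n'
    'something id="123456"\n'
    'type="disabled thing" blahblahblah\n'
    'somethingelse id="123457"\n'
)

# The question's pattern. With re.S the greedy ".+" crosses newlines and
# backtracks only as far as the LAST id="..." in the input. The (?!type)
# lookahead only checks the position right after type="thing" (a space
# here), so it never rejects anything.
GREEDY = re.compile(r'type="thing"(?!type).+id="[0-9]{6,7}"', re.S)

# Lazy variant: ".+?" stops at the FIRST id="..." after type="thing".
LAZY = re.compile(r'type="thing".+?id="([0-9]{6,7})"', re.S)

greedy_matches = GREEDY.findall(SAMPLE)  # one match spanning both records
lazy_ids = LAZY.findall(SAMPLE)          # just the captured id digits

print(greedy_matches)
print(lazy_ids)
```

The lazy version fixes this sample, but because re.S still lets .+? cross any number of lines, a type="thing" record with no id of its own would silently pick up the id of the next record. That is one reason to prefer a pattern that does not rely on re.S.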