Sometimes the good old tools still work best. In sed, I could write things like this:
sed '/^Page 5:/,/^Page 6:/p'
sed '110,/^Page 10:/+3p'
sed '/^Page 5:/,/^Page 6:/s/this/that/g'
The first prints all lines from the one matching /^Page 5:/ through the one matching /^Page 6:/. The second starts printing at line 110 and stops 3 lines after the one matching /^Page 10:/. The third example applies a substitution to each line in the specified range.
I don’t mind using re.search to search line by line, but for line ranges, line numbers or relative offsets, I end up having to write a whole parser. Is there a python idiom or module that can simplify this kind of operation?
I don’t want to call sed from python: I’m doing python-type things with text, and just want to be able to operate on line ranges in a straightforward way.
Edit: It’s fine if the solution works on a python list of strings. I’m not looking to process gigabytes of text. But I do need to specify several operations, not just one, and interleave them with single-line regexp substitutions. I’ve looked at iterators (in fact I would welcome a solution using iterators), but the results always got out of hand for anything more than a single operation.
Here’s a simple example: A snippet of code with java-style comments, to be changed to python comments. (Don’t worry, I am NOT trying to write a cross-compiler using regexps 🙂)
/*
This is a multi-line comment.
It does not obligingly start lines with " * "
*/
x++; // a single-line comment
It’s trivial to write regexps that change “//” comments to “#” (and also to drop semicolons, change “++” to “+= 1”, etc.). But how do we insert “#” at the start of each line of a multi-line java comment? I can do it with a regexp on the entire file as a single string, which is a pain because the rest of the transformations are line-oriented. I’ve also been unable to (usefully) integrate iterators with line-oriented regexps. I’d appreciate suggestions.
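One way to get sed-style ranges on a list of strings is a small generator that tags each line as inside or outside the range, so that line-oriented substitutions can be interleaved freely. The following is only a sketch: flag_range, apply_in_range and the extra parameter are made-up names, not part of any module, and the semantics are simplified. An address is either a 1-based line number or a regex string; as in sed, the line that opens the range is not tested against the end address; and a numeric end address that is not past the start simply never closes the range here.

```python
import re

def _hits(addr, lineno, line):
    """True if an address (1-based int or regex string) selects this line."""
    if isinstance(addr, int):
        return lineno == addr
    return re.search(addr, line) is not None

def flag_range(lines, start, end, extra=0):
    """Yield (inside, line) pairs; inside is True within the range start,end.

    The range opens on a line selected by start, closes on the next later
    line selected by end, and stays open for extra more lines after that.
    """
    inside = False
    left = None  # None: end not seen yet; else lines still to include after it
    for lineno, line in enumerate(lines, 1):
        if not inside:
            inside, left = _hits(start, lineno, line), None
        elif left is None:
            if _hits(end, lineno, line):
                left = extra
        else:
            left -= 1
        yield inside, line
        if inside and left == 0:
            inside = False

def apply_in_range(lines, start, end, func, extra=0):
    """Return a new list with func applied only to lines inside the range."""
    return [func(line) if inside else line
            for inside, line in flag_range(lines, start, end, extra)]

java = [
    "/*",
    "This is a multi-line comment.",
    'It does not obligingly start lines with " * "',
    "*/",
    "x++; // a single-line comment",
]

# Range operation first, then ordinary single-line substitutions.
py = apply_in_range(java, r"^\s*/\*", r"\*/", lambda s: "# " + s)
py = [re.sub(r"//", "#", s) for s in py]
py = [re.sub(r"(\w+)\+\+;?", r"\1 += 1", s) for s in py]
print("\n".join(py))
```

With these definitions, the second sed example (from line 110 to 3 lines past /^Page 10:/) would correspond to flag_range(lines, 110, r"^Page 10:", extra=3). The "/*" and "*/" marker lines are kept here and just get a "# " prefix; dropping them would be one more single-line substitution.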
I would try to use the regex flags re.DOTALL or re.MULTILINE. The first makes . match newlines as well, so a .* in your pattern can run across line breaks. The second leaves . alone, but makes ^ and $ match at the start and end of every line rather than only at the start and end of the whole string. This can be useful for counting lines.
I could, for now, come up with this, which prints ONE MORE LINE after the occurrence of “six” (a whole line is captured by the final ^.*?$, but I’m pretty sure there should be a much better way):
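As a minimal sketch of how the two flags behave and of the "one more line" idea (the sample text, the patterns and the helper name line_and_next are made up for illustration):

```python
import re

text = "five\nsix\nseven\neight\n"

# Default: "." stops at "\n", so this pattern cannot span several lines.
no_flag = re.search(r"six.*eight", text)

# re.DOTALL: "." also matches "\n", so ".*" runs across line breaks.
dotall = re.search(r"six.*eight", text, re.DOTALL)

# re.MULTILINE: "^" and "$" anchor at the start and end of every line.
s_lines = re.findall(r"^s\w*$", text, re.MULTILINE)

def line_and_next(text, word):
    """Return the first line containing word plus the line after it, or None."""
    pattern = r"^.*" + re.escape(word) + r".*\n.*$"
    m = re.search(pattern, text, re.MULTILINE)
    return m.group() if m else None

print(line_and_next(text, "six"))
```

Counting further lines this way means repeating the \n.* part of the pattern, which is where a line-by-line approach tends to be easier to read.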