Sometimes the good old tools still work best. In sed, I could write things

Question

Asked: May 30, 2026 (2026-05-30T10:07:18+00:00)

Sometimes the good old tools still work best. In sed, I could write things like this:

sed '/^Page 5:/,/^Page 6:/p' 
sed '110,/^Page 10:/+3p'
sed '/^Page 5:/,/^Page 6:/s/this/that/g'

The first prints all lines between the ones matching /^Page 5:/ and /^Page 6:/, inclusive. The second starts printing at line 110 and stops 3 lines after the one matching /^Page 10:/. The third applies a substitution to each line in the specified range.

I don’t mind using re.search to search line by line, but for line ranges, line numbers or relative offsets, I end up having to write a whole parser. Is there a Python idiom or module that can simplify this kind of operation?

I don’t want to call sed from python: I’m doing python-type things with text, and just want to be able to operate on line ranges in a straightforward way.

Edit: It’s fine if the solution works on a Python list of strings. I’m not looking to process gigabytes of text. But I do need to specify several operations, not just one, and interleave them with single-line regexp substitutions. I’ve looked at iterators (in fact I would welcome a solution using iterators), but the results always got out of hand for anything more than a single operation.

Here’s a simple example: a snippet of code with Java-style comments, to be changed to Python comments. (Don’t worry, I am NOT trying to write a cross-compiler using regexps 🙂)

/* 
 This is a multi-line comment.
 It does not obligingly start lines with " * "
 */

x++;  // a single-line comment

It’s trivial to write regexps that change “//” comments to “#” (and also to drop semicolons, change “++” to “+= 1”, etc.) But how do we insert “#” at the start of each line of a multi-line java comment? I can do it with a regexp on the entire file as a single string, which is a pain because the rest of the transformations are line-oriented. I’ve also been unable to (usefully) integrate iterators with line-oriented regexps. I’d appreciate suggestions.

1 Answer

Editorial Team · Answer 1 · 2026-05-30T10:07:19+00:00

I would try to use the regex flags re.DOTALL or re.MULTILINE.

The first, re.DOTALL, makes . match newlines as well, so a .* can span several lines.

The second, re.MULTILINE, does not change what . matches. Instead it makes ^ and $ match at the start and end of every line, rather than only at the start and end of the whole string. That lets you anchor parts of a pattern to line boundaries, which can be useful to count lines.
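To make the difference between the two flags concrete, here is a small sketch. The sample string and the variable names are made up for illustration only:

```python
import re

text = "one\ntwo\nthree"

# No flags: "." stops at the first newline.
plain = re.search(r"one.*", text).group(0)

# re.DOTALL: "." also matches "\n", so ".*" runs to the end of the string.
dotall = re.search(r"one.*", text, re.DOTALL).group(0)

# No flags: "^" and "$" only anchor to the whole string, so nothing matches.
lines_default = re.findall(r"^\w+$", text)

# re.MULTILINE: "^" and "$" anchor to every line, so each line matches.
lines_multi = re.findall(r"^\w+$", text, re.MULTILINE)

print(repr(plain))
print(repr(dotall))
print(lines_default)
print(lines_multi)
```

Combining both flags is what allows a single pattern to span lines and still anchor to line starts and ends.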

For now I could come up with this, which prints ONE MORE LINE after the occurrence of “six” (the final ^.*?$ captures a whole line, but I’m pretty sure there should be a much better way):

import re

source = """one
two
three
four
five
six
seven
eight
nine
ten"""

# Python 3 syntax; prints the lines "three" through "seven".
print(re.search(r'^three.*six.*?^.*?$', source, re.DOTALL | re.MULTILINE).group(0))
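Since the question says a list of strings is fine, a line-oriented alternative might be a small helper that mimics sed's "start,end" addressing. This is only a sketch under my own assumptions, not an existing module: apply_range, _matches and the extra parameter are hypothetical names, an int address means a 1-based line number, a str address is treated as a regex, and corner cases of real sed (such as a numeric end that lies before the start) are not handled.

```python
import re

def _matches(addr, lineno, line):
    """Hypothetical helper: an int is a 1-based line number, a str is a regex."""
    if isinstance(addr, int):
        return lineno == addr
    return re.search(addr, line) is not None

def apply_range(lines, start, end, func, extra=0):
    """Return a new list with func applied to each line from `start` through
    `end`, plus `extra` lines after the one matching `end`."""
    out = []
    inside = False
    remaining = 0  # trailing lines still to process after `end` matched
    for lineno, line in enumerate(lines, 1):
        if inside:
            out.append(func(line))
            if _matches(end, lineno, line):
                inside = False
                remaining = extra
        elif remaining > 0:
            out.append(func(line))
            remaining -= 1
        elif _matches(start, lineno, line):
            # As in sed, `end` is only tested from the next line onward.
            out.append(func(line))
            inside = True
        else:
            out.append(line)
    return out

java = [
    "/*",
    " This is a multi-line comment.",
    ' It does not obligingly start lines with " * "',
    " */",
    "",
    "x++;  // a single-line comment",
]

# Range operation: put "#" in front of every line of the block comment.
step1 = apply_range(java, r"^\s*/\*", r"\*/", lambda s: "#" + s)

# Interleaved single-line substitutions on the result.
python_ish = [re.sub(r"//", "#", re.sub(r"\+\+;", " += 1", s)) for s in step1]

print("\n".join(python_ish))

# Line number as start, regex as end, three extra lines (like the second sed example).
pages = ["a", "b", "Page 10:", "c", "d", "e", "f"]
shouted = apply_range(pages, 2, r"^Page 10:", str.upper, extra=3)
```

Because each call takes a list and returns a new list, several range operations can be chained and mixed freely with per-line re.sub calls, which is the part that tends to get out of hand with nested iterators.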
