The Maven Build Number Plugin has support for Mercurial since…

Question

0

Asked: May 11, 20262026-05-11T07:14:08+00:00 2026-05-11T07:14:08+00:00

I’m having a bit of trouble getting a Python regex to work when matching

0

I’m having a bit of trouble getting a Python regex to work when matching against text that spans multiple lines. The example text is (\n is a newline)

some Varying TEXT\n \n DSJFKDAFJKDAFJDSAKFJADSFLKDLAFKDSAF\n [more of the above, ending with a newline]\n [yep, there is a variable number of lines here]\n \n (repeat the above a few hundred times).

I’d like to capture two things:

the some Varying TEXT part
all lines of uppercase text that come two lines below it in one capture (I can strip out the newline characters later).

I’ve tried a few approaches:

re.compile(r"^>(\w+)$$([.$]+)^$", re.MULTILINE) # try to capture both parts re.compile(r"(^[^>][\w\s]+)$", re.MULTILINE|re.DOTALL) # just textlines

…and a lot of variations hereof with no luck. The last one seems to match the lines of text one by one, which is not what I really want. I can catch the first part, no problem, but I can’t seem to catch the 4-5 lines of uppercase text. I’d like match.group(1) to be some Varying Text and group(2) to be line1+line2+line3+etc until the empty line is encountered.

If anyone’s curious, it’s supposed to be a sequence of amino acids that make up a protein.

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

score 0 · Answer 1 · 2026-05-11T07:14:08+00:00

Try this:

re.compile(r"^(.+)\n((?:\n.+)+)", re.MULTILINE)

I think your biggest problem is that you’re expecting the ^ and $ anchors to match linefeeds, but they don’t. In multiline mode, ^ matches the position immediately following a newline and $ matches the position immediately preceding a newline.

Be aware, too, that a newline can consist of a linefeed (\n), a carriage-return (\r), or a carriage-return+linefeed (\r\n). If you aren’t certain that your target text uses only linefeeds, you should use this more inclusive version of the regex:

re.compile(r"^(.+)(?:\n|\r\n?)((?:(?:\n|\r\n?).+)+)", re.MULTILINE)

BTW, you don’t want to use the DOTALL modifier here; you’re relying on the fact that the dot matches everything except newlines.

How to approach applying for a job at a company ...

What is a programmer’s life like?

How to handle personal stress caused by utterly incompetent and ...

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions