I’m trying to figure out the syntax of both the sed command and perl

Question

Asked: May 25, 2026 (2026-05-25T03:32:17+00:00)

I’m trying to figure out the syntax of both the sed command and perl script:

sed 's/^EOR:$//' INPUTFILE |
perl -00 -ne '/
TAGA01:\s+(.*?)\n
.*
TAGCC08:\s+(.*?)\n
# and so on
/xs && print "$1 $2\n"'

Why is there a circumflex ^ in the sed command? The third slash / will replace all instances of EOR: with a blank line, correct?

I understand some of the Perl script. Looking at perlrun, -00 will slurp the stream in paragraph mode and -n starts a while <> loop.

Why is there the first slash / next to the apostrophe? The command searches for TAGXXXX:, but I am not sure what \s+(.*?) does. Does that put whatever is after the tag into a variable? How about the .* in between the tag searches? What does /xs do? What do the $1 and $2 refer to in the print line?

This was tough to find online, and if someone could kick me in the right direction, I’d appreciate it.

1 Answer

Editorial Team · Answer 1 · 2026-05-25T03:32:18+00:00

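The circumflex ^ is regex for “start of line”, and $ is regex for “end of line”, so sed only touches lines which contain exactly “EOR:” and nothing else. Strictly speaking it does not delete those lines: it replaces their contents with nothing, leaving an empty line behind. That matters here, because Perl's paragraph mode (-00) splits its input on blank lines, so each emptied EOR: line becomes the boundary between two records.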
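As a quick check of the anchors, here is a minimal sketch run on a few made-up lines (the sample input and the variable name sed_out are invented for illustration). Only the line that is exactly “EOR:” is affected, and it is left behind as an empty line rather than deleted.

```shell
# Sample input is invented; only its second line is exactly "EOR:".
sed_out=$(printf 'TAGA01: foo\nEOR:\nnot EOR: here\nEOR: trailing\n' |
  sed 's/^EOR:$//')

# Prints the four lines again, with the second one now empty.
printf '%s\n' "$sed_out"
```

The lines “not EOR: here” and “EOR: trailing” pass through unchanged, because ^ and $ force the pattern to cover the whole line.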

The Perl script is basically perl -00 -ne '/(re)g(ex)/ && print "re ex\n"' with a big ole regex instead of the simple placeholder I put here. In particular, the /x modifier makes Perl ignore whitespace and # comments inside the pattern, which is what allows you to split the regex over several lines. So the first / is the start of the regex, the final / is the end of the regex, and the lines in between together form the regex.

The /s modifier changes how Perl interprets . in a regex; normally it will match any character except newline, but with this option, it includes newlines as well. This means that .* can match multiple lines.

\s matches a single whitespace character; \s+ matches as many whitespace characters as possible, but there has to be at least one.

(.*?) matches an arbitrary length of string; the dot matches any character, the asterisk says zero or more of any character, and the question mark modifies the asterisk repetition operator to match as short a string as possible instead of as long a string as possible. The parentheses cause the matched text to be captured. The captures are available as $1, $2, etc., one for each pair of parentheses, and the numbers correspond to the order of the opening parentheses (so if you apply (a(b)) to the string “ab”, $1 will be “ab” and $2 will be “b”).

Finally, \n matches a literal newline. So the (.*?) non-greedy match will match up to the first newline, i.e. the tail of the line on which the TAGsomething was found. (I imagine these are gene sequences, not “tags”?)

It doesn’t really make sense to run sed separately; Perl would be quite capable of removing the EOR: lines before attempting to match the regex.
