Can GNU sed be used to ID a pattern based on rows?

Asked: June 18, 2026 (2026-06-18T02:28:25+00:00)

Can GNU sed be used to ID a pattern based on rows? Or in other words, how can you insert a line break in the pattern you’re using sed to ID?

For example, in the following dataset (which is much larger in actuality), I have an error that should have been removed when I searched for duplicates, but was not because the information is slightly different in two rows (which is irrelevant at this point).

In this case, I want to remove the error entirely from the original file. In other words, if two rs#### rows follow each other in my file, I would like to erase both copies and also the six lines that follow them. It would be nice to move them to a new file, but what matters most is that they are removed from the original.

rs1038864   16  73762557    A   G
1   1633    0.5835  -0.0004 0.0035
1   1643    0.8902  0.004436    0.004354
0   0   0   0   0
rs1019567   16  83343715    G   T
rs1019567   16  83343715    G   T
1   1641    0.4692  0.0009  0.0035
1   559 0.4612  -0.0025 0.0060
1   1643    0.5178  -0.002244   0.002745
1   1643    0.5178  -0.002244   0.002745
1   1909    0.493842692 0.0008  0.0027
1   1950    0.493842692 0.0008  0.0027
rs1038556   16  55132072    C   T
1   6388    0.7773  0.0020  0.0044
1   6843    0.1161  0.001379    0.004275
1   1509    0.978660942 0.0041  0.0096
rs1019797   16  87788686    C   G
rs1019797   16  87788686    C   G
1   1639    0.717   0.0022  0.0038
1   5557    0.7193  0.0020  0.0064
1   1643    0.6691  -0.001044   0.002888
1   6843    0.6691  -0.001044   0.002888
1   1959    0.315280799 -0.0041 0.0032
1   1909    0.315280799 -0.0041 0.0032
rs1038887   16  62660698    A   G
1   1688    0.4947  -0.0028 0.0035
0   0   0   0   0
1   1909    0.464393658 0.0007  0.0028

Something like,

sed -i '/^rs.*d
^rs.*/,+6d' test.data

or perhaps

sed -i '/^rs.*;^rs.*/,+6d' test.data

?
Any thoughts would be appreciated!

1 Answer

Editorial Team · Answer 1 · 2026-06-18T02:28:26+00:00

If infile contains the listed input, something like this should do (GNU sed):

<infile sed -r 'N; /^([^\n]+)\n\1$/ { N; N; N; N; N; N; d }; P; D'

If you want to save the deleted bits to deleted.txt use this:

<infile sed -r 'N; /^([^\n]+)\n\1$/ { N; N; N; N; N; N; w deleted.txt
d }; P; D'

Note that the w command must be terminated by a newline, because sed treats everything up to the end of the line as the filename.
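If the embedded newline is awkward, the same script can probably be split across several -e options instead. GNU sed joins -e fragments with newlines, so the w filename ends where its fragment ends. This is a minimal sketch assuming GNU sed; the sample rows and the kept.txt output name are made up for illustration.

```shell
# Made-up sample in the same layout as the question's data (not the real dataset).
cat > infile <<'EOF'
rs0000001   16  100 A   G
1   10  0.5 0.001   0.003
rs0000002   16  200 G   T
rs0000002   16  200 G   T
1   11  0.5 0.001   0.003
1   12  0.5 0.001   0.003
1   13  0.5 0.001   0.003
1   14  0.5 0.001   0.003
1   15  0.5 0.001   0.003
1   16  0.5 0.001   0.003
rs0000003   16  300 C   T
1   17  0.5 0.001   0.003
EOF

# The first -e fragment ends right after the filename, so "w deleted.txt"
# is terminated without a literal newline inside the quotes.
# ^ and $ anchor the match so that only two whole, identical lines count.
sed -r -e 'N; /^([^\n]+)\n\1$/ { N; N; N; N; N; N; w deleted.txt' \
       -e 'd; }; P; D' infile > kept.txt
```

With this sample, kept.txt should hold the four rows outside the duplicated block and deleted.txt the eight removed rows.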

Explanation

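This loads a second line into the pattern space (N) and checks whether the two lines are duplicates: the parenthesized group captures the first line and the back-reference \1 must match the second. If they are, six more lines are appended to the pattern space and the whole block is deleted (d). Otherwise P prints the first line, and D deletes it and restarts the cycle with the second line still in the pattern space, so every consecutive pair of lines gets tested.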
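The question mentions that the two rs rows can differ slightly, and it asks for an in-place edit. Here is a hedged variant of the same loop, assuming GNU sed. Instead of requiring identical lines, it only tests that both lines in the pattern space start with rs. The -i.bak option rewrites the file and keeps a backup. The file name sample.data and its rows are hypothetical.

```shell
# Made-up sample: the two rs0000002 rows differ in their last column.
cat > sample.data <<'EOF'
rs0000001   16  100 A   G
1   10  0.5 0.001   0.003
rs0000002   16  200 G   T
rs0000002   16  200 G   C
1   11  0.5 0.001   0.003
1   12  0.5 0.001   0.003
1   13  0.5 0.001   0.003
1   14  0.5 0.001   0.003
1   15  0.5 0.001   0.003
1   16  0.5 0.001   0.003
rs0000003   16  300 C   T
1   17  0.5 0.001   0.003
EOF

# N                 append the next line: the pattern space now holds two lines
# /^rs[^\n]*\nrs/   both lines start with "rs"; they need not be identical
# N (six times)     pull in the six rows that follow the pair
# d                 drop all eight lines and start a fresh cycle
# P; D              no match: print the first line, delete it, and loop
#                   again with the second line still in the pattern space
sed -i.bak -r 'N; /^rs[^\n]*\nrs/ { N; N; N; N; N; N; d; }; P; D' sample.data
```

The untouched input stays available as sample.data.bak, which is worth keeping until the result has been checked against the real dataset.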
