I am not sure if I can do this purely with sed: I am

Question

Editorial Team

Asked: May 23, 2026

I am not sure if I can do this purely with sed:

I am trying to rearrange lines like this

GF:001,GF:00012,GF:01223<TAB>XXR
GF:001,GF:00012,GF:01223,GF:0666<TAB>XXXR3

to

GF:001<TAB>XXR
GF:00012<TAB>XXR
GF:01223<TAB>XXR
GF:001<TAB>XXXR3
GF:00012<TAB>XXXR3
GF:01223<TAB>XXXR3
GF:0666<TAB>XXXR3

Does anyone have any hints? The number of GF:XXXX entries per line varies, and so does the length of each GF:XXXX entry.

I am stuck with sed -n ' '/$XX.*$$/' { s/,/\t\1\n/ }' input, but I cannot refer back to the originally matched pattern. Any ideas? Cheers!

Update:
I think it is not possible to do this using sed alone, so I used Perl instead:

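perl -e 'open(IN, "<", "file") or die "cannot open file: $!";
# Each input line is: comma-separated GF ids, a tab, then a label.
while (<IN>) {
    @a = split(/\t/);
    @gos = split(/,/, $a[0]);
    # Emit one line per id; $a[1] still carries the trailing newline.
    foreach (@gos) {
      print $_."\t".$a[1];
    }
}
close(IN);' > output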

But if anyone knows a way to solve this just with sed please post it here…

1 Answer

Editorial Team · Answer 1 · 2026-05-23T04:57:22+00:00

It can be done in sed, though I probably would use Perl (or Awk or Python) to do it.

I claim no elegance for this solution, but brute force and ignorance sometimes pay off. I created a file called, unoriginally, sed.script containing:

/\(GF:[0-9]*\),\(.*\)<TAB>\(.*\)/{
:redo
s/\(GF:[0-9]*\),\(.*\)<TAB>\(.*\)/\1<TAB>\3@@@@@\2<TAB>\3/
h
s/@@@@@.*//
p
x
s/.*@@@@@//
t redo
d
}

I ran it as:

sed -f sed.script input

where input contained the two lines shown in the question. It produced the output:

GF:001<TAB>XXR
GF:00012<TAB>XXR
GF:01223<TAB>XXR
GF:001<TAB>XXXR3
GF:00012<TAB>XXXR3
GF:01223<TAB>XXXR3
GF:0666<TAB>XXXR3

(I took the liberty of deliberately misinterpreting <TAB> to be a 5-character sequence instead of a single tab character; you can easily fix the answer to handle an actual tab character instead.)
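For real tab characters, one possible adaptation is sketched below. It is an illustration, not part of the original answer: the variable names TAB, script and out are made up for this example, and the tab is built with printf because \t inside a sed regex is not portable across sed implementations.

```shell
# Build a literal tab portably (assumption: plain POSIX sh and sed).
TAB=$(printf '\t')

# Same commands as sed.script, with <TAB> replaced by a real tab.
# Inside double quotes the shell leaves \( \) \1 \2 \3 untouched.
script="/\(GF:[0-9]*\),\(.*\)${TAB}\(.*\)/{
:redo
s/\(GF:[0-9]*\),\(.*\)${TAB}\(.*\)/\1${TAB}\3@@@@@\2${TAB}\3/
h
s/@@@@@.*//
p
x
s/.*@@@@@//
t redo
d
}"

# Feed the two sample lines from the question, with real tabs.
out=$(printf 'GF:001,GF:00012,GF:01223\tXXR\nGF:001,GF:00012,GF:01223,GF:0666\tXXXR3\n' |
    sed -e "$script")
printf '%s\n' "$out"
```

This should print the same seven lines as above, with a real tab between the two fields.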

Explanation of the sed script:

Find lines with more than one occurrence of GF:nnn separated by commas (we do not need to process lines that contain a single such occurrence). Do the rest of the script only on such lines. Anything else is passed through (printed) unchanged.
Create a label (redo) so we can branch back to it.
Split the line into 3 remembered parts. The first part is the initial GF information; the second part is any other GF information; the third part is the field after the <TAB>. Replace this with the first field, <TAB>, third field, implausible marker pattern (@@@@@), second field, <TAB>, third field.
Copy the modified line to the hold space.
Delete the marker pattern to the end.
Print.
Swap the hold space into the pattern space.
Remove everything up to and including the marker pattern.
If we’ve done any work, go back to the redo label.
Delete what’s left (it was printed already).
End of script block.
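To see what the marker pattern does, it may help to run only the substitution from the third step on a shortened line. This is a small illustration using the literal <TAB> text, as the script above does; the variable name marked is made up for the example.

```shell
# Run just the "split and mark" substitution on a two-entry line.
marked=$(printf 'GF:001,GF:00012<TAB>XXR\n' |
    sed 's/\(GF:[0-9]*\),\(.*\)<TAB>\(.*\)/\1<TAB>\3@@@@@\2<TAB>\3/')
printf '%s\n' "$marked"
# prints: GF:001<TAB>XXR@@@@@GF:00012<TAB>XXR
```

Everything before @@@@@ is the finished line that gets printed; everything after it is the remainder that the loop processes on the next pass.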

This is a simple loop that peels one GF:nnn entry off the front of the line on each iteration, until only a single entry is left.
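For comparison, here is a rough sketch of the Awk route mentioned at the top of this answer. It is not from the original post; it assumes the fields are separated by a real tab character, and the names ids and out_awk are invented for the example.

```shell
# Split field 1 on commas and print each id with field 2.
# -F'\t' and OFS='\t' are interpreted by awk as a tab character.
out_awk=$(printf 'GF:001,GF:00012,GF:01223\tXXR\nGF:001,GF:00012,GF:01223,GF:0666\tXXXR3\n' |
    awk -F'\t' -v OFS='\t' '{
        n = split($1, ids, ",")
        for (i = 1; i <= n; i++)
            print ids[i], $2
    }')
printf '%s\n' "$out_awk"
```

Unlike the sed script, this version rewrites every line, so a line without a tab would come out with an empty second field rather than being passed through unchanged.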
