I am not sure whether this can be done purely with sed.
I am trying to rearrange lines like these:
GF:001,GF:00012,GF:01223<TAB>XXR
GF:001,GF:00012,GF:01223,GF:0666<TAB>XXXR3
to
GF:001<TAB>XXR
GF:00012<TAB>XXR
GF:01223<TAB>XXR
GF:001<TAB>XXXR3
GF:00012<TAB>XXXR3
GF:01223<TAB>XXXR3
GF:0666<TAB>XXXR3
Does anyone have any hints? The number of GF:XXXX entries per line varies, and so does the length of each GF:XXXX entry.
I am stuck with the attempt below, because I cannot refer back to the originally matched pattern from within the substitution. Any ideas? Cheers!
sed -n '/\(XX.*\)$/ {
s/,/\t\1\n/
}' input
Update:
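I think it is not possible to do this using sed alone, so I used Perl instead:
perl -e 'open(IN, "<", "file") or die "Cannot open file: $!";
while (<IN>) {
    @a = split(/\t/);           # $a[0]: comma-separated GF ids, $a[1]: label (keeps its newline)
    @gos = split(/,/, $a[0]);   # one element per GF id
    foreach (@gos) {
        print $_."\t".$a[1];    # emit one output line per GF id
    }
}
close(IN);' > output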
But if anyone knows a way to solve this just with sed please post it here…
It can be done in sed, though I probably would use Perl (or Awk or Python) to do it.

I claim no elegance for this solution, but brute force and ignorance sometimes pay off. I created a file called, unoriginally, sed.script to hold the script, and ran it on a file, input, that contained the two lines shown in the question. It produced the requested output.

(I took the liberty of deliberately misinterpreting <TAB> to be a 5-character sequence instead of a single tab character; you can easily fix the answer to handle an actual tab character instead.)

Explanation of the sed script:
- Select lines that contain at least two occurrences of GF:nnn separated by commas (we do not need to process lines that contain a single such occurrence). Do the rest of the script only on such lines. Anything else is passed through (printed) unchanged.
- Split the line into three fields: the first GF:nnn, the rest of the comma-separated list, and the text after <TAB>. Replace this with the first field, <TAB>, third field, implausible marker pattern (@@@@@), second field, <TAB>, third field.
- Loop back to the redo label.

This is a simple loop that reduces the number of patterns by one on each iteration.
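
The loop described in the explanation can be sketched roughly as follows. This is a reconstruction from the explanation, not necessarily the exact script used in the answer. The function name split_gf is hypothetical, the use of the hold space (h and g) to print the part before the marker is an assumption, and <TAB> is treated as the literal 5-character sequence, as in the answer.

```shell
#!/bin/sh
# Sketch only: split_gf is a hypothetical helper name; it reads lines on
# stdin and treats <TAB> as five literal characters, not a real tab.
split_gf() {
    sed '
# Only lines with at least two comma-separated GF:nnn entries are processed;
# every other line falls through and is printed unchanged.
/^GF:[0-9]*,GF:[0-9]*.*<TAB>/{
:redo
# Fields: 1 = first GF:nnn, 2 = rest of the list, 3 = text after <TAB>.
# Rewrite as: field1 <TAB> field3 @@@@@ field2 <TAB> field3.
s/^\(GF:[0-9]*\),\(.*\)<TAB>\(.*\)$/\1<TAB>\3@@@@@\2<TAB>\3/
# Save a copy, print the part before the marker, then restore the copy.
h
s/@@@@@.*//
p
g
# Keep only the part after the marker and repeat while a list remains.
s/.*@@@@@//
/^GF:[0-9]*,/b redo
}
'
}

printf '%s\n' 'GF:001,GF:00012,GF:01223<TAB>XXR' \
              'GF:001,GF:00012,GF:01223,GF:0666<TAB>XXXR3' | split_gf
```

Run on the two lines from the question, this should print the seven lines requested there, with <TAB> kept as literal text. To handle a real tab character, each <TAB> in the script would be replaced by an actual tab. The marker @@@@@ is assumed never to occur in the data.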