I am trying to modify a file which is set up like this:
chr start ref alt
chr1 18884 C CAAAA
chr1 135419 TATACA T
chr1 332045 T TTG
chr1 453838 T TAC
chr1 567652 T TG
chr1 602541 TTTA T
chr1 614937 C CTCTCTG
chr1 654889 C CA
chr1 736800 AC A
I want to modify it such that:
If the string in column “ref” is longer than 1 character (e.g. line 2), then I generate 2 new columns where:
first new column = start coordinate - 1
second new column = start coordinate + (length of string in ref) + 1
Therefore, for line 2 the output would look like:
chr1 135419 TATACA T 135418 135426
Or:
if the length of the string in “ref” is 1 and the string in column “alt” is longer than 1 character (e.g. line 1), then:
first new column = start coordinate
second new column = start coordinate + 2
So the output for line 1 would be:
chr1 18884 C CAAAA 18884 18886
I have tried to do this in awk, but without success.
My Perl is non-existent, but would that be the best way? Or maybe R?
Perl solution. Note that your specification does not mention what to do if both strings are length 1.
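For readers who would rather not use Perl, the two rules from the question can also be sketched in Python. This is only an illustration of the stated rules, not the Perl code referred to above. The function name `annotate` is made up for this example, and two behaviours are my own assumptions because the question does not specify them: the header line is passed through unchanged, and rows where both “ref” and “alt” have length 1 are left without extra columns.

```python
def annotate(line):
    """Append the two new columns to one whitespace-separated line."""
    fields = line.split()
    # Assumption: the header (or any line without a numeric start) is passed through.
    if len(fields) < 4 or not fields[1].isdigit():
        return line.rstrip("\n")
    start, ref, alt = int(fields[1]), fields[2], fields[3]
    if len(ref) > 1:
        # Rule 1: ref is longer than one character.
        new_cols = [start - 1, start + len(ref) + 1]
    elif len(alt) > 1:
        # Rule 2: ref has length 1 and alt is longer than one character.
        new_cols = [start, start + 2]
    else:
        # Assumption: both of length 1 is unspecified, so add nothing.
        new_cols = []
    return " ".join(fields + [str(c) for c in new_cols])


sample = """chr start ref alt
chr1 18884 C CAAAA
chr1 135419 TATACA T"""

for row in sample.splitlines():
    print(annotate(row))
# chr start ref alt
# chr1 18884 C CAAAA 18884 18886
# chr1 135419 TATACA T 135418 135426
```

To run it on a real file, replace the `sample` loop with a loop over the lines of the opened file. Note that rule 1 is checked first, so a row where both strings are longer than 1 character is treated as a “ref” case.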