Input: rs001 A C T G C G T T rs002 C C T

Question


Asked: June 13, 2026


Input:

rs001 A C T G C G T T
rs002 C C T T G G A A

out1:

rs001 AC TG CG TT
rs002 CC TT GG AA

out2:

rs001 1 1 1 2
rs002 2 2 2 2

Basically, I want to replace any pair of identical nucleotides (AA, CC, TT, or GG) with 2 and any pair of different nucleotides (AT, TA, CG, etc.) with 1. The input should first be converted to out1 and then to out2. Each row has many fields (around 200 columns), so a loop is needed.

This is what I tried:

cat input | awk '{ for (x = 2; x <= NF; x = x+2) print $x$(x+1) }'

The results are weird. Can anyone tell me why I can't get out1? What mistake did I make in the awk loop?

Thanks in advance


1 Answer

Editorial Team · Answer 1 · 2026-06-13T19:53:37+00:00

Here’s how you fix your awk script to get output 1:

awk '{ printf "%s ", $1; for (x = 2; x <= NF; x = x + 2) {printf "%s%s ", $x, $(x+1)} printf "\n"}' input

print appends a newline after every call by default, which is why your loop put each pair on its own line. Use printf instead: it emits a newline only where you explicitly write \n, so you control exactly where each line ends.

(I also added printf "%s ", $1; so that the row label, e.g. rs001, is printed at the start of each line.)
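For the second step (out1 to out2), here is a rough sketch along the same lines. The function names to_out1 and to_out2 are made up for illustration, and it assumes every field after the label is either a single allele (for to_out1) or a two-letter pair (for to_out2). Each line is built up in a variable and printed once, which also avoids the trailing space.

```shell
# Hypothetical helpers (names are illustrative); both read from stdin.

# Step 1: join alleles two at a time, e.g. "A C T G" -> "AC TG".
# The loop stops at x < NF, so an unpaired trailing allele is dropped.
to_out1() {
    awk '{
        line = $1
        for (x = 2; x < NF; x += 2)
            line = line " " $x $(x+1)
        print line
    }'
}

# Step 2: code each pair as 2 if both letters match, otherwise 1.
to_out2() {
    awk '{
        line = $1
        for (x = 2; x <= NF; x++)
            line = line " " (substr($x, 1, 1) == substr($x, 2, 1) ? 2 : 1)
        print line
    }'
}

# Demo on the sample data from the question.
printf 'rs001 A C T G C G T T\nrs002 C C T T G G A A\n' | to_out1 | to_out2
# rs001 1 1 1 2
# rs002 2 2 2 2
```

With a real file you would run something like to_out1 < input | to_out2, or redirect the output of to_out1 to a file first if you need to keep out1 as well.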

Edit: Triplee’s solution looks much more elegant than mine – you should ditch awk and go with his =)
