I have got a huge 1000 x 100000 dataframe like following to recode to

Question

Asked: 2026-06-11T13:45:10+00:00

I have a huge 1000 x 100000 data frame, like the following, to recode to numeric values.

# stringsAsFactors = TRUE keeps the columns as factors (the default is FALSE since R 4.0.0)
myd <- data.frame(v1 = sample(c("AA", "AB", "BB", NA), 10, replace = TRUE),
                  v2 = sample(c("CC", "CG", "GG", NA), 10, replace = TRUE),
                  v3 = sample(c("AA", "AT", "TT", NA), 10, replace = TRUE),
                  v4 = sample(c("AA", "AT", "TT", NA), 10, replace = TRUE),
                  v5 = sample(c("CC", "CA", "AA", NA), 10, replace = TRUE),
                  stringsAsFactors = TRUE)
myd
     v1   v2   v3   v4   v5
1    AB   CC <NA> <NA>   AA
2    AB   CG   TT   TT   AA
3    AA   GG   AT   AT   CA
4  <NA> <NA> <NA>   AT <NA>
5    AA <NA>   AA <NA>   CA
6    BB <NA>   TT   TT   CC
7    AA   GG   AA   AT   CA
8  <NA>   GG <NA>   AT   CA
9    AA <NA>   AT <NA>   CC
10   AA   GG   TT   AA   CC

Each variable has potentially four unique values (three genotypes plus NA).

unique(myd$v1)

[1] AB   AA   <NA> BB  
Levels: AA AB BB

unique(myd$v2)

[1] CC   CG   GG   <NA>
Levels: CC CG GG

The unique values can be any combination, but each consists of two letters (except NA). For example, “A” and “B” in the first case give the combinations “AA”, “AB”, “BB”. The numeric codes for these would be 1, 0, -1 respectively. Similarly, in the second case the letters “C” and “G” give “CC”, “CG”, “GG”, so the numeric codes would again be 1, 0, -1 respectively. Thus the above myd needs to be recoded to:

myd
     v1   v2   v3   v4   v5
1     0    1 <NA> <NA>    1
2     0    0   -1   -1    1
3     1   -1    0    0    0
4  <NA> <NA> <NA>    0 <NA>
5     1 <NA>    1 <NA>    0
6    -1 <NA>   -1   -1   -1
7     1   -1    1    0    0
8  <NA>   -1 <NA>    0    0
9     1 <NA>    0 <NA>   -1
10    1   -1   -1    1   -1

1 Answer

Editorial Team · Answer 1 · 2026-06-11T13:45:11+00:00

You can take advantage of the fact that your data are factors, which have numeric indices underneath them.

For example:

> as.numeric(myd$v1)
 [1]  2  2  1 NA  1  3  1 NA  1  1

The numeric values correspond to the levels() of the factor:

> levels(myd$v1)
[1] "AA" "AB" "BB"

So 1 == AA, 2 == AB, 3 == BB, and likewise for the levels of the other columns.
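The codes follow the sorted order of the levels, not the order in which values appear, which is why a column like v5 (built from "CC", "CA", "AA") still puts the mixed genotype in the middle. A small sketch with a made-up vector (g is a hypothetical name, not part of the question's data):

```r
# Toy vector (hypothetical); genotypes are deliberately listed
# in non-alphabetical order.
g <- factor(c("CC", "CA", "AA", NA))

levels(g)      # "AA" "CA" "CC": sorted alphabetically, not by appearance
as.integer(g)  # 3 2 1 NA
```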

So you can simply convert your data to numeric and apply the necessary arithmetic to get the scale you want. Subtracting 2 and then multiplying by -1 gives your results:

(sapply(myd, as.numeric) - 2) * -1
#-----
      v1 v2 v3 v4 v5
 [1,]  0  1 NA NA  1
 [2,]  0  0 -1 -1  1
 [3,]  1 -1  0  0  0
 [4,] NA NA NA  0 NA
 [5,]  1 NA  1 NA  0
 [6,] -1 NA -1 -1 -1
 [7,]  1 -1  1  0  0
 [8,] NA -1 NA  0  0
 [9,]  1 NA  0 NA -1
[10,]  1 -1 -1  1 -1
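One caveat: the arithmetic above relies on every column containing all three genotypes. If a column happens to contain only, say, AB and BB, its factor has two levels and the codes shift. A possible workaround is to derive the two letters per column and look the genotypes up explicitly. This is only a sketch: recode_genotype() and toy are hypothetical names, and it assumes each column holds two-letter genotypes built from at most two letters.

```r
# recode_genotype() is a hypothetical helper, not part of any package.
recode_genotype <- function(x) {
  x <- as.character(x)              # works for factor or character input
  obs <- x[!is.na(x)]
  if (length(obs) == 0) return(rep(NA_integer_, length(x)))

  # Distinct letters seen in this column, sorted alphabetically
  alleles <- sort(unique(unlist(strsplit(obs, ""))))
  if (length(alleles) > 2) stop("more than two letters in column")
  a <- alleles[1]
  b <- alleles[length(alleles)]

  # Explicit lookup: first homozygote 1, mixed 0, second homozygote -1
  codes <- c(1L, 0L, 0L, -1L)
  names(codes) <- c(paste0(a, a), paste0(a, b), paste0(b, a), paste0(b, b))
  unname(codes[x])                  # NA input stays NA
}

# Toy data: v1 never contains "AA"
toy <- data.frame(v1 = c("AB", "BB", NA), v2 = c("CC", "CG", "GG"))
res <- vapply(toy, recode_genotype, integer(nrow(toy)))
res[, "v1"]  # 0 -1 NA
res[, "v2"]  # 1  0 -1
```

If a column contains a single genotype only (for example all BB), the second letter cannot be inferred from the data, and this sketch codes that genotype as 1.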
