Considering the following data frames : > tail(tot.final) names.id sequence names.reads width.reads names.counts st

Question

Asked: May 29, 2026 (2026-05-29T06:56:44+00:00)

Considering the following data frames :

 > tail(tot.final)
   names.id                  sequence names.reads width.reads names.counts st end flag
819   125546  TAGCTTATATGACTGATGTTGACA    125546-4          24            4  8  31 TRUE
820   218783  TCGCTTATCAGACTGATGTTGAAA    218783-2          24            2  8  31 TRUE
821   272992  CAGCTTATCAGACTGATGTTGAAA    272992-2          24            2  8  31 TRUE
822   135191 TAGCTTATCAGACTGATGTTGAACA    135191-4          25            4  8  32 TRUE
823   278047 TAGCTTATCAGACTGATGTTGAAGA    278047-2          25            2  8  32 TRUE
824   317980 TAGCTTATCAGACTGATGTTGCCCT    317980-2          25            2  8  32 TRUE

> head(plusa)
  names.id            sequence names.reads width.reads names.counts st end flag
2     28092   ATCAGACTGATGTTGAC    28092-29          17           29 14  30 TRUE
4     65308  TTATCAGACTGATGTTGA    65308-10          18           10 12  29 TRUE
6     71226  TATCAGACTGATGTTGAC     71226-9          18            9 13  30 TRUE

> nrow(tot.final)
[1] 824
> nrow(plusa)
[1] 421

plusa contains 451 rows with a common plusa$sequence column (not sorted).

I would like to update tot.final$names.counts elements by adding the plusa$names.counts values of the corresponding plusa$sequence.

Is there a possibility to merge them in this manner, using the "sequence" field as the id?

1 Answer

Editorial Team · Answer 1 · 2026-05-29T06:56:45+00:00

As far as I can understand, I think this is what you want to do:

join (say rbind) plusa to tot.final
for each unique sequence in this data frame:
    sum up the counts column.

In that case, you can use the plyr library. I made up an example to illustrate; you should be able to adapt it to yours:

library(plyr)
df.final <- data.frame(sequence=c('A','B','C','D'),
                       counts=c(100,123,234,200),
                       stringsAsFactors=FALSE)
#   sequence counts
# 1        A    100
# 2        B    123
# 3        C    234
# 4        D    200

df.plusa <- data.frame(sequence=c('A','E','C','F'),
                       counts=c(10,20,30,40),
                       stringsAsFactors=FALSE)
#   sequence counts
# 1        A     10
# 2        E     20
# 3        C     30
# 4        F     40

# rbind together and do the counts:
df.final.aggregated <- ddply(rbind(df.final,df.plusa),
                             .(sequence),
                             summarise,
                             counts=sum(counts))
#   sequence counts
# 1        A    110
# 2        B    123
# 3        C    264
# 4        D    200
# 5        E     20
# 6        F     40

Note that ddply(dataframe,.(sequence),FUNCTION) means:

for each unique seq in dataframe$sequence:
    do FUNCTION( dataframe[ dataframe$sequence==seq, ] )
    merge them all back into one big dataframe.
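The same split, apply, combine idea can be sketched in base R without plyr. This is only an illustration of the pattern described above, not how ddply is implemented; the data frame df and the helper sum_counts are made up for the example.

```r
# Made-up data: sequence "A" appears twice
df <- data.frame(sequence = c("A", "B", "A", "C"),
                 counts = c(100, 123, 10, 30),
                 stringsAsFactors = FALSE)

# Hypothetical per-group function: returns one summary row per sequence
sum_counts <- function(piece) {
  data.frame(sequence = piece$sequence[1],
             counts = sum(piece$counts),
             stringsAsFactors = FALSE)
}

pieces   <- split(df, df$sequence)     # one data frame per unique sequence
results  <- lapply(pieces, sum_counts) # apply the function to each piece
combined <- do.call(rbind, results)    # stack the results into one data frame
rownames(combined) <- NULL             # drop the group names used as row names

print(combined)
```

Here split plays the role of "for each unique seq", lapply applies the function, and do.call(rbind, ...) merges the pieces back together.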

For your particular data this could work (haven’t tested as I don’t have your data):

ddply( rbind(tot.final,plusa), .(sequence), summarise,
       names.counts = sum(names.counts) )
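Note that the rbind plus summarise approach returns only the sequence and names.counts columns, and it also keeps sequences that occur only in plusa. If the goal is to update tot.final in place and keep its other columns, a base R sketch using match might look like the following. The two data frames here are small made-up stand-ins that reuse the question's column names, and the sketch assumes that sequences missing from plusa should simply keep their original count.

```r
# Made-up stand-ins for tot.final and plusa (values are illustrative only)
tot.final <- data.frame(sequence = c("AAA", "CCC", "GGG"),
                        names.counts = c(4, 2, 2),
                        width.reads = c(3, 3, 3),
                        stringsAsFactors = FALSE)
plusa <- data.frame(sequence = c("CCC", "TTT", "CCC"),
                    names.counts = c(10, 29, 1),
                    stringsAsFactors = FALSE)

# Total plusa counts per sequence (covers sequences repeated within plusa)
plusa.sums <- tapply(plusa$names.counts, plusa$sequence, sum)

# Position of each tot.final sequence among the summed names; NA if absent
idx <- match(tot.final$sequence, names(plusa.sums))

# Counts to add: matched totals, with 0 for sequences not found in plusa
to.add <- as.vector(plusa.sums[idx])
to.add[is.na(to.add)] <- 0

# Update the counts; all other columns and the row order stay untouched
tot.final$names.counts <- tot.final$names.counts + to.add

print(tot.final)
```

Sequences that exist only in plusa ("TTT" in this toy data) are ignored here, which differs from the ddply version above.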
