Consider the following data frames:
> tail(tot.final)
names.id sequence names.reads width.reads names.counts st end flag
819 125546 TAGCTTATATGACTGATGTTGACA 125546-4 24 4 8 31 TRUE
820 218783 TCGCTTATCAGACTGATGTTGAAA 218783-2 24 2 8 31 TRUE
821 272992 CAGCTTATCAGACTGATGTTGAAA 272992-2 24 2 8 31 TRUE
822 135191 TAGCTTATCAGACTGATGTTGAACA 135191-4 25 4 8 32 TRUE
823 278047 TAGCTTATCAGACTGATGTTGAAGA 278047-2 25 2 8 32 TRUE
824 317980 TAGCTTATCAGACTGATGTTGCCCT 317980-2 25 2 8 32 TRUE
> head(plusa)
names.id sequence names.reads width.reads names.counts st end flag
2 28092 ATCAGACTGATGTTGAC 28092-29 17 29 14 30 TRUE
4 65308 TTATCAGACTGATGTTGA 65308-10 18 10 12 29 TRUE
6 71226 TATCAGACTGATGTTGAC 71226-9 18 9 13 30 TRUE
> nrow(tot.final)
[1] 824
> nrow(plusa)
[1] 421
plusa has 421 rows and shares the sequence column with tot.final (its rows are not sorted).
I would like to update tot.final$names.counts by adding the plusa$names.counts value of the plusa row with the matching sequence.
Is it possible to merge them in this way, using the sequence column as the key?
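If plusa has at most one row per sequence, the update can also be done in place with base R's match(), which keeps tot.final's row order and all of its other columns. Below is a minimal sketch on hypothetical toy data: the short sequences and counts are made up, and only the column names come from the question.

```r
# Hypothetical toy stand-ins for tot.final and plusa (only the columns used here)
tot.final <- data.frame(sequence     = c("AAAA", "CCCC", "GGGG"),
                        names.counts = c(4, 2, 2),
                        stringsAsFactors = FALSE)
plusa <- data.frame(sequence     = c("GGGG", "AAAA", "TTTT"),
                    names.counts = c(10, 1, 29),
                    stringsAsFactors = FALSE)

# Position of each tot.final sequence in plusa (NA when there is no match)
idx <- match(tot.final$sequence, plusa$sequence)

# Extra counts to add; unmatched sequences contribute 0
extra <- plusa$names.counts[idx]
extra[is.na(extra)] <- 0

tot.final$names.counts <- tot.final$names.counts + extra
```

Sequences that appear only in plusa (here "TTTT") are ignored. merge(tot.final, plusa, by = "sequence", all.x = TRUE) is the join-style equivalent, but by default it reorders the rows by sequence.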
As far as I understand, this is what you want to do:
1. stack the rows of plusa onto tot.final;
2. for each sequence, sum up the counts column.
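Those two steps can be sketched in base R, without plyr, using rbind() to stack the rows and aggregate() to sum per sequence. This assumes both data frames have the same columns; the toy values are hypothetical.

```r
# Hypothetical toy data with the question's column names
tot.final <- data.frame(sequence     = c("AAAA", "CCCC", "GGGG"),
                        names.counts = c(4, 2, 2),
                        stringsAsFactors = FALSE)
plusa <- data.frame(sequence     = c("GGGG", "AAAA", "TTTT"),
                    names.counts = c(10, 1, 29),
                    stringsAsFactors = FALSE)

# Step 1: stack the rows (rbind matches columns by name)
both <- rbind(tot.final, plusa)

# Step 2: one row per sequence, with names.counts summed within each group
summed <- aggregate(names.counts ~ sequence, data = both, FUN = sum)
```

Unlike an in-place update, this keeps sequences found only in plusa as new rows, and the result contains only the sequence and names.counts columns.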
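In that case, you can use the plyr library. I made up an example to illustrate; you should be able to adapt it to yours. Note that ddply(dataframe, .(sequence), FUNCTION) means: split dataframe into groups by sequence, apply FUNCTION to each group, and combine the results into a single data frame. For your particular data this could work (I haven't tested it, as I don't have your data):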
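A sketch of what the plyr version might look like. It assumes plyr is installed and that tot.final and plusa have the same columns, so rbind() can stack them. summarise is plyr's helper that returns one row per group. The toy data is hypothetical.

```r
library(plyr)  # assumes the plyr package is installed

# Hypothetical toy data with the question's column names
tot.final <- data.frame(sequence     = c("AAAA", "CCCC", "GGGG"),
                        names.counts = c(4, 2, 2),
                        stringsAsFactors = FALSE)
plusa <- data.frame(sequence     = c("GGGG", "AAAA", "TTTT"),
                    names.counts = c(10, 1, 29),
                    stringsAsFactors = FALSE)

# Stack both tables, then split by sequence and sum names.counts in each group
both <- rbind(tot.final, plusa)
res  <- ddply(both, .(sequence), summarise, names.counts = sum(names.counts))
```

As with aggregate(), columns not named in summarise (such as names.reads) are dropped from the result.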