Let me quickly explain the problem. Picture a dataset like this:
data <- data.frame(Amino.acid = c("TRPPS;PNSTED", "ERDDS", "PSRND", "SDEEN", "GSRTN"),
                   log2.ratio = c(2.4, 0, -1, -2, -1))
In reality my list is much longer, let's say 12,000 rows. What I really want is the frequency of a specific amino acid pattern, and then a plot of the density vs. the log2 ratio. For example, the pattern R-X-X-S should be detected in the Amino.acid column. Sometimes a row holds several sequences separated by a ";", and the pattern analysis should cover each of them.
I can only think of something ugly, like gsub and the subset function over lots of log2 ratios, but there should be a more elegant solution (maybe with the density function?).
In the end I would like a plot of density (y) vs. log2 ratio (x) for a specific pattern, AND one for all sequences that do not contain that pattern.
I have an aversion to naming data frames “data”, so I named it “pdat” instead:
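The code that went with this step is not shown here, so the following is only a minimal sketch of what it could look like. The regular expression `R[^;]{2}S` is my assumption for R-X-X-S (an R, two characters that are not the ";" separator, then an S), and `pattern`, `hit_rows` and `hit_ratios` are names made up for illustration.

```r
# The example data from the question, stored as `pdat`
pdat <- data.frame(
  Amino.acid = c("TRPPS;PNSTED", "ERDDS", "PSRND", "SDEEN", "GSRTN"),
  log2.ratio = c(2.4, 0, -1, -2, -1)
)

# Assumed pattern for R-X-X-S: R, two characters that are not ";", then S
pattern <- "R[^;]{2}S"

# Indices of the rows whose sequence contains the pattern
hit_rows <- grep(pattern, pdat$Amino.acid)

# log2 ratios of the matching rows
hit_ratios <- pdat[hit_rows, "log2.ratio"]

# Histogram of counts per log2-ratio bin for the matching rows
hist(hit_ratios, xlab = "log2 ratio", main = "Rows matching R-X-X-S")
```

With the five example rows, only "TRPPS;PNSTED" and "ERDDS" contain the pattern, so the histogram is drawn from just two values.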
That’s about as specific as I can get with that tiny dataset. The grep pattern would not match “AARP;SNORE”, where the R and the S sit on opposite sides of the “;”:
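A quick way to check that behaviour, again assuming the pattern is written as `R[^;]{2}S` (my guess, not necessarily the original expression). The variable names are only for illustration.

```r
# Assumed pattern: the two middle positions may not be the ";" separator
pattern <- "R[^;]{2}S"

# "RP;S" spans the separator, so this should not count as R-X-X-S
spans_separator <- grepl(pattern, "AARP;SNORE")

# "RPPS" sits inside the first sequence, so this should match
within_sequence <- grepl(pattern, "TRPPS;PNSTED")

# For comparison: a bare "." also matches ";", so "R..S" matches across it
naive_match <- grepl("R..S", "AARP;SNORE")

print(c(spans_separator = spans_separator,
        within_sequence = within_sequence,
        naive_match     = naive_match))
```

The comparison with `"R..S"` is the reason for excluding ";" explicitly: otherwise two neighbouring sequences could be read as one.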
This is the plot for the complementary rows (just add a minus sign):
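A sketch of the complementary selection, under the same assumed pattern `R[^;]{2}S`. The names `other_ratios` and `other_ratios_safe` are hypothetical.

```r
# Example data from the question
pdat <- data.frame(
  Amino.acid = c("TRPPS;PNSTED", "ERDDS", "PSRND", "SDEEN", "GSRTN"),
  log2.ratio = c(2.4, 0, -1, -2, -1)
)

# Assumed pattern for R-X-X-S that does not cross the ";" separator
pattern <- "R[^;]{2}S"

# Negative indices drop the matching rows and keep everything else
other_ratios <- pdat[-grep(pattern, pdat$Amino.acid), "log2.ratio"]

# Logical negation gives the same rows here and also works when nothing matches
other_ratios_safe <- pdat[!grepl(pattern, pdat$Amino.acid), "log2.ratio"]

# Histogram of counts per log2-ratio bin for the non-matching rows
hist(other_ratios, xlab = "log2 ratio", main = "Rows without R-X-X-S")
```

One caveat with the minus sign: if no row matches, `grep()` returns an empty index and `pdat[-integer(0), ]` selects zero rows rather than all of them, which is why the `!grepl()` form is probably the safer habit on real data.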
It’s not a density, so reaching for the density function will doom your efforts.
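If plain frequencies are what is wanted, one possible way to get them is a two-way count table. This is only a sketch; `R[^;]{2}S` is still my assumed pattern, and `matched` and `counts` are illustrative names.

```r
# Example data from the question
pdat <- data.frame(
  Amino.acid = c("TRPPS;PNSTED", "ERDDS", "PSRND", "SDEEN", "GSRTN"),
  log2.ratio = c(2.4, 0, -1, -2, -1)
)

# TRUE where the sequence contains the assumed R-X-X-S pattern
matched <- grepl("R[^;]{2}S", pdat$Amino.acid)

# Counts of rows per log2 ratio, split by whether the pattern was found
counts <- table(matched = matched, log2.ratio = pdat$log2.ratio)
print(counts)
```

With 12,000 rows and continuous ratios you would likely want to bin the ratios first, for example with `cut()`, before tabulating.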