I am trying to plot the CDF curve for a large dataset containing about 29 million values using ggplot. The way I am computing this is like this:
mycounts = ddply(idata.frame(newdata), .(Type), transform, ecd = ecdf(Value)(Value))
plot = ggplot(mycounts, aes(x=Value, y=ecd))
This is taking ages to plot. I was wondering if there is a clean way to plot only a sample of this dataset (say, every 10th point or 50th point) without compromising on the actual result?
I am not sure about your data structure, but a simple
samplecall might be enough: