I am trying to plot the CDF curve for a large dataset containing about

Question


Asked: May 26, 2026 (2026-05-26T08:05:00+00:00)


I am trying to plot the CDF curve for a large dataset containing about 29 million values using ggplot. The way I am computing this is like this:

library(plyr)     # provides ddply() and idata.frame()
library(ggplot2)  # provides ggplot() and aes()
mycounts = ddply(idata.frame(newdata), .(Type), transform, ecd = ecdf(Value)(Value))
plot = ggplot(mycounts, aes(x=Value, y=ecd))

This is taking ages to plot. I was wondering if there is a clean way to plot only a sample of this dataset (say, every 10th point or 50th point) without compromising on the actual result?


1 Answer

Editorial Team · Answer 1 · 2026-05-26T08:05:00+00:00


I am not sure about your data structure, but a simple sample call might be enough:

n <- nrow(mycounts)                              # number of rows in the data frame
mycounts <- mycounts[sample(n, round(n/10)), ]   # keep a random 10% of the rows, overwriting mycounts
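Because ecd is computed before sampling, each sampled point still lies on the exact curve. If you would rather keep every 10th point, as the question asks, a deterministic variant is sketched below. It is only a sketch in base R: thin_ecdf and k are hypothetical names, the data are synthetic stand-ins for newdata, and the column names Type and Value are taken from the question.

```r
# Hypothetical helper (not from the original post): compute the ECDF on the
# full data of each group, then keep only every k-th point of the sorted values.
thin_ecdf <- function(df, k = 10) {
  pieces <- lapply(split(df, df$Type, drop = TRUE), function(g) {
    g <- g[order(g$Value), , drop = FALSE]   # sort so "every k-th" follows the curve
    g$ecd <- ecdf(g$Value)(g$Value)          # ECDF still uses all rows of the group
    keep <- unique(c(seq(1, nrow(g), by = k), nrow(g)))  # always keep the last point
    g[keep, , drop = FALSE]
  })
  do.call(rbind, pieces)
}

# Small synthetic data set standing in for `newdata` (an assumption)
set.seed(1)
newdata <- data.frame(
  Type  = rep(c("a", "b"), each = 1000),
  Value = c(rnorm(1000), rexp(1000))
)

thinned <- thin_ecdf(newdata, k = 10)
```

The thinned frame can then be plotted in the usual way, for example with ggplot(thinned, aes(x = Value, y = ecd, colour = Type)) + geom_step(). Thinning only drops points between the kept ones, so the plotted curve passes through exact ECDF values; very fine detail between kept points is lost, which is rarely visible at this data size.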
