I have a dataset that contains observations for every second of four consecutive days

Question

Asked: June 12, 2026 (2026-06-12T04:18:41+00:00)

I have a dataset that contains observations for every second of four consecutive days (roughly 340’000 data points). This is too much to display in a scatter plot. I would like to plot only a uniform sample of, say, 2000 time points.

Is it possible to achieve this with ggplot2’s “grammar of graphics” approach? I haven’t found any built-in “sampling” modifier, but perhaps it’s easy enough to write one?

library(ggplot2)

x <- 1:100000
d <- data.frame(x=x, y=rnorm(length(x)))
ggplot(d[sample(x, 2000), ], aes(x=x, y=y)) + geom_point()

This is how it can be “hacked” by modifying the data passed to ggplot. But I don’t want to modify the data, just filter it to include only a sample.

ggplot(d, aes(x=x, y=y)) + ??? + geom_point()

EDIT: I’m specifically looking for sampling, not smoothing or binning. The data I have shows the time it takes to simulate one second of a specific process. The simulation has been parallelized, and for each simulated second I have the run times for each of the cores involved (8 in total). I want to show sub-optimal load balancing by plotting just the raw data points. The reason for the sampling is just that 300’000 data points are way too many for a scatter plot: plotting takes too long and the visualization is no good.

1 Answer

Editorial Team · Answer 1 · 2026-06-12T04:18:43+00:00

You can subset within the geom_point call using the data argument:

... + geom_point(data=d[sample(x,2000),])

This way, you are free to add other geoms using all the data, e.g., using the example data:

ggplot(d, aes(x=x, y=y)) + geom_hex() + geom_point(data=d[sample(x,2000),])

[Figure: hexbin and sampled points]
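If you would rather not repeat the name of the data frame inside the layer, a layer's data argument can also be a function, which ggplot2 calls with the plot's data and which must return a data frame. The sketch below uses that to get close to the `ggplot(d, ...) + ??? + geom_point()` form from the question. Note that `sample_rows` is a hypothetical helper written for this example, not part of ggplot2, and that it samples row positions with `sample(nrow(df), ...)` rather than relying on `x` happening to equal the row numbers.

```r
library(ggplot2)

set.seed(42)  # only so that the example is reproducible
x <- 1:100000
d <- data.frame(x = x, y = rnorm(length(x)))

# Hypothetical helper (not a ggplot2 function): returns a function that
# draws at most n rows, without replacement, from the data frame it receives.
sample_rows <- function(n) {
  function(df) df[sample(nrow(df), min(n, nrow(df))), , drop = FALSE]
}

# The plot keeps the full data; only the point layer sees the sample.
p <- ggplot(d, aes(x = x, y = y)) +
  geom_point(data = sample_rows(2000))
```

As far as I can tell the function is evaluated when the plot is built, so printing `p` twice may show two different samples unless you fix the seed beforehand. Other layers added without a data argument, such as geom_hex(), still use all of `d`.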
