I’ve got some data in JSON format that I want to do some visualization on. The data (approximately 10MB of JSON) loads pretty fast, but reshaping it into a usable form takes a couple of minutes for just under 100,000 rows. I have something that works, but I think it can be done much better.
It may be easiest to understand by starting with my sample data.
Assuming you run the following command in /tmp:
    curl http://public.west.spy.net/so/time-series.json.gz \
        | gzip -dc - > time-series.json
You should be able to see my desired output (after a while) here:
    require(rjson)

    # Load the rows from the JSON file.
    trades <- fromJSON(file="/tmp/time-series.json")$rows

    # Build one data frame per row and rbind them all (this is the slow part).
    data <- do.call(rbind,
                    lapply(trades,
                           function(row)
                               data.frame(date=strptime(unlist(row$key)[2], "%FT%X"),
                                          price=unlist(row$value)[1],
                                          volume=unlist(row$value)[2])))

    someColors <- colorRampPalette(c("#000099", "blue", "orange", "red"),
                                   space="Lab")
    days <- seq(min(data$date), max(data$date), by = 'month')

    # Plot without the default x axis, then label it with one tick per month.
    smoothScatter(data, colramp=someColors, xaxt="n")
    axis(1, at=days,
         labels=strftime(days, "%F"),
         tick=FALSE)
You can get a 40x speedup by using plyr. Here is the code and the benchmarking comparison. The conversion to date can be done once you have the data frame, and hence I have removed it from the code to facilitate an apples-to-apples comparison. I am sure a faster solution exists.

EDIT: MrFlick’s solution leads to an additional 3.5x speedup. I have updated my tests.
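The code from this answer is not reproduced here, so what follows is only a minimal base-R sketch of the general idea behind this kind of speedup, not the plyr or MrFlick code: pull each column out of the nested list in a single pass and call `data.frame` once, instead of building one data frame per row and `rbind`-ing them. The three-row `trades` list is a made-up stand-in for `fromJSON(...)$rows`, assuming each row has a `key` whose second element is a timestamp and a `value` holding price and volume; `rows_to_df` is a hypothetical helper name.

```r
# Made-up sample with the same assumed shape as fromJSON(...)$rows.
trades <- list(
  list(key = list("a", "2010-01-04T09:30:00"), value = list(10.5, 100)),
  list(key = list("a", "2010-01-04T09:31:00"), value = list(10.7, 250)),
  list(key = list("b", "2010-01-05T09:30:00"), value = list(11.0, 75))
)

# Hypothetical helper: extract each column with vapply, then build the
# data frame once rather than once per row.
rows_to_df <- function(rows) {
  data.frame(
    date   = vapply(rows, function(r) unlist(r$key)[2], character(1)),
    price  = vapply(rows, function(r) unlist(r$value)[1], numeric(1)),
    volume = vapply(rows, function(r) unlist(r$value)[2], numeric(1)),
    stringsAsFactors = FALSE
  )
}

fast <- rows_to_df(trades)

# Convert the timestamps once, on the whole column, after the frame exists.
fast$date <- as.POSIXct(strptime(fast$date, "%Y-%m-%dT%H:%M:%S", tz = "UTC"))
```

On the full data set, wrapping each approach in `system.time()` would be one way to compare it against the original `do.call(rbind, ...)` version; no timings are claimed for this sketch.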