I am new to R but not to programming in general, and yet I am stuck on the above question. I have a large .csv file which contains all of the options data for the years 2006-2011. I have successfully loaded that large file into a data frame. However, it is the next step with which I am struggling. I need to split this data frame into 'n' data frames, where 'n' corresponds to the number of individual option contracts contained in the larger data frame. So, for example, if my original data frame contained the daily price of the 1280 Call Option that expires in a month along with the daily price of the 1290 Call Option that expires in a month, I would like to wind up with two separate data frames. Below is the result of calling str() on my large data frame:
'data.frame': 2215636 obs. of 21 variables:
$ symbol : chr "SPX" "SPX" "SPX" "SPX" ...
$ exchange : chr "CBOE" "CBOE" "CBOE" "CBOE" ...
$ date : Date, format: "2006-01-03" "2006-01-03" "2006-01-03" "2006-01-03" ...
$ adjusted.stock.close.price: num 1269 1269 1269 1269 1269 ...
$ option.symbol : chr "JXAAF" "JXAMF" "JXAAI" "JXAMI" ...
$ expiration : Date, format: "2006-01-06" "2006-01-06" "2006-01-06" "2006-01-06" ...
$ strike : int 1230 1230 1245 1245 1260 1260 1275 1275 1290 1290 ...
$ call.put : chr "C" "P" "C" "P" ...
$ ask : num 40.1 0.25 25.4 0.7 12 2.45 3.1 9.3 0.55 22.2 ...
$ bid : num 38.1 0.05 23.4 0.2 10.5 1.95 2.45 8.3 0.05 20.2 ...
$ mean.price : num 39.1 0.15 24.4 0.45 11.25 ...
$ iv : num 0.13 0.128 0.13 0.128 0.13 ...
$ volume : int 10 76 37 145 292 62 113 55 0 5 ...
$ open.interest : int 226 762 39 125 482 404 72 1 203 200 ...
$ stock.price.for.iv : num 1269 1269 1269 1269 1269 ...
$ X. : chr "*" "*" "*" "*" ...
$ delta : num 0.99725 -0.00236 0.95624 -0.04179 0.73911 ...
$ vega : num 0.00886 0.00807 0.10122 0.09776 0.35569 ...
$ gamma : num 0.00057 0.00052 0.0065 0.00636 0.02286 ...
$ theta : num -0.1076 -0.0188 -0.3262 -0.2268 -0.9153 ...
$ rho : num 0.09134 -0.00022 0.08856 -0.00397 0.06901 ...
head(Sample.DS)
symbol exchange date adjusted.stock.close.price option.symbol expiration strike call.put ask bid
1 SPX CBOE 2006-01-03 1268.8 JXAAF 2006-01-06 1230 C 40.10 38.10
2 SPX CBOE 2006-01-03 1268.8 JXAMF 2006-01-06 1230 P 0.25 0.05
3 SPX CBOE 2006-01-03 1268.8 JXAAI 2006-01-06 1245 C 25.40 23.40
4 SPX CBOE 2006-01-03 1268.8 JXAMI 2006-01-06 1245 P 0.70 0.20
5 SPX CBOE 2006-01-03 1268.8 JXAAL 2006-01-06 1260 C 12.00 10.50
6 SPX CBOE 2006-01-03 1268.8 JXAML 2006-01-06 1260 P 2.45 1.95
mean.price iv volume open.interest stock.price.for.iv X. delta vega gamma theta rho
1 39.10 0.1298 10 226 1268.75 * 0.99725 0.00886 0.00057 -0.10765 0.09134
2 0.15 0.1283 76 762 1268.75 * -0.00236 0.00807 0.00052 -0.01883 -0.00022
3 24.40 0.1298 37 39 1268.75 * 0.95624 0.10122 0.00650 -0.32616 0.08856
4 0.45 0.1283 145 125 1268.75 * -0.04179 0.09776 0.00636 -0.22676 -0.00397
5 11.25 0.1298 292 482 1268.75 0.73911 0.35569 0.02286 -0.91528 0.06901
6 2.20 0.1283 62 404 1268.75 -0.25833 0.35397 0.02302 -0.81108 -0.02458
So maybe a better way of putting it is that I need to split the data frame by each unique combination of option.symbol, strike, call.put, and expiration. It would seem that I might be able to use a for-each loop, but I have been told that looping should be avoided in R and have been pointed in the direction of lapply.
From a pseudo-code perspective here is how I was trying to solve this issue:
- Load large data-set
- Create a matrix/vector/list/data frame (not sure which one to use) which holds the unique combinations of option.symbol, strike, call.put, and expiration
- For each item in the above object, query the large data frame for matches and store the result as a data frame contained in a list; the end result is a list containing a bunch of data.frames
- Serialize the list via the saveRDS function so I never have to do this again.
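For what it is worth, the steps above could be sketched roughly as follows in base R, using split() rather than an explicit loop. This is only a sketch under assumptions: the small Sample.DS built here is a made-up stand-in for the real data, and the output file path is hypothetical.

```r
# Toy stand-in for the large data frame (made-up values, only a few columns).
Sample.DS <- data.frame(
  option.symbol = c("JXAAF", "JXAMF", "JXAAF", "JXAMF"),
  expiration    = as.Date(rep("2006-01-06", 4)),
  strike        = c(1230L, 1230L, 1230L, 1230L),
  call.put      = c("C", "P", "C", "P"),
  date          = as.Date(c("2006-01-03", "2006-01-03",
                            "2006-01-04", "2006-01-04")),
  mean.price    = c(39.10, 0.15, 41.00, 0.10),
  stringsAsFactors = FALSE
)

# Columns whose unique combination identifies one contract.
keys <- c("option.symbol", "strike", "call.put", "expiration")

# split() accepts a list of grouping columns; drop = TRUE discards
# combinations that never occur in the data.
Options.DF.List <- split(Sample.DS, Sample.DS[keys], drop = TRUE)

# Serialize the list so the split only has to be done once.
# The file name and location are hypothetical.
rds.path <- file.path(tempdir(), "Options.DF.List.rds")
saveRDS(Options.DF.List, rds.path)
Restored <- readRDS(rds.path)
```

Each element of Options.DF.List is a data frame holding the rows of one contract, so the list can be passed straight to lapply.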
I am familiar with the subsetting functions such as
X<- Options.DF.List[[1]][ which(Options.DF.List[[1]]$date %in% SPX.Put.Purchase.Dates), ]
but I am unsure of how to expand upon that type of syntax to accomplish my goals. Thanks in advance.
You can use dlply from the plyr package; it will return a list of data.frames.
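A minimal sketch of such a dlply call, assuming the plyr package is installed. The small Sample.DS here is a made-up stand-in for the real data, and passing identity as the function is just one way to get each piece back unchanged.

```r
library(plyr)

# Toy stand-in for the large data frame (made-up values, only a few columns).
Sample.DS <- data.frame(
  option.symbol = c("JXAAF", "JXAMF", "JXAAF", "JXAMF"),
  expiration    = as.Date(rep("2006-01-06", 4)),
  strike        = c(1230L, 1230L, 1230L, 1230L),
  call.put      = c("C", "P", "C", "P"),
  date          = as.Date(c("2006-01-03", "2006-01-03",
                            "2006-01-04", "2006-01-04")),
  mean.price    = c(39.10, 0.15, 41.00, 0.10),
  stringsAsFactors = FALSE
)

# Split by the four columns that identify a contract; identity returns
# each piece as-is, so the result is a list of data frames.
Options.DF.List <- dlply(
  Sample.DS,
  .(option.symbol, strike, call.put, expiration),
  identity
)

# Number of distinct contracts found.
length(Options.DF.List)
```

The resulting list can then be saved with saveRDS, as described in the question.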