I am looking for some ideas/functions to improve a task I coded in a quite inefficient way.
My originial data frame look like:
CUSIP Date Pclose TRI
1 30161N101 2011-01-03 38.8581 2011-01-03
2 06738G878 2011-01-03 48.4040 2011-01-03
3 74339G101 2011-01-03 24.0880 2011-01-03
4 74348A590 2011-01-03 81.7200 2011-01-03
5 26922W109 2011-01-03 87.8700 2011-01-03
...
The column ‘TRI’ contains the Date in a date format.
And what I want to obtain look like:
Date 233052109 126650100 251566105 149123101 ...
1 2011-01-03 22.8031 34.3034 11.2645 91.6178
2 2011-01-04 22.6843 34.2740 11.1862 91.1897
3 2011-01-05 22.7933 34.6362 10.9948 91.9779
4 2011-01-06 22.8034 34.2838 11.0470 91.0242
5 2011-01-07 22.6248 34.3034 11.0644 91.2091
.
.
.
In the second data frame each column (except the date one) has the name of a CUSIP from the first data frame and is filled with data from Pclose.
I am making my second data frame with regular loop but I am sure there is a way with compilled function do to way better (perhaps subset)
my functions are:
To build the second data frame:
function(cusippresent,cusiplist){
workinglist=list()
workinglist[1]=as.character('Date')
position = 2
for (i in 2:length(cusippresent)) {
if(cusippresent[i] %in% cusiplist) {
workinglist[position]=as.character(cusippresent[i]);
position=position+1
}
}
rm(position)
Data=data.frame()
#On remplit la première ligne du dataframe
#avec des éléments du bon type sinon il y a des problèmes
Data[1,1]=as.character('1987-11-12')
Data[1,2]=as.numeric(1)
for (i in 2:length(workinglist)){Data[1,i]=as.numeric(1)}
for (i in 1:length(workinglist)){colnames(Data)[i]=workinglist[i]}
return(Data)
}
To fill the data frame:
function(DATA11C,TCusipC){
nloop = 1
positionorigine = 1
positioncible = 1
#copy of dates
datelist = DATA11C[,"Date"]
datelist = unique(datelist)
for (i in 1:length(datelist)) {
TCusipC[i,"Date"]=as.character(datelist[i])
}
#creation of needed columns
TCusipC[,ncol(TCusipC)+1] = as.Date(TCusipC[,"Date"])
colnames(TCusipC)[ncol(TCusipC)] = 'TRI'
#ordering of tables
DATA11C=DATA11C[with(DATA11C,order(TRI)),]
TCusipC=TCusipC[with(TCusipC,order(TRI)),]
longueur = nrow(TCusipC)
#filling of the table
while(nloop<longueur) {
while(DATA11C[positionorigine,"TRI"]==TCusipC[positioncible,"TRI"]){
nom = as.character(DATA11C[positionorigine,"CUSIP"]);
TCusipC[positioncible,nom]=as.numeric(DATA11C[positionorigine,"Pclose"]);
positionorigine=positionorigine+1
};
nloop=nloop+1;
positioncible=positioncible+1
}
return(TCusipC)
}
Any suggestion on what function to dig up? potential improvement?
Many thanks,
Vincent
That is exactly what the reshape packages does