I am having trouble figuring out how to approach this problem and am hoping someone has an idea. For the data below, I want the results given further down. The results are based on the following criteria:
Within the same date:
idEffectA:
Always 0 for rows with id A
1 for “non id A” rows if there is any id A row with a 1 in special for that date
0 for “non id A” rows if all id A rows have a 0 in special for that date
sizeEffect10:
Always 0 for rows with size 10
1 for “non size 10” rows if there is any size 10 row with a 1 in special for that date
0 for “non size 10” rows if all size 10 rows have a 0 in special for that date
The same rules apply to the other variables (idEffectB, sizeEffect20, and so on). It would be extremely helpful if the column names could also be generated, rather than having to define each one by hand. Also, the actual data set has many different categories for size and id, so it would be best to avoid typing those into a function by hand. If that is the only possibility, an aggregate function could be combined with a merge to apply the defined function to the data set (any other suggestions?). I would like the result columns to be bound to the original data set.
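A minimal sketch of that aggregate-plus-merge idea for idEffectA, in base R. It assumes the data are in a data.frame called original.data with the columns in the table below (typed in here so the snippet runs on its own); flagA, anyA, row and merged are illustrative names only. Because merge() does not preserve the original row order, a row index is saved first and used to restore it.

```r
# Data from the question (date kept as text; only equality matters here)
original.data <- data.frame(
  label   = rep(1:5, 6),
  date    = rep(c("1/11/2012", "1/18/2012", "1/25/2012",
                  "2/1/2012", "2/8/2012", "2/15/2012"), each = 5),
  special = c(0,1,0,0,0, 0,0,0,1,1, 1,1,0,1,1, 0,1,1,0,0, 0,0,0,1,0, 1,1,0,1,0),
  size    = rep(c(10, 20, 10, 30, 10), 6),
  id      = rep(c("A", "A", "B", "C", "C"), 6)
)
original.data$row <- seq_len(nrow(original.data))  # merge() reorders rows

# One row per date: max(special) over the id A rows, i.e. 1 if any of them is 1
flagA <- aggregate(special ~ date,
                   data = original.data[original.data$id == "A", ],
                   FUN = max)
names(flagA)[2] <- "anyA"

# Attach the per-date flag to every row; dates with no id A rows get 0
merged <- merge(original.data, flagA, by = "date", all.x = TRUE)
merged$anyA[is.na(merged$anyA)] <- 0
merged <- merged[order(merged$row), ]

# Rows with id A are always 0; the others take the flag for their date
merged$idEffectA <- ifelse(merged$id == "A", 0, merged$anyA)
```

Each further column would need its own aggregate() and merge(), which is why a per-date function is usually less work once there are many ids and sizes.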
Please let me know if there are any questions; I had a hard time defining exactly what I wanted in the columns to begin with. I have tried using the plyr package along with indexing but haven’t gotten very far. Thanks!
For the first part, I think something like this could work within a loop:
i <- 0
# non-A rows get 1 if any id A row on date min(date) + i has special == 1
ifelse(id != "A" & max(special[id == "A" & date == min(date) + i]) == 1, 1, 0)
i <- i + 7
But after that, I’m not really sure…
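Stepping with i + 7 assumes the dates are exactly one week apart. A vectorised sketch that avoids the loop uses base R's ave(), which applies a function within each group and returns a vector aligned with the original rows. It assumes the same original.data columns as the table below; anyA and any10 are illustrative names.

```r
original.data <- data.frame(
  label   = rep(1:5, 6),
  date    = rep(c("1/11/2012", "1/18/2012", "1/25/2012",
                  "2/1/2012", "2/8/2012", "2/15/2012"), each = 5),
  special = c(0,1,0,0,0, 0,0,0,1,1, 1,1,0,1,1, 0,1,1,0,0, 0,0,0,1,0, 1,1,0,1,0),
  size    = rep(c(10, 20, 10, 30, 10), 6),
  id      = rep(c("A", "A", "B", "C", "C"), 6)
)

# Per date, 1 on every row if some id A row has special == 1, else 0
anyA <- ave(as.numeric(original.data$id == "A" & original.data$special == 1),
            original.data$date, FUN = max)
# Rows with id A are always 0; every other row takes its date's flag
original.data$idEffectA <- ifelse(original.data$id == "A", 0, anyA)

# Same pattern for size 10, comparing to a number instead of a label
any10 <- ave(as.numeric(original.data$size == 10 & original.data$special == 1),
             original.data$date, FUN = max)
original.data$sizeEffect10 <- ifelse(original.data$size == 10, 0, any10)
```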
original.data
label date special size id
1 1/11/2012 0 10 A
2 1/11/2012 1 20 A
3 1/11/2012 0 10 B
4 1/11/2012 0 30 C
5 1/11/2012 0 10 C
1 1/18/2012 0 10 A
2 1/18/2012 0 20 A
3 1/18/2012 0 10 B
4 1/18/2012 1 30 C
5 1/18/2012 1 10 C
1 1/25/2012 1 10 A
2 1/25/2012 1 20 A
3 1/25/2012 0 10 B
4 1/25/2012 1 30 C
5 1/25/2012 1 10 C
1 2/1/2012 0 10 A
2 2/1/2012 1 20 A
3 2/1/2012 1 10 B
4 2/1/2012 0 30 C
5 2/1/2012 0 10 C
1 2/8/2012 0 10 A
2 2/8/2012 0 20 A
3 2/8/2012 0 10 B
4 2/8/2012 1 30 C
5 2/8/2012 0 10 C
1 2/15/2012 1 10 A
2 2/15/2012 1 20 A
3 2/15/2012 0 10 B
4 2/15/2012 1 30 C
5 2/15/2012 0 10 C
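For anyone who wants to experiment, here is the table above as a reproducible data.frame (a sketch typed in from the printed values; the answer below mentions using dput() for the same purpose). The date strings are converted with as.Date() using a month/day/year format, and id is made a factor.

```r
# Values typed in from the printed table: 6 weekly dates x 5 labels
original.data <- data.frame(
  label   = rep(1:5, times = 6),
  date    = rep(c("1/11/2012", "1/18/2012", "1/25/2012",
                  "2/1/2012", "2/8/2012", "2/15/2012"), each = 5),
  special = c(0, 1, 0, 0, 0,   0, 0, 0, 1, 1,   1, 1, 0, 1, 1,
              0, 1, 1, 0, 0,   0, 0, 0, 1, 0,   1, 1, 0, 1, 0),
  size    = rep(c(10, 20, 10, 30, 10), times = 6),
  id      = factor(rep(c("A", "A", "B", "C", "C"), times = 6))
)
# Month/day/year text -> Date, so dates compare and sort chronologically
original.data$date <- as.Date(original.data$date, format = "%m/%d/%Y")
str(original.data)
```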
Here are the results I am looking for (one row for each row of original.data, in the same order):
results
idEffectA sizeEffect10 idEffectB sizeEffect20 idEffectC sizeEffect30
0 0 0 1 0 0
0 0 0 0 0 0
1 0 0 1 0 0
1 0 0 1 0 0
1 0 0 1 0 0
0 0 0 0 1 1
0 1 0 0 1 1
0 0 0 0 1 1
0 1 0 0 0 0
0 0 0 0 0 1
0 0 0 1 1 1
0 1 0 0 1 1
1 0 0 1 1 1
1 1 0 1 0 0
1 0 0 1 0 1
0 0 1 1 0 0
0 1 1 0 0 0
1 0 0 1 0 0
1 1 1 1 0 0
1 0 1 1 0 0
0 0 0 0 1 1
0 0 0 0 1 1
0 0 0 0 1 1
0 0 0 0 0 0
0 0 0 0 0 1
0 0 0 1 1 1
0 1 0 0 1 1
1 0 0 1 1 1
1 1 0 1 0 0
1 0 0 1 0 1
OK, here is the beginning of what I have:
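# Assumes original.data with columns date, special, size and id (as above)
dates <- unique(original.data$date)       # loop over the dates that occur
for (id.level in levels(factor(original.data$id))) {
  x <- rep(0, nrow(original.data))
  for (i in seq_along(dates)) {
    on.date <- original.data$date == dates[i]
    # 1 if any row with this id has special == 1 on this date, else 0
    hit <- as.numeric(any(original.data$special[on.date & original.data$id == id.level] == 1))
    # rows with this id stay 0; the other rows on this date get the flag
    x[on.date] <- ifelse(original.data$id[on.date] == id.level, 0, hit)
  }
  y <- paste0("idEffect", id.level)       # e.g. "idEffectA"
  original.data[[y]] <- x                 # bind the column to the data
}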
What I was hoping to do with this code was create the 0s for all ids, as I wrote in the original post:
idEffectA:
Always 0 for rows with id A
and create a column for each separate id. However, I don’t know what to do for the other conditions or how to fit them in. Should I add nested ifelse statements? Any help is greatly appreciated. I’m more used to working with indexing in R, so my knowledge of ifelse and for statements is fuzzy. Thanks.
First, get the data into a reproducible form (I used dput() on original.data, which I read in from what you had and did some conversions on). This makes sure that all the data types are right (date is of type Date, id is a factor, etc.; these things are relevant).

Next, you want to do processing “within the same date”. This implies a split-apply-combine strategy, which the plyr library handles well. You want ddply (data.frame in, data.frame out), and you need a function that does your transformations for a subset of the data corresponding to a single date.

Using your first two examples (idEffectA and sizeEffect10), implementing your rules would look like:
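A sketch of such a per-date function in base R, following the description of others in the next paragraph (not necessarily the answer's original code). The function name date.effects is hypothetical. With plyr installed, the call would be along the lines of ddply(original.data, .(date), date.effects); to stay dependency-free, the sketch swaps in split(), lapply() and unsplit(), which also put the rows back in their original order.

```r
original.data <- data.frame(
  label   = rep(1:5, 6),
  date    = rep(c("1/11/2012", "1/18/2012", "1/25/2012",
                  "2/1/2012", "2/8/2012", "2/15/2012"), each = 5),
  special = c(0,1,0,0,0, 0,0,0,1,1, 1,1,0,1,1, 0,1,1,0,0, 0,0,0,1,0, 1,1,0,1,0),
  size    = rep(c(10, 20, 10, 30, 10), 6),
  id      = rep(c("A", "A", "B", "C", "C"), 6)
)

# Hypothetical per-date function: DF holds every row of a single date
date.effects <- function(DF) {
  # 1 if any id A row on this date has special == 1, else 0
  others <- as.numeric(any(DF$special[DF$id == "A"] == 1))
  DF$idEffectA <- ifelse(DF$id == "A", 0, others)
  # Same logic for size, but comparing to a number
  others <- as.numeric(any(DF$special[DF$size == 10] == 1))
  DF$sizeEffect10 <- ifelse(DF$size == 10, 0, others)
  DF
}

# Split by date, apply the function, reassemble in the original row order
result <- unsplit(lapply(split(original.data, original.data$date), date.effects),
                  original.data$date)
```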
For idEffectA, others checks whether any of the special values corresponding to a value of A are 1, and is 1 if so, 0 otherwise. The assignment to idEffectA is then conditional on whether the row is A (0) or not (whatever others was determined to be). Repeat the logic for size, but compare to a number.

Your further example indicates that you want a column for each possible value of id and size. Loops can get you there.
I pre-compute the possible values of id and size (allid and allsize), since that only needs to be done once. Within the function for each date, iterate over each possible value of id. Build the “Effect” columns the same way, but use paste to create the column name.
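Extending the per-date function to every id and size could look like the base-R sketch below, assuming the same data as in the question. allid and allsize follow the names above, while date.effects is a hypothetical function name. paste0() builds names such as idEffectB and sizeEffect30, and the six resulting columns reproduce the results table in the question.

```r
original.data <- data.frame(
  label   = rep(1:5, 6),
  date    = rep(c("1/11/2012", "1/18/2012", "1/25/2012",
                  "2/1/2012", "2/8/2012", "2/15/2012"), each = 5),
  special = c(0,1,0,0,0, 0,0,0,1,1, 1,1,0,1,1, 0,1,1,0,0, 0,0,0,1,0, 1,1,0,1,0),
  size    = rep(c(10, 20, 10, 30, 10), 6),
  id      = rep(c("A", "A", "B", "C", "C"), 6)
)

# Pre-compute the possible values once, outside the per-date function
allid   <- sort(unique(as.character(original.data$id)))
allsize <- sort(unique(original.data$size))

# Hypothetical per-date function: DF holds every row of a single date
date.effects <- function(DF) {
  for (i in allid) {
    others <- as.numeric(any(DF$special[DF$id == i] == 1))
    DF[[paste0("idEffect", i)]] <- ifelse(DF$id == i, 0, others)
  }
  for (s in allsize) {
    others <- as.numeric(any(DF$special[DF$size == s] == 1))
    DF[[paste0("sizeEffect", s)]] <- ifelse(DF$size == s, 0, others)
  }
  DF
}

# Split by date, apply, and reassemble in the original row order
result <- unsplit(lapply(split(original.data, original.data$date), date.effects),
                  original.data$date)
```

New ids or sizes in the real data set are picked up automatically, since allid and allsize are read from the data rather than typed in.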