This problem has plagued me for quite some time. I always just work around it with a for loop, but I think it’s finally time for me to find a quicker and more elegant way of doing this.
As an example, let’s say I have a data frame containing information on whether an item is red or blue. The information is presented in this way:
item.df <- data.frame(Item=seq(1,5), Red=c("Y", "Y", "N", "N", "N"), Blue=c("N", "N", "Y", "Y", "N"))
Clearly, this is not the most condensed way to represent this information. Instead of having separate Red and Blue columns, I simply want a single Item.Color column that contains “Red”, “Blue”, or “Neither” (NA would also be acceptable).
Obviously, I can achieve this by creating an empty Item.Color column and then filling it in by looping through each individual row. But I’m sure there is a quicker way to do this.
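For reference, the row-by-row workaround described above might look like the sketch below. It assumes Red takes precedence if a row were ever "Y" in both columns, which the example data never is.

```r
# Example data from the question
item.df <- data.frame(Item = seq(1, 5),
                      Red  = c("Y", "Y", "N", "N", "N"),
                      Blue = c("N", "N", "Y", "Y", "N"))

# Start with a placeholder column, then fill it in one row at a time
item.df$Item.Color <- NA_character_
for (i in seq_len(nrow(item.df))) {
  if (item.df$Red[i] == "Y") {          # assumption: Red is checked first
    item.df$Item.Color[i] <- "Red"
  } else if (item.df$Blue[i] == "Y") {
    item.df$Item.Color[i] <- "Blue"
  } else {
    item.df$Item.Color[i] <- "Neither"
  }
}
```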
Back when I was a true R novice, I tried to do it by:
item.df$Item.Color <- if(item.df$Red=="Y"){"Red"}
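but I quickly learned this doesn’t work, because the if statement is not vectorized: it only looks at the first element of item.df$Red (and in R 4.2.0 and later, a condition longer than one element is an error rather than a warning).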
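The vectorized counterpart of if is base R's ifelse(test, yes, no), which evaluates the condition for every element and picks the matching value from yes or no. A minimal sketch, nesting two calls so that rows with neither flag get "Neither" (again assuming Red wins if both were "Y"):

```r
item.df <- data.frame(Item = seq(1, 5),
                      Red  = c("Y", "Y", "N", "N", "N"),
                      Blue = c("N", "N", "Y", "Y", "N"))

# ifelse() tests every row at once; the inner call handles rows that are not Red
item.df$Item.Color <- ifelse(item.df$Red == "Y", "Red",
                             ifelse(item.df$Blue == "Y", "Blue", "Neither"))
```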
Could there be a way to achieve this using do.call() or one of the apply() functions? I’ve tried, but I could never get it to do quite what I wanted. Thanks in advance for any insight you might be able to provide!
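An apply()-based version is possible by iterating over rows (MARGIN = 1). This is a sketch under one labeled assumption: the "Red/Blue" label for rows flagged in both columns is illustrative, not something the question asks for.

```r
item.df <- data.frame(Item = seq(1, 5),
                      Red  = c("Y", "Y", "N", "N", "N"),
                      Blue = c("N", "N", "Y", "Y", "N"))

# apply() converts the selected columns to a character matrix, so each `row`
# is a character vector named by column ("Red", "Blue")
item.df$Item.Color <- apply(item.df[, c("Red", "Blue")], 1, function(row) {
  hits <- names(row)[row == "Y"]   # names of the colour columns flagged "Y"
  # assumption: join multiple hits as "Red/Blue"; no hits gives "Neither"
  if (length(hits) == 0) "Neither" else paste(hits, collapse = "/")
})
```

Note that apply() is an R-level loop under the hood, so it mainly buys tidier syntax rather than speed; the vectorized approaches avoid the per-row function call entirely.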
P.S. I would also be grateful to hear any suggestions for a better title for this question. For me, that always seems to be the hardest part of asking questions.
The following code should do the trick. It even checks whether the data contain rows where both Red and Blue are TRUE (== "Y"). The trick here is that the same syntax used for taking a subset can also be used for assignment:
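A minimal sketch of that subset-assignment idea (not necessarily the answer's exact code): fill a default, then overwrite only the rows selected by a logical index. The "Both" label and the warning text are illustrative assumptions.

```r
item.df <- data.frame(Item = seq(1, 5),
                      Red  = c("Y", "Y", "N", "N", "N"),
                      Blue = c("N", "N", "Y", "Y", "N"))

item.df$Item.Color <- "Neither"                    # default for every row
item.df$Item.Color[item.df$Red == "Y"] <- "Red"    # assign only where Red is "Y"
item.df$Item.Color[item.df$Blue == "Y"] <- "Blue"  # assign only where Blue is "Y"

# Check for rows flagged in both columns (none in this example data)
both <- item.df$Red == "Y" & item.df$Blue == "Y"
if (any(both)) {
  warning("Rows flagged as both Red and Blue: ", paste(which(both), collapse = ", "))
  item.df$Item.Color[both] <- "Both"               # assumption: label such rows "Both"
}
```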