I’m trying to develop a function which will allow me to input new
elements to a data frame and then check if they contain certain
words.
df <- data.frame(keyword=c("He drives a Honda", "He goes to Ohio State"),
car=c(1,0), school=c(0,1))
df
keyword                car school
He drives a Honda        1      0
He goes to Ohio State    0      1
In this data frame, car and school are binary indicator columns: each holds 1 if a word from the corresponding car/school vector appears in the keyword, and 0 otherwise.
car <- c("Honda", "Chevy", "Toyota", "Ford")
school <- c("Michigan", "Ohio State", "Missouri")
I want to use a function to add new keywords to the data frame, checking each new keyword for values from the car and school vectors.
main <- function(keyword){
  n = strsplit(as.character(keyword), " ")[[1]]
  for( i in keyword ){
    if( any(n==car) ){
      df$car <- c(1)
    }
    if( any(n==school )){
      df$school <- c(1)
    }
  }
}
This function isn’t complete, and it produces the following warning. It seems to come from comparing the split keyword against the school vector, which has length 3 (the car vector has length 4).
> main("He likes Ford and goes to Ohio State")
Warning message:
In n == school :
longer object length is not a multiple of shorter object length
I’m also not sure how to add the 0/1 values to the df. For the “He likes Ford and goes to Ohio State” keyword, I should have 1 in both the car and school columns.
keyword                               car school
He drives a Honda                       1      0
He goes to Ohio State                   0      1
He likes Ford and goes to Ohio State    1      1
Please help.
It seems like the ifelse() function would be really useful for this task, but I haven’t been able to properly implement it.
I think the easiest way is to use a compound regular expression:
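A minimal sketch of that idea, assuming the car and school vectors from the question. The helper name add_keyword is hypothetical, and the function returns the extended data frame instead of modifying df in place, since assignments to df inside a function do not change the global object. Each vector is collapsed into a single alternation pattern with paste(), and grepl() then tests the whole keyword at once, so multi-word entries such as "Ohio State" match and no element-wise == comparison (the source of the recycling warning) is needed.

```r
car <- c("Honda", "Chevy", "Toyota", "Ford")
school <- c("Michigan", "Ohio State", "Missouri")

df <- data.frame(keyword = c("He drives a Honda", "He goes to Ohio State"),
                 car = c(1, 0), school = c(0, 1),
                 stringsAsFactors = FALSE)

# add_keyword is a hypothetical helper name, not part of the original question
add_keyword <- function(df, keyword) {
  # Collapse each vector into one pattern, e.g. "Honda|Chevy|Toyota|Ford"
  car_pattern <- paste(car, collapse = "|")
  school_pattern <- paste(school, collapse = "|")

  # grepl() returns TRUE/FALSE; as.numeric() turns that into 1/0
  new_row <- data.frame(keyword = keyword,
                        car = as.numeric(grepl(car_pattern, keyword)),
                        school = as.numeric(grepl(school_pattern, keyword)),
                        stringsAsFactors = FALSE)

  # Return the data frame with the new row appended
  rbind(df, new_row)
}

df <- add_keyword(df, "He likes Ford and goes to Ohio State")
```

After this call, the third row of df has 1 in both the car and school columns. Note that the match is case-sensitive and matches substrings, so "Fordham" would also count as a car; wrapping each term in \\b word boundaries would tighten that if needed.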