I want to match the start postcode and end postcode in a very large survey dataset and put the matching rows in a new dataframe. I have created an example dataframe for the purpose of illustration.
ID = c(1,2,3,4,5)
StartPC = c("AF2 4RE","AF3 5RE","AF1 3DR","AF2 4RE","AF2 4PE")
EndPC = c("AF2 4RE","NA","AF2 3DR","AX2 4RE","AF2 4PE")
data<-data.frame(ID,StartPC,EndPC)
data2 <- subset(data, StartPC==EndPC,na.rm=TRUE)
Using the above code, I want to create a dataframe (data2) which only includes the ID numbers for which the start and end postcodes are the same. However, I get the error message:
Error in Ops.factor(StartPC, EndPC) : level sets of factors are different
The output needs just to have ID numbers 1 and 5 included in the new data table.
That will be because your two columns are factors, not characters. Factors are categorical variables, which are stored as integer codes plus a lookup table of ‘levels’. Comparing two factors only makes sense when both use the same set of levels, so R checks that first. If the level sets differ, it decides you are doing a bad thing and stops with that error.
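To see the check in isolation, here is a small sketch using two short factors built from the example postcodes. The names `start`, `end` and `msg` are made up for illustration, and `tryCatch` is only there so the error message is captured rather than halting the script.

```r
# Two factors whose level sets differ ("AF3 5RE" vs "NA")
start <- factor(c("AF2 4RE", "AF3 5RE"))
end   <- factor(c("AF2 4RE", "NA"))

levels(start)  # the levels of the first factor
levels(end)    # the levels of the second factor

# Comparing them fails; capture the message instead of stopping
msg <- tryCatch(start == end, error = function(e) conditionMessage(e))
msg
```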
So convert to character:
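A minimal sketch of the conversion, assuming the example data from the question. Since R 4.0.0, `data.frame()` no longer turns strings into factors by default, so `stringsAsFactors = TRUE` is set here only to reproduce the factor columns. The stray `na.rm = TRUE` is left out because `subset()` does not use it.

```r
# Rebuild the example with factor columns, as in the question
ID      <- c(1, 2, 3, 4, 5)
StartPC <- c("AF2 4RE", "AF3 5RE", "AF1 3DR", "AF2 4RE", "AF2 4PE")
EndPC   <- c("AF2 4RE", "NA", "AF2 3DR", "AX2 4RE", "AF2 4PE")
data    <- data.frame(ID, StartPC, EndPC, stringsAsFactors = TRUE)

# Compare the postcodes as character strings rather than as factors
data2 <- subset(data, as.character(StartPC) == as.character(EndPC))
data2$ID  # IDs of the rows where start and end postcode match
```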
either on the fly like that, or make your data frame with characters in the first place, or make sure both columns are made with the same levels.
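The other two options could look roughly like this. It is a sketch on the same example data, and the names `data_chr`, `data_fac`, `all_levels`, `same_chr` and `same_fac` are invented for illustration.

```r
ID      <- c(1, 2, 3, 4, 5)
StartPC <- c("AF2 4RE", "AF3 5RE", "AF1 3DR", "AF2 4RE", "AF2 4PE")
EndPC   <- c("AF2 4RE", "NA", "AF2 3DR", "AX2 4RE", "AF2 4PE")

# Option 1: build the data frame with character columns from the start
data_chr <- data.frame(ID, StartPC, EndPC, stringsAsFactors = FALSE)
same_chr <- subset(data_chr, StartPC == EndPC)

# Option 2: keep factors, but give both columns the same set of levels
all_levels <- union(StartPC, EndPC)
data_fac <- data.frame(
  ID,
  StartPC = factor(StartPC, levels = all_levels),
  EndPC   = factor(EndPC, levels = all_levels)
)
same_fac <- subset(data_fac, StartPC == EndPC)

same_chr$ID
same_fac$ID
```

If you are reading the survey from a file, `read.csv()` takes the same `stringsAsFactors` argument, so the first option can be applied at import time.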