I am working on a large dataset, an example of which is shown below:
Df1 <- data.frame(ID = 1:7,
                  home_pc = c("VB2 4RF", "CB4 2DT", "NE5 7TH", "BY5 8IB", "DH4 6PB", "MP9 7GH", "KN4 5GH"),
                  start_pc = c(NA, "Home", "FC5 7YH", "Home", "CB3 5TH", "BV6 5PB", NA),
                  end_pc = c(NA, "CB5 4FG", "Home", "Home", "Home", "GH6 8HG", NA))
I want to do two things:
- Firstly, delete rows which have an NA in the columns “start_pc” and “end_pc”.
- When “Home” is written in either the “start_pc” or “end_pc” columns, I want to be able to replace this with the postcode in “home_pc”.
What is the best way to tackle this problem? Could anyone give me some ideas on how to do this?
Many thanks.
Okay, here’s one starting point; others will surely give you more elaborate answers.
First, getting rid of NA values:
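A minimal sketch of this step, assuming base R's na.omit() is the function meant here. The names Df2 and Df2_alt are only illustrative, and stringsAsFactors = FALSE is added so the postcodes stay character strings on R versions before 4.0. The complete.cases() line is a narrower alternative that checks only the two columns from the question:

```r
# Example data from the question
Df1 <- data.frame(
  ID = 1:7,
  home_pc = c("VB2 4RF", "CB4 2DT", "NE5 7TH", "BY5 8IB", "DH4 6PB", "MP9 7GH", "KN4 5GH"),
  start_pc = c(NA, "Home", "FC5 7YH", "Home", "CB3 5TH", "BV6 5PB", NA),
  end_pc = c(NA, "CB5 4FG", "Home", "Home", "Home", "GH6 8HG", NA),
  stringsAsFactors = FALSE
)

# Drop every row that contains an NA in any column
Df2 <- na.omit(Df1)

# Alternative: only consider NAs in start_pc and end_pc
Df2_alt <- Df1[complete.cases(Df1[, c("start_pc", "end_pc")]), ]
```
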
This will do the job for all columns in the data.frame object.
Second, replacing the start and end columns: try the ifelse() function, which is vectorised.
Hope I understood your question correctly! Some additional comments: if you want to test whether something is NA (e.g. within the ifelse() function), use is.na(); the opposite is !is.na(). You can also build subsets of the data frame this way: subset(Df1, !is.na(home_pc)) should work, for example. Of course, check out the help files for all these functions if you need more hints: ?ifelse or ?subset, etc.
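The ifelse() replacement described above might look like the following. This is only a sketch: Df2 is an illustrative name for the data with the NA rows already removed, and stringsAsFactors = FALSE is an assumption added so that ifelse() returns postcodes rather than factor codes on R versions before 4.0.

```r
# Example data from the question
Df1 <- data.frame(
  ID = 1:7,
  home_pc = c("VB2 4RF", "CB4 2DT", "NE5 7TH", "BY5 8IB", "DH4 6PB", "MP9 7GH", "KN4 5GH"),
  start_pc = c(NA, "Home", "FC5 7YH", "Home", "CB3 5TH", "BV6 5PB", NA),
  end_pc = c(NA, "CB5 4FG", "Home", "Home", "Home", "GH6 8HG", NA),
  stringsAsFactors = FALSE
)

# Remove the rows with NA values first
Df2 <- na.omit(Df1)

# Wherever "Home" appears, take the postcode from home_pc instead;
# otherwise keep the existing value
Df2$start_pc <- ifelse(Df2$start_pc == "Home", Df2$home_pc, Df2$start_pc)
Df2$end_pc   <- ifelse(Df2$end_pc == "Home", Df2$home_pc, Df2$end_pc)
```

Doing the NA removal before the replacement keeps the comparison with "Home" from producing NA results in the ifelse() test.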