I am looking to use a function to speed up a data cleaning process. In the example shown I am looking to remove values reported in the am and pm columns if the “.no” column for that day has a value of 1.
df1 = data.frame (identifier = c(1:4),
mon.no = c(1,NA,NA,NA),mon.am = c(2,1,NA,3),mon.pm = c(3,4,NA,5),
tues.no = c(NA,NA,1,NA),tues.am = c(2,3,1,4),tues.pm = c(3,3,2,3))
I envisage using a function that takes the day as its argument to clean the data:
clean1 = function (day) {
df1$day.am[df1$day.no==1] = NA
df1$day.pm[df1$day.no==1] = NA
return (df1)}
df2 = clean1(mon)
However this returns the following error.
Error in `$<-.data.frame`(`*tmp*`, "day.am", value = logical(0)) :
replacement has 0 rows, data has 4
I assume that this is because the function expects a full column name and cannot fill in the gaps around a text input? Is it possible to use a function in that way?
Having read these notes I think that it would be better practice to have my data in a tidy format, and I am working on a solution which involves reorganising my data. However, it would also be handy to be able to do this while the data is in its original format.
Thanks.
You’re really close. @Tyler Rinker in comments has explained why it doesn’t work. Here’s a fix:
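The answer's original code is not preserved here, so the following is only a sketch of one way such a fix could look. The underlying issue is that `$` does not substitute the function argument: `df1$day.no` looks for a column literally named `day.no`, which does not exist. Building the column names with `paste0()` and indexing with `[[` or `[` avoids that. The helper variable names (`no_col`, `am_col`, `pm_col`, `rows`) are my own, and the day is passed as a string.

```r
# Example data from the question
df1 <- data.frame(
  identifier = 1:4,
  mon.no = c(1, NA, NA, NA), mon.am = c(2, 1, NA, 3), mon.pm = c(3, 4, NA, 5),
  tues.no = c(NA, NA, 1, NA), tues.am = c(2, 3, 1, 4), tues.pm = c(3, 3, 2, 3)
)

clean1 <- function(day) {
  # Build the full column names from the day prefix, e.g. "mon" -> "mon.no"
  no_col <- paste0(day, ".no")
  am_col <- paste0(day, ".am")
  pm_col <- paste0(day, ".pm")

  # which() drops the NA comparisons, leaving only rows where ".no" equals 1
  rows <- which(df1[[no_col]] == 1)

  # This modifies a local copy of df1; the global df1 is left untouched
  df1[rows, am_col] <- NA
  df1[rows, pm_col] <- NA
  df1
}

# Note the quotes: the day is passed as a character string
df2 <- clean1("mon")
```

Wrapping the comparison in `which()` matters because the `.no` columns contain `NA`, and data frame assignment does not accept missing values in a logical row index.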
Somebody else might offer more efficient ways of doing this. Note that you’re only ever working from your original df1 here. If you now run the function again for another day, you won’t get a dataframe with both days cleaned. You could fix this by supplying the dataframe to be acted on to the function too:
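Again as a sketch rather than the answer's original code, a version that takes the dataframe as an argument might look like this. The name `clean_day` and its parameter names are hypothetical, chosen for illustration.

```r
# Example data from the question
df1 <- data.frame(
  identifier = 1:4,
  mon.no = c(1, NA, NA, NA), mon.am = c(2, 1, NA, 3), mon.pm = c(3, 4, NA, 5),
  tues.no = c(NA, NA, 1, NA), tues.am = c(2, 3, 1, 4), tues.pm = c(3, 3, 2, 3)
)

# clean_day is a hypothetical name; df is the dataframe to clean,
# day is the column prefix given as a string
clean_day <- function(df, day) {
  # Rows where the day's ".no" column equals 1 (which() ignores NA)
  rows <- which(df[[paste0(day, ".no")]] == 1)
  # Blank out both the ".am" and ".pm" columns for those rows
  df[rows, paste0(day, c(".am", ".pm"))] <- NA
  df
}

# Feed the result of one call into the next so the cleaning accumulates
df2 <- clean_day(df1, "mon")
df2 <- clean_day(df2, "tues")
```

Because each call returns a new dataframe, passing the previous result back in is what lets both days end up cleaned in `df2`, while `df1` keeps the original values.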