Background: I’m using some Census public use microdata samples (the American Community Survey in particular) across several years to examine the behavior of people who have completed different degrees (e.g., high school diploma, bachelor’s degree, master’s degree). The relevant variable in that public use file is called "Schooling". The problem is that the codes contained in "Schooling" have changed from year to year. For example, in the files up through 2007, a value of "13" means the person completed a bachelor’s degree, but starting in 2008 a completed bachelor’s degree is coded as "21".
Goal: To create a new "Degree Completed" variable that translates the "Schooling" codes into the degree level completed, taking into account the year of the file.
Logistics: The files for all years have been concatenated and, for review purposes, I have to work with the file as is rather than correcting it before it gets to this point.
Existing Code: Here is what I tried.
if (original.file$year %in% c(2000,2001)) {
if (original.file$Schooling <= 08) {original.file$degree.completed <- 0}
else if (original.file$Schooling <= 10) {original.file$degree.completed <- 1}
else if (original.file$Schooling <= 12) {original.file$degree.completed <- 2}
else if (original.file$Schooling == 13) {original.file$degree.completed <- 3}
else if (original.file$Schooling == 14) {original.file$degree.completed <- 4}
else if (original.file$Schooling == 15) {original.file$degree.completed <- 5}
else if (original.file$Schooling == 16) {original.file$degree.completed <- 6}
}
else if (original.file$year %in% c(2002,2003,2004,2005,2006,2007)) {
if (original.file$Schooling <= 08) {original.file$degree.completed <- 0}
else if (original.file$Schooling <= 11) {original.file$degree.completed <- 1}
else if (original.file$Schooling == 12) {original.file$degree.completed <- 2}
else if (original.file$Schooling == 13) {original.file$degree.completed <- 3}
else if (original.file$Schooling == 14) {original.file$degree.completed <- 4}
else if (original.file$Schooling == 15) {original.file$degree.completed <- 5}
else if (original.file$Schooling == 16) {original.file$degree.completed <- 6}
}
else if (original.file$year %in% c(2008,2009,2010,2011)) {
if (original.file$Schooling <= 15) {original.file$degree.completed <- 0}
else if (original.file$Schooling <= 19) {original.file$degree.completed <- 1}
else if (original.file$Schooling == 20) {original.file$degree.completed <- 2}
else if (original.file$Schooling == 21) {original.file$degree.completed <- 3}
else if (original.file$Schooling == 22) {original.file$degree.completed <- 4}
else if (original.file$Schooling == 23) {original.file$degree.completed <- 5}
else if (original.file$Schooling == 24) {original.file$degree.completed <- 6}
}
Problem: I get warning messages of the following type.
Warning messages:
1: In if (original.file$year %in% c(2000, 2001)) { : the condition has length > 1 and only the first element will be used
2: In if (original.file$Schooling <= 8) { : the condition has length > 1 and only the first element will be used
3: In if (original.file$Schooling <= 10) { : the condition has length > 1 and only the first element will be used
Question: I know that there is a vector vs scalar issue here with the "if", as I’ve seen from other questions on StackOverflow, but the answers do not seem to apply to this situation. What is the solution here?
First, use cut or a table instead of all those ifs and elses:
You will also want to cut the years into factors.
Then you should be able to do:
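The code that originally accompanied these steps is not preserved here, so what follows is only a minimal sketch of one way the cut-based idea could look, not necessarily the approach the answer had in mind. The names breaks.by.era and era are hypothetical, the small original.file data frame is made-up toy data, and the cut points are copied from the thresholds in the question's if/else chains. Because cut works on whole vectors, the "condition has length > 1" warning does not arise.

```r
# Toy stand-in for the concatenated file (made-up rows, for illustration only).
original.file <- data.frame(
  year      = c(2000, 2001, 2005, 2007, 2008, 2011),
  Schooling = c(13,   9,    11,   16,   21,   15)
)

# Hypothetical lookup: for each coding era, the upper bounds of degree
# levels 0..6, taken from the thresholds in the question's if/else chains.
breaks.by.era <- list(
  "2000-2001" = c(-Inf,  8, 10, 12, 13, 14, 15, 16),
  "2002-2007" = c(-Inf,  8, 11, 12, 13, 14, 15, 16),
  "2008-2011" = c(-Inf, 15, 19, 20, 21, 22, 23, 24)
)

# Cut the years into a factor naming the coding era.
# Intervals are right-closed: (1999, 2001], (2001, 2007], (2007, 2011].
original.file$era <- cut(original.file$year,
                         breaks = c(1999, 2001, 2007, 2011),
                         labels = names(breaks.by.era))

# Recode Schooling within each era. cut(..., labels = FALSE) returns the
# interval index (1..7), so subtracting 1 gives degree levels 0..6.
# Codes above the last break, or years outside 2000-2011, stay NA.
original.file$degree.completed <- NA_integer_
for (e in names(breaks.by.era)) {
  rows <- which(original.file$era == e)
  original.file$degree.completed[rows] <-
    cut(original.file$Schooling[rows],
        breaks = breaks.by.era[[e]],
        labels = FALSE) - 1L
}

print(original.file)
```

On the toy rows, a Schooling value of 13 in 2000 and a value of 21 in 2008 both come out as degree level 3, which matches the bachelor's degree example in the question. Keeping the cut points in one list also makes it easy to check each era's coding against the survey documentation.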