I have a question about removing non-alphanumeric characters from a list in R. I have a list with all sorts of odd characters, blanks, etc. and would like to remove them. I’m generally able to remove what I want using the tm package in R. I fiddled around with it but got nowhere, so I thought going back to the list might be the place to start.
The list:
list("\n \n", "\n\n ", "\n ", " ", "\n ",
"\n \n ", "\n ", "Home", "\n", "Expertise",
"Question & Research Design", "\n", "Survey Development & Validation",
"\n", "Data Processing", "\n", "Statistical Analysis", "\n",
"Publications & Grants", "\n", "Evaluation", "\n", "\n",
"Consulting Areas", "Business", "\n", "Education", "K-12",
"\n", "Â ", " Â Â Â Â", " | ")
The expected output:
[1] "" "" ""
[4] "" "" ""
[7] "" "Home" ""
[10] "Expertise" "Question Research Design" ""
[13] "Survey Development Validation" "" "Data Processing"
[16] "" "Statistical Analysis" ""
[19] "Publications Grants" "" "Evaluation"
[22] "" "" "Consulting Areas"
[25] "Business" "" "Education"
[28] "K12" "" ""
[31] "" ""
Strongly recommend you simply use
where x is the name of the list.
You probably included the foreign characters at the end of the list because you want those removed too; the above command achieves this. To explain briefly, the square brackets in the command define a set of characters, and a ^ as the first character inside the brackets means “not”, so everything outside the specified set of 62 characters (lower case a to z, upper case A to Z, and digits 0 to 9) is replaced by the empty string “” (i.e. deleted).
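As a minimal sketch of a command matching this description, assuming the pattern is `[^a-zA-Z0-9]` passed to base R's `gsub()`: the vector `x` below is a shortened stand-in for the list in the question, and the names `strict` and `spaced` are only illustrative. The second variant is an assumption on my part, not part of the description above; it also allows spaces so that words stay separated.

```r
# Shortened stand-in for the list in the question (illustrative subset).
x <- list("\n \n", "Home", "Question & Research Design", "K-12", " | ")

# Flatten the list to a character vector before matching.
v <- unlist(x)

# Pattern as described: delete everything outside a-z, A-Z and 0-9.
# Note that this removes spaces as well.
strict <- gsub("[^a-zA-Z0-9]", "", v)

# Assumed variant: keep spaces, then trim the ends and collapse
# runs of spaces into one so that words remain separated.
spaced <- gsub("[^a-zA-Z0-9 ]", "", v)
spaced <- gsub(" +", " ", trimws(spaced))

print(strict)
print(spaced)
```

With the strict pattern, "Question & Research Design" loses its spaces along with the ampersand; the spaced variant may be closer to what is wanted if the words should stay apart.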
And here’s the output…