I am having some trouble with leading and trailing white space in a data.frame.
For example, I look at a specific row in a data.frame based on a certain condition:
> myDummy[myDummy$country == c("Austria"),c(1,2,3:7,19)]
[1] codeHelper country dummyLI dummyLMI dummyUMI
[6] dummyHInonOECD dummyHIOECD dummyOECD
<0 rows> (or 0-length row.names)
I was wondering why I didn’t get the expected output since the country Austria obviously existed in my data.frame. After looking through my code history and trying to figure out what went wrong I tried:
> myDummy[myDummy$country == c("Austria "),c(1,2,3:7,19)]
codeHelper country dummyLI dummyLMI dummyUMI dummyHInonOECD dummyHIOECD
18 AUT Austria 0 0 0 0 1
dummyOECD
18 1
The only thing I changed in the command is an additional white space after Austria.
Further annoying problems obviously arise, for example when I want to merge two data frames based on the country column. One data.frame uses "Austria " while the other has "Austria", so the matching doesn’t work.
- Is there a nice way to ‘show’ the white space on my screen so that I am aware of the problem?
- And can I remove the leading and trailing white space in R?
So far I have used a simple Perl script that removes the white space, but it would be nice if I could somehow do it inside R.
Probably the best way is to handle the trailing white space when you read your data file. If you use read.csv or read.table you can set the parameter strip.white=TRUE.
If you want to clean strings afterwards you could use one of these functions:
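Both approaches can be sketched as follows. This is only an illustration: the inline CSV text and the helper names trim_leading, trim_trailing and trim_both are made up for this example, and the helpers are one plausible way to write such functions with sub and gsub.

```r
# Hypothetical input lines; note the trailing space after "Austria"
# and the leading space before "Belgium".
csv_lines <- c("codeHelper,country", "AUT,Austria ", "BEL, Belgium")

# Default read: white space around unquoted character fields is kept.
raw <- read.csv(text = csv_lines, stringsAsFactors = FALSE)

# strip.white = TRUE removes leading and trailing white space while reading.
clean <- read.csv(text = csv_lines, stringsAsFactors = FALSE,
                  strip.white = TRUE)

# Hypothetical helpers for cleaning strings after the data is loaded.
# Remove leading white space only.
trim_leading <- function(x) sub("^\\s+", "", x)
# Remove trailing white space only.
trim_trailing <- function(x) sub("\\s+$", "", x)
# Remove both leading and trailing white space.
trim_both <- function(x) gsub("^\\s+|\\s+$", "", x)

trimmed <- trim_both(raw$country)
```

In R 3.2.0 and later, the base function trimws covers the same three cases through its which argument, so hand-written helpers like these may not be needed.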
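To use one of these functions on myDummy$country, apply it to the column and assign the result back to myDummy$country.
To ‘show’ the white space you could print the column as a character vector, which will show you the strings surrounded by quotation marks (") making white spaces easier to spot.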
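A minimal sketch of the display and clean-up steps, assuming a small stand-in for myDummy with only two columns (the real data frame in the question has more) and a plain gsub call for the trimming:

```r
# Hypothetical stand-in for the myDummy data frame from the question.
myDummy <- data.frame(
  codeHelper = c("AUT", "BEL"),
  country    = c("Austria ", "Belgium"),
  stringsAsFactors = TRUE
)

# A factor prints without quotes, so the trailing space is hard to see.
print(myDummy$country)

# A character vector prints with quotes, which exposes the trailing space.
print(as.character(myDummy$country))

# Flag the entries that change when trimmed, i.e. that carry stray white space.
country_chr <- as.character(myDummy$country)
has_ws <- country_chr != gsub("^\\s+|\\s+$", "", country_chr)

# Clean the column in place; the plain comparison then finds the row.
myDummy$country <- gsub("^\\s+|\\s+$", "", country_chr)
hits <- myDummy[myDummy$country == "Austria", ]
```

The has_ws vector is an optional extra: it gives a quick way to list affected rows before overwriting the column.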