If I have this data set
Browser Count
Chrome/11 100
Chrome/11 89
Chrome/13 10
Safari/12 40
Safari/114 30
And I want to get a more general form of the browser without the version number.
Browser Clean_Browser Count
Chrome/11 Chrome 100
Chrome/11 Chrome 89
Chrome/13 Chrome 10
Safari/12 Safari 40
Safari/114 Safari 30
I know this is easy to do with Python or Excel, but is there a way to do it in R so I don’t have to pre-process the data?
That is pretty straightforward thanks to regular expressions as well as string processing; both are vectorised, so you do not need to loop. You could
- use gsub() et al. and replace ‘/…’ with blanks
- even use strsplit() with ‘/’ as the split character and retain the first element
- certainly other ways I can’t think of now, and experience suggests several will involve packages by Hadley 🙂 [kidding aside, look at the stringr package too]

Here is approach one, done on a vector but a column in a data.frame is just the same:
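A minimal sketch of that first approach, with the strsplit() variant alongside for comparison. The names df and clean2 are assumptions made up for illustration, and the data frame simply mirrors the sample in the question. sub() is used rather than gsub() because only the first ‘/’ and what follows it need to go; gsub() with the same pattern should give the same result here.

```r
# Hypothetical data frame mirroring the sample data in the question
df <- data.frame(
  Browser = c("Chrome/11", "Chrome/11", "Chrome/13", "Safari/12", "Safari/114"),
  Count   = c(100, 89, 10, 40, 30),
  stringsAsFactors = FALSE
)

# Approach one: drop the first "/" and everything after it (vectorised)
df$Clean_Browser <- sub("/.*$", "", df$Browser)

# Approach two: split on a literal "/" and keep the first piece of each element
clean2 <- vapply(strsplit(df$Browser, "/", fixed = TRUE), `[`, character(1), 1)

print(df[, c("Browser", "Clean_Browser", "Count")])
```

Both calls work on the whole column at once, so no explicit loop is needed. If Browser was read in as a factor, wrapping it in as.character() first keeps strsplit() from raising an error.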