I’d like to ask a follow-up question to this issue, please, because an additional problem arose: I discovered subjects (e.g. Cultural Studies) that belong to more than one category (Arts & Humanities and Social Sciences), so there is overlap that has to be taken into account.
I have long lists of categories, such as this machine-readable example:
AB <- c("Science","Arts & Humanities","Arts & Humanities; Social Sciences","Science","Arts & Humanities; Arts & Humanities; Social Sciences","Science","Science; Social Sciences","Social Sciences; Science")
So it looks like this:
> AB
[1] "Science" "Arts & Humanities"
[3] "Arts & Humanities; Social Sciences" "Science"
[5] "Arts & Humanities; Arts & Humanities; Social Sciences" "Science"
[7] "Science; Social Sciences" "Social Sciences; Science"
I would like to edit these terms and eliminate duplicates in order to get this result:
[1] "Science" "Arts & Humanities"
[3] "Arts & Humanities; Social Sciences" "Science"
[5] "Arts & Humanities; Social Sciences" "Science"
[7] "Science; Social Sciences" "Science; Social Sciences"
So I’m looking for another loop to eliminate the duplicate in #5 (and, as in #8 above, to put the terms in a consistent order). I tried using strsplit() and unique(), but this didn’t work:
> unique(strsplit(AB, "; *"))
[[1]]
[1] "Science"
[[2]]
[1] "Arts & Humanities"
[[3]]
[1] "Arts & Humanities" "Social Sciences"
[[4]]
[1] "Arts & Humanities" "Arts & Humanities" "Social Sciences"
[[5]]
[1] "Social Sciences" "Science"
So I would like to ask you again, please: How can I achieve the correct output mentioned above?
Thank you very much in advance for your consideration!
I think it has to do with trailing or leading whitespace. If you apply this to AB, it will take care of it for you:
Yielding:
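A minimal sketch of one way to get the desired result, not necessarily the code originally posted here. It relies on the fact that unique() applied to the list returned by strsplit() compares whole vectors, so it drops repeated list elements rather than repeated terms inside an element. The helper name clean_terms and the result name AB_clean are made up for illustration. Each string is split on ";", surrounding whitespace is trimmed, repeated terms are dropped, the rest is sorted, and the pieces are pasted back together:

```r
AB <- c("Science", "Arts & Humanities", "Arts & Humanities; Social Sciences",
        "Science", "Arts & Humanities; Arts & Humanities; Social Sciences",
        "Science", "Science; Social Sciences", "Social Sciences; Science")

# Hypothetical helper: clean a single "term; term; ..." string.
clean_terms <- function(x) {
  # Split on the semicolon, then strip leading/trailing whitespace per term.
  parts <- trimws(strsplit(x, ";", fixed = TRUE)[[1]])
  # Drop empty pieces (e.g. from a trailing ";").
  parts <- parts[nzchar(parts)]
  # Remove repeated terms, sort for a consistent order, and re-join.
  paste(sort(unique(parts)), collapse = "; ")
}

# Apply the helper to every element; the result is a plain character vector.
AB_clean <- vapply(AB, clean_terms, character(1), USE.NAMES = FALSE)
print(AB_clean)
```

Sorting is what turns "Social Sciences; Science" into "Science; Social Sciences"; if the original order of terms should be kept, sort() can simply be dropped. trimws() assumes R 3.2.0 or later.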