I’d like to ask a follow-up question to this issue, please, because an

Question

Asked: June 13, 2026

I’d like to ask a follow-up question to this issue, please, because an additional problem arose: I discovered subjects (e.g., Cultural Studies) that belong to more than one category (Arts & Humanities and Social Sciences), so there is overlap that has to be taken into account.

I have long lists of categories, such as this machine-readable example:

AB <- c("Science","Arts & Humanities","Arts & Humanities; Social Sciences","Science","Arts & Humanities; Arts & Humanities; Social Sciences","Science","Science; Social Sciences","Social Sciences; Science")

So it looks like this:

> AB  
[1] "Science"                                               "Arts & Humanities"  
[3] "Arts & Humanities; Social Sciences"                    "Science"  
[5] "Arts & Humanities; Arts & Humanities; Social Sciences" "Science"  
[7] "Science; Social Sciences"                              "Social Sciences; Science"

I would like to edit these terms and eliminate duplicates in order to get this result:

[1] "Science"                                    "Arts & Humanities"  
[3] "Arts & Humanities; Social Sciences"         "Science"  
[5] "Arts & Humanities; Social Sciences"         "Science"  
[7] "Science; Social Sciences"                   "Science; Social Sciences"

So I’m looking for another loop to eliminate the duplicate in #5 (and, as in #8 of the desired result, to put the categories in a consistent order). I tried using strsplit() and unique(), but this didn’t work:

> unique(strsplit(AB, "; *"))  
[[1]]  
[1] "Science"  

[[2]]  
[1] "Arts & Humanities"  

[[3]]  
[1] "Arts & Humanities" "Social Sciences"  

[[4]]  
[1] "Arts & Humanities" "Arts & Humanities" "Social Sciences"  

[[5]]  
[1] "Social Sciences" "Science"

So I would like to ask you again, please: How can I achieve the correct output mentioned above?
Thank you very much in advance for your consideration!

1 Answer

Editorial Team · Answer 1 · 2026-06-13T08:08:29+00:00

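I think part of it has to do with trailing or leading whitespace left over after splitting. In addition, unique() applied to the list returned by strsplit() compares whole vectors with each other; it does not remove duplicates inside each vector. The function below splits one string, trims and de-duplicates its pieces, sorts them, and pastes them back together. If you apply it to each element of AB, it will take care of this for you: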

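fun <- function(text.var){
    # split a single string into its categories
    x <- unlist(strsplit(text.var, ";"))
    # strip leading and trailing whitespace from each piece
    Trim <- function(x) gsub("^\\s+|\\s+$", "", x)
    # drop duplicates, sort for a consistent order, and rejoin
    paste(sort(unique(Trim(x))), collapse="; ")
}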

sapply(AB, fun, USE.NAMES = FALSE)

Yielding:

> sapply(AB, fun, USE.NAMES = FALSE)
[1] "Science"                            "Arts & Humanities"                 
[3] "Arts & Humanities; Social Sciences" "Science"                           
[5] "Arts & Humanities; Social Sciences" "Science"                           
[7] "Science; Social Sciences"           "Science; Social Sciences"
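
As a possible variant, the same idea can be sketched with vapply() and base R's trimws() (available since R 3.2.0) so that the return type is always a character vector. The function name normalize_categories and its sep argument are hypothetical, chosen here for illustration only; this is a minimal sketch under the assumption that categories are separated by semicolons, not a replacement for the function above.

```r
# Hypothetical helper (name and 'sep' argument are assumptions for illustration).
normalize_categories <- function(x, sep = ";") {
  # strsplit() returns a list with one character vector per input string
  parts <- strsplit(x, sep, fixed = TRUE)
  vapply(parts, function(p) {
    p <- trimws(p)        # remove leading/trailing whitespace
    p <- p[nzchar(p)]     # drop empty pieces, e.g. from a trailing ";"
    # de-duplicate within this element, sort, and rejoin
    paste(sort(unique(p)), collapse = "; ")
  }, character(1))
}

AB <- c("Science", "Arts & Humanities", "Arts & Humanities; Social Sciences",
        "Science", "Arts & Humanities; Arts & Humanities; Social Sciences",
        "Science", "Science; Social Sciences", "Social Sciences; Science")

result <- normalize_categories(AB)
print(result)
```

Note that sort() also puts "Social Sciences; Science" into the same order as "Science; Social Sciences", which is what makes elements 7 and 8 identical in the result.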
