I have a problem and I would like to apologize, if this problem has

Question

Asked: June 8, 2026 (2026-06-08T20:49:45+00:00)

I have a problem, and I apologize if it has already been discussed; I did check older posts first.

I have a data.frame with two columns. The second column can contain several identifiers, and their number varies from row to row. In a second data.frame, each of these identifiers corresponds to another identifier.

df.1  

color   identifier
blue    A1, B2, C3, C4 
yellow  B2, C4, C6
green   A3

df.2

A1 Mercedes
A3 BMW
B2 Porsche
C3 Toyota
C4 Hundai
C5 Volkswagen
C6 Peugeot

What I would like to have is a data.frame like this:

df.3

color   identifier        identifier2
blue    A1, B2, C3, C4    Mercedes, Porsche, Toyota, Hundai 
yellow  B2, C4, C6        Porsche, Hundai, Peugeot
green   A3                BMW

That is, a data.frame that contains the original identifiers and, in addition, the corresponding identifiers from the second data.frame.

I tried using apply, stack, and unstack, but without success.

Do you have any suggestions?

1 Answer

Editorial Team · Answer 1 · 2026-06-08T20:49:47+00:00

Here is another solution, using strsplit:

# The data
df.1  = read.table(header=TRUE, text="
color   identifier
blue    'A1, B2, C3, C4'
yellow  'B2, C4, C6'
green   'A3'", stringsAsFactors = FALSE)

df.2 = read.table(header=FALSE, text="
A1 Mercedes
A3 BMW
B2 Porsche
C3 Toyota
C4 Hundai
C5 Volkswagen
C6 Peugeot", stringsAsFactors=FALSE)
names(df.2) = c("identifier", "car")

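# Split each comma-separated string into a character vector (list column)
df.1$identifier = strsplit(df.1$identifier, split=", ")
# For each row, look up the cars whose identifier appears in that row
# (matches are returned in the row order of df.2)
df.1$identifier1 = lapply(seq_len(nrow(df.1)),
         function(x) df.2[which(df.2$identifier %in% df.1$identifier[[x]]), 2])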
df.1
#    color     identifier                       identifier1
# 1   blue A1, B2, C3, C4 Mercedes, Porsche, Toyota, Hundai
# 2 yellow     B2, C4, C6          Porsche, Hundai, Peugeot
# 3  green             A3                               BMW
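
One caveat about the lookup above: `%in%` returns the matches in the row order of df.2, which happens to coincide with the identifier order in this sample data. If the order within each row matters, a `match()`-based lookup is one possible variant. The sketch below is self-contained and uses hypothetical names (`ids`, `lookup`, `cars_in_order`, `unordered`) that are not part of the original answer. Note that it also behaves differently for unknown identifiers: they yield NA instead of being dropped.

```r
# Self-contained copy of the sample data (hypothetical object names).
ids <- list(c("A1", "B2", "C3", "C4"), c("B2", "C4", "C6"), "A3")
lookup <- data.frame(
  identifier = c("A1", "A3", "B2", "C3", "C4", "C5", "C6"),
  car = c("Mercedes", "BMW", "Porsche", "Toyota", "Hundai",
          "Volkswagen", "Peugeot"),
  stringsAsFactors = FALSE
)

# match() gives, for each identifier, its row position in the lookup table,
# so the cars come back in the same order as the identifiers of that row.
# Identifiers that are missing from the lookup table produce NA.
cars_in_order <- lapply(ids, function(x) lookup$car[match(x, lookup$identifier)])

# A row whose identifiers are not in lookup-table order shows the difference:
unordered <- c("C4", "A1")
by_match <- lookup$car[match(unordered, lookup$identifier)]  # "Hundai" "Mercedes"
by_in    <- lookup$car[lookup$identifier %in% unordered]     # "Mercedes" "Hundai"
```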

Note that identifier and identifier1 are now lists in your data.frame. I personally find this easier to work with later on.

str(df.1)
# 'data.frame':  3 obs. of  3 variables:
# $ color      : chr  "blue" "yellow" "green"
# $ identifier :List of 3
#  ..$ : chr  "A1" "B2" "C3" "C4"
#  ..$ : chr  "B2" "C4" "C6"
#  ..$ : chr "A3"
# $ identifier1:List of 3
#  ..$ : chr  "Mercedes" "Porsche" "Toyota" "Hundai"
#  ..$ : chr  "Porsche" "Hundai" "Peugeot"
#  ..$ : chr "BMW"
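
As an illustration of why list columns can be convenient for later analysis, here is a minimal sketch of a few typical operations. It rebuilds a small data.frame under the hypothetical name `df` so that it runs on its own; `n_cars`, `has_porsche`, and `long` are likewise illustrative names only.

```r
# Rebuild a small data.frame with a list column (hypothetical name `df`).
df <- data.frame(color = c("blue", "yellow", "green"),
                 stringsAsFactors = FALSE)
df$identifier1 <- list(
  c("Mercedes", "Porsche", "Toyota", "Hundai"),
  c("Porsche", "Hundai", "Peugeot"),
  "BMW"
)

# Number of cars per color.
n_cars <- lengths(df$identifier1)

# Which colors include a Porsche?
has_porsche <- vapply(df$identifier1, function(x) "Porsche" %in% x, logical(1))
porsche_colors <- df$color[has_porsche]

# Long format: one row per (color, car) pair.
long <- data.frame(
  color = rep(df$color, n_cars),
  car = unlist(df$identifier1),
  stringsAsFactors = FALSE
)
```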

You may need to adjust the strsplit call if your real data have inconsistent whitespace around the commas, but it works with this sample data. Also, strsplit only accepts character vectors, not factors, which is why I set stringsAsFactors = FALSE when reading in the data.
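
If the whitespace is irregular, or the column was read in as a factor, one way to handle both is sketched below. The input vector `raw` and the result `parts` are made-up names and values for illustration, not data from the question.

```r
# Made-up input: a factor with inconsistent spacing around the commas.
raw <- factor(c("A1,B2 , C3,  C4", " B2, C4", "A3"))

# strsplit() needs a character vector, so convert the factor first,
# split on the bare comma, then trim whitespace from every piece.
parts <- strsplit(as.character(raw), split = ",")
parts <- lapply(parts, trimws)
```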

Update: write.table()

I prefer to keep the data in lists in case I want to do further analysis. However, if the analysis is finished or the data are only needed for output, you might want to do something like this:

df.3 = df.1
df.3$identifier = sapply(df.3$identifier, paste0, collapse=", ")
df.3$identifier1 = sapply(df.3$identifier1, paste0, collapse=", ")

This will allow you to use write.table since identifier and identifier1 are now of mode character instead of list.
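
A minimal sketch of the export step, assuming a tab-separated file is acceptable. The data.frame `df.out` simply hard-codes the collapsed strings so the example runs on its own; the names `df.out`, `out_file`, and `back`, and the choice of a temporary file, are assumptions for illustration.

```r
# Stand-alone copy of the collapsed result (hypothetical name `df.out`).
df.out <- data.frame(
  color = c("blue", "yellow", "green"),
  identifier = c("A1, B2, C3, C4", "B2, C4, C6", "A3"),
  identifier1 = c("Mercedes, Porsche, Toyota, Hundai",
                  "Porsche, Hundai, Peugeot", "BMW"),
  stringsAsFactors = FALSE
)

# Tab-separated output keeps the comma-separated values in a single field.
out_file <- tempfile(fileext = ".tsv")
write.table(df.out, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read the file back to check the round trip.
back <- read.delim(out_file, stringsAsFactors = FALSE)
```

A tab separator is used here because the fields themselves contain commas and spaces; with a comma or space separator the fields would need quoting.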
