Apologies if this problem has already been discussed; I did check older posts before asking.
I have a data.frame with two columns. The second column can contain several identifiers, and the number of identifiers varies from row to row. A second data.frame maps each of these identifiers to another identifier.
df.1
color identifier
blue A1, B2, C3, C4
yellow B2, C4, C6
green A3
df.2
A1 Mercedes
A3 BMW
B2 Porsche
C3 Toyota
C4 Hundai
C5 Volkswagen
C6 Peugeot
What I would like to have is a data.frame like this:
df.3
color identifier identifier2
blue A1, B2, C3, C4 Mercedes, Porsche, Toyota, Hundai
yellow B2, C4, C6 Porsche, Hundai, Peugeot
green A3 BMW
That is, a data.frame that keeps the original identifiers and adds, in the same order, the matching identifiers from the second data.frame.
I tried using apply, stack and unstack, but without any success.
Do you have any suggestions?
Here is another solution, using strsplit.
Note that identifier and identifier1 are now lists in your data.frame. I personally find this easier to work with later on.
You may need to adjust the strsplit call if any whitespace is left over, but it works with this sample data. Also, for strsplit to work, the data need to be of mode character (hence my use of stringsAsFactors = FALSE when reading in the data).
Update: write.table()
I prefer to keep the data in lists in case I want to do further analysis. However, if the data are final or only needed for output, you might want to do something like this:
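A minimal sketch of that flattening step, assuming the list columns are called identifier and identifier1 as above. The name flat and the tab-separated output to the console are illustrative choices, not part of the original answer; the list columns are filled in by hand here so the snippet runs on its own.

```r
# Stand-in for the result of the strsplit() step: a data.frame with list columns
# (values taken from the question's expected output).
df.1 <- data.frame(color = c("blue", "yellow", "green"), stringsAsFactors = FALSE)
df.1$identifier  <- list(c("A1", "B2", "C3", "C4"), c("B2", "C4", "C6"), "A3")
df.1$identifier1 <- list(c("Mercedes", "Porsche", "Toyota", "Hundai"),
                         c("Porsche", "Hundai", "Peugeot"), "BMW")

# Collapse every list column into one comma-separated string per row,
# so each column becomes an ordinary character vector.
flat <- df.1
is_list_col <- vapply(flat, is.list, logical(1))
flat[is_list_col] <- lapply(flat[is_list_col], function(col)
  vapply(col, paste, character(1), collapse = ", "))

# write.table() can now handle the data.frame; here it writes to the console.
write.table(flat, stdout(), sep = "\t", quote = FALSE, row.names = FALSE)
```

Working on a copy leaves the list version in df.1 available for further analysis.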
This will allow you to use write.table, since identifier and identifier1 are now of mode character instead of list.
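For reference, here is a self-contained sketch of the whole strsplit approach, from the question's sample data to a flat data.frame. It assumes df.2 has columns named id and car (hypothetical names, since the question shows df.2 without a header), and it uses match() for the lookup, which is an illustrative choice rather than necessarily the code of the original answer.

```r
# Sample data from the question; df.2's column names "id" and "car" are hypothetical.
df.1 <- data.frame(
  color      = c("blue", "yellow", "green"),
  identifier = c("A1, B2, C3, C4", "B2, C4, C6", "A3"),
  stringsAsFactors = FALSE  # strsplit() needs character vectors, not factors
)
df.2 <- data.frame(
  id  = c("A1", "A3", "B2", "C3", "C4", "C5", "C6"),
  car = c("Mercedes", "BMW", "Porsche", "Toyota", "Hundai",
          "Volkswagen", "Peugeot"),
  stringsAsFactors = FALSE
)

# Split on a comma followed by optional whitespace; this gives a list column.
df.1$identifier <- strsplit(df.1$identifier, ",\\s*")

# Look up each identifier in df.2; match() preserves the original order.
# Identifiers missing from df.2 would come back as NA.
df.1$identifier1 <- lapply(df.1$identifier,
                           function(ids) df.2$car[match(ids, df.2$id)])

# Flatten both list columns into comma-separated strings for output.
df.3 <- df.1
df.3$identifier  <- vapply(df.1$identifier,  paste, character(1), collapse = ", ")
df.3$identifier1 <- vapply(df.1$identifier1, paste, character(1), collapse = ", ")
print(df.3)
```

Splitting on ",\\s*" rather than "," takes care of the leftover whitespace mentioned above.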