I wasn’t sure how to word this correctly while searching, so sorry if this has a simple answer.
I have 58 data frames with ~25,000 rows each, read from .csv files. They look something like this:
Probe.Id      Gene.Id            Score.d
1418126_at    6352               28.52578
145119_a_at   2192               24.87866
1423477_at    NA                 24.43532
1434193_at    100506144///9204   6.22395
Ideally, I want to split the IDs at each “///” and put each one on its own row, like so:
Probe.Id      Gene.Id     Score.d
1418126_at    6352        28.52578
145119_a_at   2192        24.87866
1423477_at    NA          24.43532
1434193_at    100506144   6.22395
1434193_at    9204        6.22395
Using strsplit, I can turn Gene.Id into a list of character vectors, but from there I’m not sure of the most efficient way to put each individual ID on its own row with the correct values from the other columns. Ideally, I don’t want to loop over all 25,000 rows.
If anyone knows the right way to do this I’d super appreciate it.
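To make the strsplit step concrete, here is a small sketch on the four sample Gene.Id values above, typed in by hand. as.character() is applied defensively because read.csv in R versions before 4.0.0 read text columns as factors by default, and strsplit refuses factors. Because strsplit is vectorised, one call returns one list element per row, and lengths() then gives how many IDs each row holds, which is exactly how many copies of that row are needed.

```r
# Sample Gene.Id values from the question, typed in by hand;
# as.character() guards against the column having been read as a factor
gene_id <- as.character(c("6352", "2192", NA, "100506144///9204"))

# One vectorised call: a list with one character vector per row.
# fixed = TRUE matches "///" literally instead of as a regular expression.
ids <- strsplit(gene_id, "///", fixed = TRUE)
str(ids)

# Number of IDs per row; the NA row yields a single NA element,
# so it still counts as one row
n_ids <- lengths(ids)
print(n_ids)  # 1 1 1 2
```

Rows with many IDs, like the 13-ID example in the edit below, simply produce longer vectors, so the maximum number of IDs never has to be known in advance.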
EDIT: I should’ve mentioned a complicating factor: some rows have IDs like this:
333932///126961///653604///8350///8354///8355///8356///8968///8352///8358///8351///8353///8357
and I have no idea what the maximum number of IDs in a row is.
Edit: New solution after the OP’s comment. It is very straightforward using data.table:
You could add fixed = TRUE to the strsplit expression to speed it up further, since /// is a fixed pattern (a literal string, not a regular expression).
Alternative: again using data.table. strsplit is vectorised, so running it once on the whole Gene.Id column is much faster than running it one row at a time (even though data.table loops through the rows very quickly). You can therefore get more speed by splitting the previous code into two steps:
I replicated the data.table shown in this example many times until I had 295245 rows, and then ran a benchmark using rbenchmark:
For this example, the two-step version is about 1.6 times faster, but the gain depends on the number of entries containing ///. Hope this helps.
OLD solution (for continuity):
One way is to 1) find the positions where /// occurs, 2) extract those entries, 3) duplicate them, 4) use sub on them, and 5) combine them.
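A minimal base-R sketch of those five steps, on the sample rows from the question typed in by hand. The helper column row and the exact regular expressions are assumptions for illustration, not the answer's original code. It also assumes at most two IDs per row, as in the original sample: each of the two copies keeps only the part before or after a single ///, so it would not handle the longer rows mentioned in the question's edit.

```r
# Sample rows from the question, typed in by hand
df <- data.frame(
  Probe.Id = c("1418126_at", "145119_a_at", "1423477_at", "1434193_at"),
  Gene.Id  = c("6352", "2192", NA, "100506144///9204"),
  Score.d  = c(28.52578, 24.87866, 24.43532, 6.22395),
  stringsAsFactors = FALSE
)
df$row <- seq_len(nrow(df))  # hypothetical helper column to restore order

# 1) find the positions (row numbers) where "///" occurs; NA never matches
pos <- grep("///", df$Gene.Id, fixed = TRUE)

# 2) extract those rows and 3) duplicate them
first  <- df[pos, ]
second <- df[pos, ]

# 4) sub: the first copy keeps the ID before "///", the second the ID after it
#    (assumes exactly two IDs per affected row)
first$Gene.Id  <- sub("///.*$", "", first$Gene.Id)
second$Gene.Id <- sub("^.*///", "", second$Gene.Id)

# 5) combine with the untouched rows and put everything back in order;
#    logical indexing avoids the df[-integer(0), ] pitfall when nothing matches,
#    and order() is stable, so the first copy stays ahead of the second
out <- rbind(df[!df$row %in% pos, ], first, second)
out <- out[order(out$row), c("Probe.Id", "Gene.Id", "Score.d")]
rownames(out) <- NULL
print(out)
```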
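The code for the newer data.table solution is missing from this copy of the answer. Below is a minimal sketch of the two variants the text describes; the table name dt, the hand-typed sample rows, and the exact expressions are assumptions rather than the answer's original code. Variant 1 splits inside j, grouped by the columns to keep. Variant 2 calls strsplit once on the whole Gene.Id column, then repeats each row with rep() and lengths() and fills in the flattened IDs, which is the vectorised idea behind the two-step version. Both handle any number of IDs per row.

```r
library(data.table)

# Sample rows from the question (assumption: Gene.Id is character,
# which fread() gives because of the "///" entries)
dt <- data.table(
  Probe.Id = c("1418126_at", "145119_a_at", "1423477_at", "1434193_at"),
  Gene.Id  = c("6352", "2192", NA, "100506144///9204"),
  Score.d  = c(28.52578, 24.87866, 24.43532, 6.22395)
)

# Variant 1: split inside j, one group per (Probe.Id, Score.d) pair.
# Assumes each pair identifies a single row; duplicated pairs would be merged.
out1 <- dt[, .(Gene.Id = unlist(strsplit(Gene.Id, "///", fixed = TRUE))),
           by = .(Probe.Id, Score.d)]
setcolorder(out1, names(dt))  # grouping columns come first; restore the order

# Variant 2: one vectorised strsplit over the whole column, then
# repeat each row lengths(ids) times and overwrite Gene.Id by reference
ids  <- strsplit(dt$Gene.Id, "///", fixed = TRUE)
out2 <- dt[rep(seq_len(nrow(dt)), lengths(ids))]
out2[, Gene.Id := unlist(ids)]

print(out2)
```

Variant 2 keeps the original column order and does not rely on the grouping columns being unique, so it avoids both caveats of variant 1.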