Here’s the issue.
I have a CSV file that I read with read.csv(), and it prints like this:
   grp  b a  id d c
1 grp1  2 1 id3 3 2
2 grp1 -2 1 id1 3 2
3 grp0 -2 1 id4 3 2
4 grp0  1 1 id0 3 2
5 grp0  1 1 id2 3 2
Now I want to split this into two data frames, one with the data for grp0 and another for grp1:
groups <- split(raw, raw$grp);
Which yields this:
$grp0
   grp  b a  id d c
3 grp0 -2 1 id4 3 2
4 grp0  1 1 id0 3 2
5 grp0  1 1 id2 3 2
$grp1
   grp  b a  id d c
1 grp1  2 1 id3 3 2
2 grp1 -2 1 id1 3 2
Now I just want the a, b, c, d columns from each of these list elements, so I need to coerce them to data frames to use subset(), which means I need to:
for(i in 1:length(groups))
{
x <- subset(as.data.frame(groups[i]), select = c(a,b,c,d));
some_function(x);
}
Problem is, when I do this, it says column a does not exist, and when I print this stuff out, this is what we see:
  grp0.grp grp0.b grp0.a grp0.id grp0.d grp0.c
3     grp0     -2      1     id4      3      2
4     grp0      1      1     id0      3      2
5     grp0      1      1     id2      3      2
  grp1.grp grp1.b grp1.a grp1.id grp1.d grp1.c
1     grp1      2      1     id3      3      2
2     grp1     -2      1     id1      3      2
So these columns are no longer just a, b, c, d; each name is now prefixed with the name of the group created during the split. Is there a way I can avoid this happening? Or is there a way to get the name of the data frame and prepend it to the list of columns I am subsetting? I just want to end up with data frames that look something like this (the column names need not be exact):
$grp0
   b a d c
3 -2 1 3 2
4  1 1 3 2
5  1 1 3 2
$grp1
   b a d c
1  2 1 3 2
2 -2 1 3 2
In the future, please use dput() on any object that you want to present to us, so we don’t have to manually type in your example to write the code.
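For illustration, here is a small sketch of what sharing data with dput() could look like. The data frame below is my reconstruction of the printed output in the question; the integer and character column types are assumptions, since the original CSV isn't shown.

```r
# Reconstructed example data (column types are assumed, not confirmed).
raw <- data.frame(
  grp = c("grp1", "grp1", "grp0", "grp0", "grp0"),
  b   = c(2L, -2L, -2L, 1L, 1L),
  a   = 1L,
  id  = c("id3", "id1", "id4", "id0", "id2"),
  d   = 3L,
  c   = 2L,
  stringsAsFactors = FALSE
)

# dput() prints an R expression that recreates the object.
# Paste that printed text into your question.
dput(raw)

# Anyone can then rebuild the object by evaluating the pasted text.
dput_text <- capture.output(dput(raw))
rebuilt   <- eval(parse(text = dput_text))
```

The point is that whoever answers can copy the dput() output and get your exact object, types included, without retyping the table.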
Are you a SAS programmer? You don’t need semicolons after every line…
You were actually pretty close.
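I’d recommend reading up on the difference between ?`[` and ?`[[`. groups[i] returns a one-element list, and calling as.data.frame() on that list is what prefixes every column name with the element name. groups[[i]] returns the data frame itself, so you can pass it straight to subset() and the columns stay a, b, c, d.
With that said, please try to learn the ?lapply functions.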
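As a rough sketch of that advice, the loop using [[ and an lapply() equivalent could look something like this. The data frame is reconstructed from the printed output in the question (column types are assumed), and print() is only a stand-in for some_function(), which the question doesn't define.

```r
# Reconstructed example data (column types are assumed).
raw <- data.frame(
  grp = c("grp1", "grp1", "grp0", "grp0", "grp0"),
  b   = c(2L, -2L, -2L, 1L, 1L),
  a   = 1L,
  id  = c("id3", "id1", "id4", "id0", "id2"),
  d   = 3L,
  c   = 2L,
  stringsAsFactors = FALSE
)

groups <- split(raw, raw$grp)

# Single brackets keep the list wrapper; double brackets extract the element.
class(groups[1])    # "list"
class(groups[[1]])  # "data.frame"

# The original loop, indexing with [[ so no as.data.frame() is needed.
for (i in seq_along(groups)) {
  x <- subset(groups[[i]], select = c(a, b, c, d))
  print(x)  # stand-in for some_function(x)
}

# The same idea with lapply(): one subsetted data frame per group,
# returned as a named list.
result <- lapply(groups, function(g) g[, c("b", "a", "d", "c")])
```

The lapply() version keeps the group names (grp0, grp1) on the resulting list, which is close to the output asked for in the question.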