I have a data frame with many columns:
c1 c2 c3 c4 … c30 d
I want to aggregate by c1..c30, i.e. find every unique combination of c1..c30, and then get min(d) for each such combination. In SQL this would be a GROUP BY c1, …, c30.
d is of type Date.
I have found some solutions here on Stack Overflow, but none of them seem to work for 1) so many columns and 2) a min instead of a sum.
Any input would be great.
Here’s an answer using the data.table package with some fake data:

Small addition from Matthew:
+10, and nice fake data. Setting a key first so that you can do by=key(DT) can get a bit onerous sometimes, so for something like this I’d usually just use an ad hoc by, for simplicity. But, trying the most natural thing first:

The error message tells us what we need to do instead:
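A minimal sketch of the two groupings discussed above, keyed and ad hoc. The fake data is hypothetical (the original answer's data is not reproduced here), grp_cols is an illustrative name, and passing a character vector held in a variable to by is assumed to work as in recent data.table releases; that may not match the form the error message pointed to when this answer was written.

```r
library(data.table)

# Hypothetical fake data (not the original answer's): 5 distinct rows over
# c1..c30, each repeated 4 times, plus a Date column d whose values
# decrease down the rows.
combos <- as.data.table(matrix(rep(1:5, times = 30), nrow = 5))
setnames(combos, paste0("c", 1:30))
DT <- combos[rep(1:5, each = 4)]
DT[, d := as.Date("2020-01-01") + (19:0)]

# Grouping columns: everything except d (hypothetical helper variable).
grp_cols <- setdiff(names(DT), "d")

# 1) Ad hoc grouping: pass the character vector of column names to by.
#    j is unnamed, so the result column is called V1.
DT3 <- DT[, min(d), by = grp_cols]

# 2) Keyed grouping: setkeyv() sorts DT by reference on grp_cols,
#    then we group by key(DT).
setkeyv(DT, grp_cols)
DT2 <- DT[, min(d), by = key(DT)]

DT3
```

The ad hoc form leaves DT untouched and returns groups in order of first appearance, while the keyed form sorts DT itself, so its result comes back sorted by the grouping columns.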
The natural next thought is, of course: if data.table is clever enough to know that by is column names, and to put that in the error message, why can’t it just do it? The answer is that it’s only making a guess based on the data, and in some edge cases it’s not so clear. So, currently, extra intent is needed from the user: wrapping with eval. I’m not completely happy with that, though, so perhaps we can improve it.

EDIT: renaming the new data.table column
In my approach, I named the new column minD when I created it by entering:

Using Matthew Dowle’s approach, you would achieve this in pretty much the same way by entering:
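For illustration, here is how naming the column at creation time might look in both the ad hoc and the keyed form. The fake data and grp_cols are hypothetical, and passing grp_cols straight to by assumes a recent data.table version; .(...) is data.table's alias for list(...) inside DT[...].

```r
library(data.table)

# Hypothetical fake data, rebuilt so this block runs on its own.
combos <- as.data.table(matrix(rep(1:5, times = 30), nrow = 5))
setnames(combos, paste0("c", 1:30))
DT <- combos[rep(1:5, each = 4)]
DT[, d := as.Date("2020-01-01") + (19:0)]
grp_cols <- setdiff(names(DT), "d")  # hypothetical helper variable

# Ad hoc grouping: name the result column in j with list(name = expr).
DT3 <- DT[, list(minD = min(d)), by = grp_cols]

# Keyed grouping: the same naming works with .(...) and by = key(DT).
setkeyv(DT, grp_cols)
DT2 <- DT[, .(minD = min(d)), by = key(DT)]

DT3
```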
If you’ve already created the column and want to rename it, use setnames as follows:

This avoids copying the whole data.table and preserves its memory over-allocation (both of these advantages are lost when using names(DT3) <- "something"), as outlined in the documentation under ?setnames.
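A minimal sketch of the rename, assuming hypothetical fake data and a result column left unnamed (so data.table calls it V1). address() from data.table is used here only to show that setnames() modifies DT3 in place rather than copying it.

```r
library(data.table)

# Hypothetical fake data, rebuilt so this block runs on its own.
combos <- as.data.table(matrix(rep(1:5, times = 30), nrow = 5))
setnames(combos, paste0("c", 1:30))
DT <- combos[rep(1:5, each = 4)]
DT[, d := as.Date("2020-01-01") + (19:0)]
grp_cols <- setdiff(names(DT), "d")  # hypothetical helper variable

# Unnamed j: the new column comes out as V1.
DT3 <- DT[, min(d), by = grp_cols]
before <- address(DT3)  # memory address of DT3 before renaming

# Rename V1 to minD by reference; no copy of DT3 is made.
setnames(DT3, "V1", "minD")

# DT3 still lives at the same address after the rename.
identical(before, address(DT3))
```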