I have a very large table with an indexed datetime field. I want to do BY-group processing on the dataset by month and output only the last observation in each month.
The problem is that the table doesn’t contain a month field, so I can’t use something like this:
if last.month then do;
output;
end;
Is there a way I can achieve this kind of behaviour without having to add a month field in a previous data step? The table is 50 GB compressed, so I want to avoid any unnecessary steps.
Thanks
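You can actually achieve this with the GROUPFORMAT option on the BY statement, run against your original dataset, with the datetime field formatted using the dtmonyy5. format. As the name implies, this forms the BY groups from the formatted values instead of the original ones, so first. and last. are set at month boundaries.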
Another method is to use PROC SUMMARY, although this can be memory intensive, particularly against large datasets. Here is the code.
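A minimal sketch of the BY ... GROUPFORMAT approach, for illustration only. The names big_table, event_dt and month_end are assumptions, not taken from the question. I have used dtmonyy7. rather than dtmonyy5. so that the year is shown with four digits; either width should group by month.

```sas
/* Assumed names: big_table (input), event_dt (indexed datetime), month_end (output) */
data month_end;
    set big_table;
    /* GROUPFORMAT builds BY groups from the formatted value of event_dt */
    by event_dt groupformat;
    /* Format the datetime as month and year, e.g. JAN2012 */
    format event_dt dtmonyy7.;
    /* last.event_dt is now true on the final row of each formatted month */
    if last.event_dt then output;
run;
```

The table still has to be in event_dt order, or have an index on event_dt that the BY statement can use. The stored datetime values are not changed, but the format stays attached to event_dt in the output dataset, so you may want to reset it afterwards.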
Just a quick note on the previous answer: the month function works against date values, not datetime values, so you would need to wrap the datetime field in the datepart function on that line.
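A small sketch of the difference, using a made-up datetime literal rather than anything from the original table:

```sas
data _null_;
    /* Hypothetical datetime value, for illustration only */
    dt = '15MAR2012:10:30:00'dt;
    /* datepart() converts the datetime to a SAS date, which month() can read */
    m = month(datepart(dt));
    put m=;
run;
```

Calling month(dt) directly would treat the number of seconds as a number of days, so it would not return the month you expect.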