I have over 200k small datasets with the same variables (n<1000 and usually n<100) that I want to concatenate into a master dataset. I have tried using a macro that uses a data step to just iterate through all of the new datasets and concatenate with the master with “set master new:”, but this is taking a really long time. Also, if I try to run at the same time, the call execute data step says that I am out of memory on a huge server box. For reference, all of the small datasets together are just over 5 Gigs. Any suggestions would be greatly appreciated. Here is what I have so far:
%macro catDat(name, nbr) ;
/*call in new dataset */
data new ;
set libin.&name ;
run ;
/* reorder names */
proc sql noprint;
create table new as
select var1, var2, var3
from new;
quit;
%if &nbr = 1 %then %do ;
data master;
set new;
run;
%end;
%if &nbr > 1 %then %do ;
data master;
set master new ;
run;
%end ;
%mend;
/* concatenate datasets */
data runthis ;
set datasetNames ;
call execute('%catdat('||datasetname||','||_n_||')');
run;
Resolved: see Bob’s comments below.
Try using
PROC APPENDinstead of your “new” dataset; that will be much, much faster:PROC APPENDwill add each dataset into your new “master” without rebuilding it each time as you are doing now.This also avoids using
CALL EXECUTE, removing that memory issue you were running into (caused by generating so much code into the execution stack).