Suppose I do the following in SAS:
filename tmp pipe 'unzip -c -qq ./data_xml.zip';
libname tmp xml xmlmap=TMMap access=READONLY;
data header; set tmp.header; run;
data owners; set tmp.owners; run;
This will unzip the data_xml.zip file and use the SAS xmlmap file to generate two data sets, header and owners.
My question is, how many times will unzip run on data_xml.zip? Will the unzipping just happen once, or will it happen twice because I’m setting a data set from the tmp libname twice?
The short answer is yes: it will unzip the file twice.
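If you want to confirm this on your own system, here is a minimal sketch. It assumes a Unix host where the PIPE command goes through a shell that accepts `;` and `>>`, and it uses a hypothetical log file, unzip_runs.txt in the WORK directory, that gets one line appended every time the command runs. TMMap is assumed to be the same xmlmap fileref as in the question.

```sas
/* Hypothetical log file, used only for this check (WORK is empty at session start) */
%let runlog = %sysfunc(pathname(work))/unzip_runs.txt;

/* Same pipe as before, but each execution first appends a line to the log */
filename tmp pipe "echo run >> '&runlog'; unzip -c -qq ./data_xml.zip";
libname tmp xml xmlmap=TMMap access=READONLY;

data header; set tmp.header; run;
data owners; set tmp.owners; run;

/* Count how many lines (i.e. command executions) the log contains */
data _null_;
  infile "&runlog" end=eof;
  input;
  runs + 1;
  if eof then put 'NOTE: the unzip pipe ran ' runs 'time(s).';
run;
```

Check the SAS log for the NOTE line. If you rerun this in the same session, the count accumulates, so delete the log file first.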
As I understand it, unzip -c essentially turns the data into a sequential source, because the output is streamed from the unzip command directly into the PIPE fileref. Each time SAS reads from the tmp libref, the fileref is opened again and the command runs again.
Presumably, you want to stream via -c and the PIPE because of disk-space and/or performance concerns about landing the file on disk first. Unfortunately, I'm fairly certain that, with things set up this way, the only way to avoid the CPU cost of a second unzip is to land the output on disk in a temporary file first. However, depending on the size of the file, the CPU cost of a second unzip might not outweigh the I/O cost of reading the expanded file from disk at least one extra time.
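A minimal sketch of the "land it to disk first" approach, assuming the same data_xml.zip and TMMap xmlmap fileref as in the question. The temporary file name data_xml.xml is hypothetical. It uses the X statement to run unzip exactly once; this needs XCMD enabled, which the original PIPE fileref already requires.

```sas
/* Hypothetical temporary file in the WORK directory (deleted when the session ends) */
%let xmlfile = %sysfunc(pathname(work))/data_xml.xml;

/* Run unzip once, redirecting the extracted XML to the temporary file */
x "unzip -c -qq ./data_xml.zip > '&xmlfile'";

/* Point the tmp fileref at the landed file instead of the pipe */
filename tmp "&xmlfile";
libname tmp xml xmlmap=TMMap access=READONLY;

/* Both reads now come from disk; unzip does not run again */
data header; set tmp.header; run;
data owners; set tmp.owners; run;
```

The trade-off is the one described above: you pay for one write and at least two reads of the expanded file instead of a second decompression, so which is cheaper depends on the file size and your disk speed.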