So I have a rather messy text file I’m trying to convert to a SAS data set. It looks something like this (though much bigger):
0305679 SMITH, JOHN ARCH05 001 2
ARCH05 005 3
ARCH05 001 7
I’m trying to set 5 separate variables (ID, name, job, time, hours) but clearly only 3 of the variables appear after the first line. I tried this:
infile "C:\Users\Desktop\jobs.txt" dlm = ' ' dsd missover;
input ID $ name $ job $ time hours;
and didn’t get the right output. Then I tried to parse it:
infile "C:\Users\Desktop\jobs.txt" dlm = ' ' dsd missover;
input allData $;
id = substr(allData, find(allData, "305") - 2, 7);
but I’m still not getting the right output. Any ideas?
EDIT: I’m now trying to use scan() and substr() to pull apart the larger data set. How do I subset a single line from the data?
Your data might not be all that messy; it just might be in a hierarchical format where the first row contains all five variables and subsequent rows contain values for variables 3-5. In other words, ID and NAME should be retained as you read through the file.
If that is correct (it’s a hierarchical layout), here is a possible solution:
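A minimal sketch of that idea, not a definitive implementation. It assumes that every line ends with job, time, and hours; that a line with five or more blank-separated words is a "header" line that also starts with ID and name; and that the name is whatever sits between the ID and the job. The data set name, variable lengths, and the choice to keep time as character (to preserve leading zeros) are my own assumptions.

```sas
data jobs;
    infile "C:\Users\Desktop\jobs.txt" truncover;
    length id $7 name $40 job $8 time $3;
    retain id name;              /* carry ID and NAME down to the detail lines */

    input;                       /* load the whole line into _infile_, no parsing yet */
    if missing(_infile_) then delete;   /* skip blank lines */

    n = countw(_infile_, ' ');   /* number of blank-separated words on the line */

    /* The last three words are always job, time, hours (assumption). */
    job   = scan(_infile_, -3, ' ');
    time  = scan(_infile_, -2, ' ');
    hours = input(scan(_infile_, -1, ' '), 12.);

    /* Five or more words: the line also carries a new ID and name. */
    if n >= 5 then do;
        id = scan(_infile_, 1, ' ');
        call scan(_infile_,  2, p_name, len_name, ' ');  /* where the name starts */
        call scan(_infile_, -3, p_job,  len_job,  ' ');  /* where the job starts  */
        name = strip(substr(_infile_, p_name, p_job - p_name));
    end;

    drop n p_name len_name p_job len_job;
run;
```

The bare `input;` statement is also the answer to the EDIT: it puts the current line into the automatic variable `_infile_`, which you can then take apart with `scan` and `substr`. On the three sample lines this should give three observations, all with ID 0305679 and name SMITH, JOHN. If your real file has header lines with a different shape, the `n >= 5` test is the part to change.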
The key thing is to really understand how your raw file is organized. Once you know the rules, using SAS to read it is a snap!