I need to process a big data file that contains multi-line records. Example input:
1 Name Dan
1 Title Professor
1 Address aaa street
1 City xxx city
1 State yyy
1 Phone 123-456-7890
2 Name Luke
2 Title Professor
2 Address bbb street
2 City xxx city
3 Name Tom
3 Title Associate Professor
3 Like Golf
4 Name
4 Title Trainer
4 Likes Running
Note that the first integer field is unique and identifies a whole record. So in the above input I have 4 records, although I don't know how many attribute lines each record may have. I need to:
– identify valid records (each must have a “Name” and a “Title” field)
– output the available attributes for each valid record, say “Name”, “Title”, and “Address” are the needed fields.
Example output:
1 Name Dan
1 Title Professor
1 Address aaa street
2 Name Luke
2 Title Professor
2 Address bbb street
3 Name Tom
3 Title Associate Professor
So in the output file, record 4 is removed since it doesn’t have a value for the “Name” field. Record 3 doesn’t have an Address field but is still printed to the output, since it is a valid record that has “Name” and “Title”.
Can I do this with awk? And how do I identify a whole record using the first “id” field on each line?
Thanks a lot to the Unix shell script experts for helping me out! 🙂
This seems to work. There are MANY ways you could do this, even in awk.
I’ve spaced it out for easier reading.
Note that record 3 doesn’t show up because it’s missing an “Address” field, which you identified as required.
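The script from this reply isn't shown above, so here is a minimal sketch of one way to do it in awk. It is not the original code. It assumes the rule as stated in the question (only “Name” and “Title” are required, “Address” is printed when present), so unlike the note above it keeps record 3. It also assumes that all lines of a record are contiguous, that fields are separated by whitespace, and that a field with no value (like “4 Name”) counts as missing. The names filter_records and emit are made up for illustration.

```shell
#!/bin/sh
# filter_records: hypothetical helper, reads the named files (or stdin).
# Assumption: lines belonging to one id are adjacent in the input.
filter_records() {
    awk '
    # Print the buffered lines of the finished record if it is valid,
    # then reset the per-record state.
    function emit() {
        if (has_name && has_title)
            printf "%s", buf
        buf = ""; has_name = 0; has_title = 0
    }

    # A change in the first field means the previous record is complete.
    NR > 1 && $1 != id { emit() }

    {
        id = $1
        # A field only counts if it has a value after the attribute name.
        if ($2 == "Name"  && NF > 2) has_name = 1
        if ($2 == "Title" && NF > 2) has_title = 1
        # Buffer only the attributes wanted in the output.
        if ($2 == "Name" || $2 == "Title" || $2 == "Address")
            buf = buf $0 "\n"
    }

    # The last record has no following id change, so flush it here.
    END { emit() }
    ' "$@"
}
```

Usage would be something like `filter_records input.txt > output.txt` after sourcing the function. Because only one record is buffered at a time, memory use stays small even on a big file. If the lines of a record can be scattered through the file, sort by the first field beforehand or switch to arrays keyed by id.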