I need to process a big data file that contains multi-line records, example input:

Asked: June 4, 2026 (2026-06-04T18:37:50+00:00)

I need to process a big data file that contains multi-line records, example input:

1  Name      Dan
1  Title     Professor
1  Address   aaa street
1  City      xxx city
1  State     yyy
1  Phone     123-456-7890
2  Name      Luke
2  Title     Professor
2  Address   bbb street
2  City      xxx city
3  Name      Tom
3  Title     Associate Professor
3  Like      Golf
4  Name
4  Title     Trainer
4  Likes     Running

Note that the first integer field is unique to each record and identifies the whole record. So the above input really contains 4 records, although I don’t know how many attribute lines each record may have. I need to:
– identify valid records (each must have both a “Name” and a “Title” field)
– output the available attributes for each valid record, limited to the needed fields (say “Name”, “Title”, and “Address”).

Example output:

1  Name      Dan
1  Title     Professor
1  Address   aaa street
2  Name      Luke
2  Title     Professor
2  Address   bbb street
3  Name      Tom
3  Title     Associate Professor

So in the output file, record 4 is removed since it doesn’t have the “Name” field. Record 3 doesn’t have an “Address” field but is still printed to the output, since it is a valid record that has “Name” and “Title”.

Can I do this with awk? If so, how do I identify a whole record using the first “id” field on each line?

Thanks a lot to the unix shell script expert for helping me out! 🙂

1 Answer

Editorial Team · Answer 1 · 2026-06-04T18:37:52+00:00

This seems to work. There are MANY ways you could do this, even in awk.

I’ve spaced it out for easier reading.

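Note that record 3 doesn’t show up because it’s missing an “Address” field, which this script treats as required (see the required array in BEGIN). The script also prints every line of a qualifying record, including fields such as “City” and “Phone”, not just the needed ones.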

#!/usr/bin/awk -f

BEGIN {
        # Set your required fields here...
        required["Name"]=1;
        required["Title"]=1;
        required["Address"]=1;

        # Count the required fields
        for (i in required) enough++;
}

# Note that this will run on the first record, but only to initialize variables
$1 != last1 {
        if (hits >= enough) {
                printf("%s",output);
        }
        last1=$1; output=""; hits=0;
}

# This appends the current line to a buffer, followed by the record separator (RS)
{ output=output $0 RS }

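# Count the required fields; used to determine whether to print the buffer.
# NF >= 3 skips lines that have a label but no value (like record 4's "Name"),
# and "in" tests membership without creating new array entries.
NF >= 3 && ($2 in required) { hits++ }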

END {
        # Print the final buffer, since we only print on the next record
        if (hits >= enough) {
                printf("%s",output);
        }
}
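
If you want output closer to the example in the question (record 3 kept, and only the “Name”, “Title”, and “Address” lines printed), here is one possible variant. Treat it as a sketch under assumptions rather than a definitive solution: the function name filter_records is made up for illustration, a field counts as present only when its line has a value after the label, and the required and wanted lists are taken from the question's example.

```shell
#!/bin/sh
# Sketch only: filter_records is a hypothetical name, and the field lists
# below are assumptions taken from the example in the question.
filter_records() {
    awk '
    BEGIN {
        # A record is valid only if all of these are present with a value
        required["Name"] = 1; required["Title"] = 1
        for (f in required) enough++

        # Only these attribute lines are printed for a valid record
        wanted["Name"] = 1; wanted["Title"] = 1; wanted["Address"] = 1
    }

    # Print the buffered record if it is valid, then reset the state
    function emit() {
        if (hits >= enough) printf "%s", output
        output = ""; hits = 0
        split("", seen)          # portable way to empty an array
    }

    NF == 0 { next }             # ignore blank lines

    # A change in the first field marks the start of a new record
    $1 != last1 { emit(); last1 = $1 }

    # Lines with fewer than 3 fields have a label but no value; skip them
    NF < 3 { next }

    ($2 in wanted) { output = output $0 ORS }

    # Count each required field once per record
    ($2 in required) && !($2 in seen) { seen[$2] = 1; hits++ }

    END { emit() }
    ' "$@"
}
```

You would call it as something like `filter_records data.txt > valid.txt` (both file names are placeholders). Like the script above, it assumes that all lines of a record are adjacent in the input, since a record is flushed as soon as the first field changes.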
