I’m trying to extract specific fields from my file. Basically, output fields only containing

Asked: May 28, 2026

I’m trying to extract specific fields from my file. Basically, I want to output only the fields whose header matches an expression, with the output starting on the records after the matched header.

This is an example of my input. The fields are sometimes in a different order, and the number of lines before the header I’m trying to match also varies.

I had a hard time working out how to do this with cut and sed, and couldn’t quite find an awk method either.

CGATS.17
FORMAT_VERSION  1
KEYWORD "SampleID"
KEYWORD "SAMPLE_NAME"
NUMBER_OF_FIELDS    45
WEIGHTING_FUNCTION "ILLUMINANT, D50"
WEIGHTING_FUNCTION "OBSERVER, 2 degree"
BEGIN_DATA_FORMAT
SampleID    SAMPLE_NAME CMYK_C  CMYK_M  CMYK_Y  CMYK_K  LAB_L   LAB_A   LAB_B   nm380   nm390   nm400
END_DATA_FORMAT
NUMBER_OF_SETS  182
BEGIN_DATA
1   1   40  40  40  0   62.5    6.98    4.09    0.195213    0.205916    0.212827
2   2   0   40  40  0   73.69   25.48   24.89   0.200109    0.211081    0.218222
3   3   40  40  0   0   63.95   12.14   -20.91  0.346069    0.365042    0.377148
4   4   0   70  70  0   58.91   47.69   35.54   0.080033    0.084421    0.087317
END_DATA

This is the quick-and-dirty code I used. It mostly did the job, but it hard-codes columns 7-9 instead of searching for the field header. The awk command just removes the empty lines surrounding the output.

cut -f 7-9 -s input.txt | 
sed -E 's/(LAB_.)//g' |
awk 'NF' > file.txt
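
A minimal sketch that keeps the cut-based approach but looks the column numbers up from the header line instead of hard-coding 7-9. It assumes the file is tab-delimited (as stated above) and that the field names sit on the line right after BEGIN_DATA_FORMAT; header_cols and cut_by_header are hypothetical helper names, not existing tools.

```shell
# Sketch only. Assumes a tab-delimited file whose field names sit on the
# line right after BEGIN_DATA_FORMAT. Both function names are hypothetical.

# Print the positions of header fields matching the regex in $2 as a
# comma-separated list (e.g. 7,8,9), read from file $1.
header_cols() {
    awk -F'\t' -v pat="$2" '
        /^BEGIN_DATA_FORMAT/ {
            if ((getline) > 0) {
                out = ""
                for (i = 1; i <= NF; i++)
                    if ($i ~ pat) out = out (out == "" ? "" : ",") i
                print out
            }
            exit
        }' "$1"
}

# The question's cut pipeline, with the column list looked up from the
# header instead of hard-coded as 7-9.
cut_by_header() {
    cols=$(header_cols "$1" "$2")
    [ -n "$cols" ] || return 1    # no header field matched
    # Keep only the lines strictly between BEGIN_DATA and END_DATA.
    awk '
        /^END_DATA$/   { f = 0 }
        f
        /^BEGIN_DATA$/ { f = 1 }
    ' "$1" | cut -f "$cols"
}
```

For example, cut_by_header input.txt '^LAB_' should print only the LAB columns of the data rows. Because cut always emits fields in ascending column order, the columns come out in the order they appear in the header.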

The output I expect looks like this. It’s still tab-delimited, containing only the values found directly under the headers that match (LAB_.).

62.5    6.98    4.09
73.69   25.48   24.89
63.95   12.14   -20.91
58.91   47.69   35.54

1 Answer

Editorial Team · Answer 1 · 2026-05-28T08:21:09+00:00

Script:

#!/usr/bin/awk -f

# When we reach the line starting with BEGIN_DATA_FORMAT, use getline to read
# the next line (the field names) and store the positions of the fields that
# have "LAB" in their name. The positions go into col[1..n] in header order,
# because a "for (j in a)" loop visits array elements in an unspecified order.

/^BEGIN_DATA_FORMAT/ {
    n = 0
    if ((getline) > 0)
        for (i = 1; i <= NF; i++)
            if ($i ~ /LAB/) col[++n] = i
    next
}

# In the BEGIN_DATA ... END_DATA range, skip lines with fewer than 2 fields
# (this also skips the BEGIN_DATA and END_DATA lines themselves). For every
# other line, print the fields at the captured positions, tab-separated.
# Joining with a counter (rather than testing s itself) also avoids losing
# a leading value of 0, which awk would treat as false.

/^BEGIN_DATA$/, /^END_DATA$/ {
    if (NF < 2) next
    s = ""
    for (k = 1; k <= n; k++)
        s = s (k > 1 ? "\t" : "") $(col[k])
    print s
}

Test:

[jaypal:~/Temp] ./script.awk file
62.5    6.98    4.09    
73.69   25.48   24.89   
63.95   12.14   -20.91  
58.91   47.69   35.54
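
The script in this answer is hard-wired to "LAB". As a sketch, the same logic can be wrapped in a shell function that takes the header regex through awk's -v option. extract_fields is a hypothetical name, and like the script it assumes whitespace-separated fields with no embedded spaces inside values.

```shell
# Sketch only: the answer's awk logic with the header regex passed in via -v
# instead of being hard-wired to LAB. "extract_fields" is a hypothetical name.
# Like the original script, it assumes fields are separated by whitespace and
# that no value contains embedded spaces.
# Usage: extract_fields REGEX FILE
extract_fields() {
    awk -v pat="$1" '
        # Record, in header order, the positions of the matching field names.
        /^BEGIN_DATA_FORMAT/ {
            n = 0
            if ((getline) > 0)
                for (i = 1; i <= NF; i++)
                    if ($i ~ pat) col[++n] = i
            next
        }
        # Print those columns, tab-separated, for every data row.
        /^BEGIN_DATA$/, /^END_DATA$/ {
            if (NF < 2) next
            s = ""
            for (k = 1; k <= n; k++)
                s = s (k > 1 ? "\t" : "") $(col[k])
            print s
        }' "$2"
}
```

For example, extract_fields '^LAB_' input.txt prints the LAB columns and extract_fields '^CMYK_' input.txt the CMYK ones. Since the positions are read from the header line, the columns may appear in any order and the preamble may have any number of lines; matching columns are printed in the order they appear in the header.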
