I have files with the format: ATOM 3736 CB THR A 486 -6.552 153.891

Asked: May 25, 2026 (2026-05-25T01:29:14+00:00)

I have files with the format:

ATOM   3736  CB  THR A 486      -6.552 153.891  -7.922  1.00115.15           C  
ATOM   3737  OG1 THR A 486      -6.756 154.842  -6.866  1.00114.94           O  
ATOM   3738  CG2 THR A 486      -7.867 153.727  -8.636  1.00115.11           C  
ATOM   3739  OXT THR A 486      -4.978 151.257  -9.140  1.00115.13           O  
HETATM10351  C1  NAG B 203      33.671  87.279  39.456  0.50 90.22           C  
HETATM10483  C1  NAG Z 702      28.025 104.269 -27.569  0.50 92.75           C    
ATOM   3736  CB  THR X 486      -6.552  86.240   7.922  1.00115.15           C  
ATOM   3737  OG1 THR X 486      -6.756  85.289   6.866  1.00114.94           O  
ATOM   3738  CG2 THR X 486      -7.867  86.404   8.636  1.00115.11           C  
ATOM   3739  OXT THR X 486      -4.978  88.874   9.140  1.00115.13           O  
HETATM10351  C1  NAG Y 203      33.671 152.852 -39.456  0.50 90.22           C  
HETATM10639  C2  FUC C 402     -48.168 162.221 -22.404  0.50103.03           C

For each block of lines starting with HETATM, I would like to change column 5 to match that of the preceding ATOM block. That is, in the first HETATM block both B and Z should become A, and in the second HETATM block both Y and C should become X.

A second question, purely out of curiosity since I do not actually need it: how would I split the file after each line starting with HETATM, but only if the next line starts with ATOM?

1 Answer

Editorial Team · Answer 1 · 2026-05-25T01:29:14+00:00

Here is my solution, which solves the first problem (replacing the fifth field) while preserving whitespace:

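# On ATOM lines, remember the fifth field and the column where it starts.
$1 == "ATOM" {
    fifthField = $5

    # Find the character position of field 5 by walking past the first four fields.
    fifthField_index = 1
    for (i = 0; i < 4; i++) {
        # Skip to the next space (the end of the current field)
        for (; substr($0, fifthField_index, 1) != " "; fifthField_index++) { }
        # Skip the run of spaces before the next field
        for (; substr($0, fifthField_index, 1) == " "; fifthField_index++) { }
    }

    print; next
}

# On HETATM lines, overwrite the single character at that column.
/^HETATM/ {
    before_fifthField = substr($0, 1, fifthField_index - 1)
    after_fifthField = substr($0, fifthField_index + 1, length($0))
    print before_fifthField fifthField after_fifthField
    next
}

# Print every other line unchanged.
1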

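It is not the most elegant solution, but it works. It assumes that the fifth field is a single character and that it starts at the same column in the ATOM and HETATM lines, as it does in the fixed-width sample above.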
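Since the sample is fixed-width, with the letter in question at character position 22 on every ATOM and HETATM line (where the PDB format places the chain identifier), both questions can probably be handled more compactly. The following is only a sketch under that assumption. The function names `fix_chain` and `split_blocks`, the output prefix argument and the `.pdb` suffix are made up for illustration. The first function copies character 22 from the most recent ATOM line into each following HETATM line without touching the spacing. The second starts a new output file whenever an ATOM line directly follows a HETATM line.

```shell
# Sketch only: assumes fixed-width lines with the chain letter at column 22.

# Replace column 22 of each HETATM line with column 22 of the latest ATOM line.
fix_chain() {
  awk '
    /^ATOM/   { chain = substr($0, 22, 1) }       # remember the chain letter
    /^HETATM/ && chain != "" {                    # rebuild the line around column 22
      $0 = substr($0, 1, 21) chain substr($0, 23)
    }
    { print }
  ' "$@"
}

# Write the input to <prefix>1.pdb, <prefix>2.pdb, ... and start a new file
# whenever an ATOM line comes right after a HETATM line.
split_blocks() {
  awk -v prefix="$1" '
    BEGIN { n = 1 }
    /^ATOM/ && prev_het { close(out); n++ }       # HETATM followed by ATOM: next part
    {
      out = prefix n ".pdb"
      print > out
      prev_het = ($0 ~ /^HETATM/)                 # remember what this line was
    }
  ' "$2"
}
```

It could be called as `fix_chain input.pdb > fixed.pdb` and `split_blocks part_ input.pdb`. For the sample in the question, the first ATOM/HETATM group would end up in `part_1.pdb` and the second in `part_2.pdb`.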
