I have PDB(text) files which are in a directory. I would like to print


Editorial Team

Asked: June 11, 2026


I have PDB (text) files in a directory, and I would like to print the number of subunits in each PDB file.

Read all lines in a PDB file that start with ATOM.
The fifth column of each ATOM line contains the chain identifier: A, B, C, D, etc.
If only A appears, the number of subunits is 1; if A and B appear, it is 2; if A, B, and C appear, it is 3. In other words, the number of subunits is the number of distinct letters in that column (2b69 below contains only B, so it has 1 subunit).

1kg2.pdb file

ATOM   1363  N   ASN A 258      82.149 -23.468   9.733  1.00 57.80           N  
ATOM   1364  CA  ASN A 258      82.494 -22.084   9.356  1.00 62.98           C  
ATOM   1395  C   MET B 196      34.816 -51.911  11.750  1.00 49.79           C  
ATOM   1396  O   MET B 196      35.611 -52.439  10.963  1.00 47.65           O

1uz3.pdb file

ATOM   1384  O   ARG A 260      80.505 -20.450  15.420  1.00 22.10           O 
ATOM   1385  CB  ARG A 260      78.980 -18.077  15.207  1.00 36.88           C 
ATOM   1399  SD  MET B 196      34.003 -52.544  16.664  1.00 57.16           S 
ATOM   1401  N   ASP C 197      34.781 -50.611  12.007  1.00 44.30           N

2b69.pdb file

ATOM   1393  N   MET B 196      33.300 -54.017  12.033  1.00 46.46           N  
ATOM   1394  CA  MET B 196      33.782 -52.714  12.566  1.00 49.99           C

Desired output:

pdb_id   subunits
1kg2     2
1uz3     3
2b69     1

How can I do this with awk, python or Biopython?

1 Answer

Editorial Team · Answer 1 · 2026-06-11T11:54:44+00:00

You can use an associative array keyed on the fifth field to record every distinct value seen; the length of the array is then the number of subunits.

$ gawk '/^ATOM/ {seen[$5] = 1} END {print length(seen)}' 1kg2.pdb
2
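Since the question also asks about Python, here is a minimal sketch of the same one-file idea, where a set takes the place of the awk array. The function name count_subunits is hypothetical, and like the awk version it splits on whitespace, so it assumes the chain identifier is always the fifth field, as in the sample lines.

```python
def count_subunits(path):
    """Count distinct fifth-field values on the ATOM lines of one PDB file."""
    seen = set()  # plays the role of the awk array `seen`
    with open(path) as fh:
        for line in fh:
            if line.startswith("ATOM"):  # same test as the awk pattern /^ATOM/
                fields = line.split()  # whitespace split, like awk's default FS
                if len(fields) >= 5:
                    seen.add(fields[4])  # awk's $5 is index 4 here
    return len(seen)
```

For example, count_subunits("1kg2.pdb") should return 2 for the sample 1kg2.pdb lines above.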

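Edit: With gawk 4.0 or later, you can use the ENDFILE special pattern, which runs after each input file has been read, to generate the required output. Save the following as pdb.awk: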

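BEGIN {
  # header line, then an empty line ($0 is empty inside BEGIN)
  print "pdb_id\t\tsubunits"
  print
}

/^ATOM/ {
  # record each chain identifier (fifth field) seen in the current file
  seen[$5] = 1
}

ENDFILE {
  # gawk-only: runs after each input file; report the count, then reset
  print FILENAME, "\t", length(seen)
  delete seen
}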

The result:

$ gawk -f pdb.awk 1kg2.pdb 1uz3.pdb 2b69.pdb
pdb_id          subunits

1kg2.pdb         2
1uz3.pdb         3
2b69.pdb         1
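The ENDFILE output keeps the .pdb extension in the first column, whereas the desired output shows the bare ID. A minimal Python sketch for a whole directory, assuming every input file ends in .pdb: it drops the extension with pathlib's Path.stem and, instead of splitting on whitespace, reads the chain identifier from column 22 of the fixed-width ATOM record (line[21]), so it does not depend on how many whitespace-separated fields a line happens to have. The helper names chain_ids, subunit_table and print_table are hypothetical.

```python
from pathlib import Path


def chain_ids(path):
    """Return the set of chain identifiers on the ATOM records of one PDB file."""
    ids = set()
    with open(path) as fh:
        for line in fh:
            # In the fixed-width PDB layout the chain ID is column 22 (1-based),
            # i.e. line[21]; skip ATOM lines too short to contain it.
            if line.startswith("ATOM") and len(line) > 21:
                ids.add(line[21])
    return ids


def subunit_table(directory):
    """Yield (pdb_id, subunits) for every *.pdb file in `directory`, sorted by name."""
    for path in sorted(Path(directory).glob("*.pdb")):
        yield path.stem, len(chain_ids(path))  # stem drops the ".pdb" suffix


def print_table(directory):
    """Print a tab-separated pdb_id/subunits table for `directory`."""
    print("pdb_id\tsubunits")
    for pdb_id, n in subunit_table(directory):
        print(f"{pdb_id}\t{n}")
```

Calling print_table(".") in the directory holding the three sample files should print the header followed by the rows 1kg2 2, 1uz3 3 and 2b69 1, separated by tabs. Like the awk script, it ignores HETATM records.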
