I would like to enable Biopython to read PQR files (modified PDB files with occupancy and B factor replaced by atom charge and radius).
The Biopython PDB parser fails to read the B factor because it retrieves the value by fixed PDB column indices (which the PQR format does not honor).
Example of a standard PDB atom record:
ATOM 1 N LEU 1 3.469 24.678 1.940 1.00 48.46 N
Here 1.00 is the occupancy and 48.46 is the B factor.
And the PQR equivalent:
ATOM 1 N LEU 1 3.469 24.678 1.940 0.1010 1.8240
Here 0.1010 is the charge and 1.8240 is the radius.
So, how can I avoid "PDBConstructionException: Invalid or missing B factor" and properly parse the charge/radius values?
As the PQR format is no longer standard PDB format, you’d need to modify the source of the Biopython PDB parser to fit your needs. Thankfully, Biopython is open source, and `PDB.PDBParser` is quite readable and easy to modify.
Extracting data
From the PQR description you gave:
Biopython’s PDB parser expects values strictly at fixed column positions. (It’s perfectly valid for PDB files to have no whitespace between values.) I’d think your best bet would be to modify how line data is extracted in `PDB.PDBParser`, but maintain most of its other error-checking and `Structure`-creation. As the fields will be whitespace-delimited, you can simply use `line.split()` to create a list of parameters, which you then give meaningful names.
Once you parse the data from a given line, you’ll probably want to store it as fields in an `Atom` object. Atoms are added to the structure with the `structure_builder`. Perhaps you could modify `init_atom()` to add charge and radius as fields to the `PDB.Atom` object.
Where to start
Here’s the approximate location in the source code you’d want to modify.
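Wherever the change ends up in Biopython, the whitespace-delimited extraction suggested earlier can be tried out on its own first. The following is only a minimal sketch: `PQRFields` and `parse_pqr_atom_line()` are hypothetical names (not part of Biopython), and it assumes each record has either 10 fields (no chain ID, as in your example) or 11 fields (with a chain ID).

```python
from typing import NamedTuple, Optional


class PQRFields(NamedTuple):
    """Hypothetical container giving the split fields meaningful names."""
    serial: int
    name: str
    resname: str
    chain_id: Optional[str]
    resseq: int
    x: float
    y: float
    z: float
    charge: float
    radius: float


def parse_pqr_atom_line(line: str) -> PQRFields:
    """Parse one whitespace-delimited PQR ATOM/HETATM record."""
    fields = line.split()
    if not fields or fields[0] not in ("ATOM", "HETATM"):
        raise ValueError("not an ATOM/HETATM record: %r" % line)
    # Assumption: the chain ID is optional, so a record has 10 or 11 fields.
    if len(fields) == 11:
        chain_id = fields[4]
    elif len(fields) == 10:
        chain_id = None
    else:
        raise ValueError("unexpected number of fields: %d" % len(fields))
    # The last five fields are always x, y, z, charge, radius.
    x, y, z, charge, radius = (float(value) for value in fields[-5:])
    return PQRFields(
        serial=int(fields[1]),
        name=fields[2],
        resname=fields[3],
        chain_id=chain_id,
        resseq=int(fields[-6]),  # residue number sits just before x
        x=x, y=y, z=z,
        charge=charge,
        radius=radius,
    )


record = parse_pqr_atom_line("ATOM 1 N LEU 1 3.469 24.678 1.940 0.1010 1.8240")
print(record.charge, record.radius)
```

Counting from the end of the list for the numeric fields keeps the function indifferent to whether the chain ID is present.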
Outline
So, start to finish, here’s what I’d do:
1. Add a `StructureBuilder` method `init_pqr_atom()` (modelled after `init_atom()`) which creates a new `Atom` object, adding `charge` and `radius` as fields in the new `Atom`. (Perhaps you’d want to create a `PDB.PQRAtom` object that inherits from `PDB.Atom`?)
2. Create an optional parameter in the `__init__()` method of `PDBParser` that tells the parser it’s a PQR file (not a standard PDB), e.g. `is_pqr`.
3. Pass `is_pqr` to `_parse()`, which passes it to `_parse_coordinates`.
4. In `_parse_coordinates`, parse data as normal if not a PQR file (i.e. use the default PDB column specifications). If it is PQR, parse the data based on the whitespace-delimited format (again, Python’s `str.split()` will return a list of whitespace-delimited items from a string).
5. Create the `Atom` or `PQRAtom` object in the structure, passing in the parsed values.
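To make the control flow of the outline concrete, here is a rough standalone sketch of the `is_pqr` switch. It does not touch Biopython: `SketchAtom`, `SketchPQRAtom` and `parse_atom_line()` are hypothetical stand-ins for `PDB.Atom`, the proposed `PQRAtom`, and the branch inside `_parse_coordinates`. The column slices in the PDB branch follow the standard PDB ATOM record layout (occupancy in columns 55-60, B factor in columns 61-66).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SketchAtom:
    """Stand-in for an atom object; NOT Biopython's Atom class."""
    name: str
    coord: Tuple[float, float, float]
    occupancy: Optional[float] = None
    bfactor: Optional[float] = None


@dataclass
class SketchPQRAtom(SketchAtom):
    """Hypothetical subclass carrying the two PQR-specific fields."""
    charge: Optional[float] = None
    radius: Optional[float] = None


def parse_atom_line(line: str, is_pqr: bool = False) -> SketchAtom:
    """Parse one ATOM record, switching strategy on the is_pqr flag."""
    if is_pqr:
        # PQR: fields are whitespace-delimited; the last five are
        # x, y, z, charge, radius.
        fields = line.split()
        x, y, z, charge, radius = (float(value) for value in fields[-5:])
        return SketchPQRAtom(name=fields[2], coord=(x, y, z),
                             charge=charge, radius=radius)
    # Standard PDB: values live at fixed column positions (0-based slices).
    coord = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return SketchAtom(
        name=line[12:16].strip(),
        coord=coord,
        occupancy=float(line[54:60]),
        bfactor=float(line[60:66]),
    )


pqr_atom = parse_atom_line(
    "ATOM 1 N LEU 1 3.469 24.678 1.940 0.1010 1.8240", is_pqr=True
)
print(pqr_atom.charge, pqr_atom.radius)
```

In the real parser the two branches would feed `init_atom()` or the proposed `init_pqr_atom()` rather than returning a dataclass, but the shape of the decision is the same.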