I would like to extract chains from PDB files. I have a file named pdb.txt which contains entries as shown below. The first four characters of each entry are the PDB ID and the last character is the chain ID.
1B68A
1BZ4B
4FUTA
I would like to:
1) read the file line by line,
2) download the atomic coordinates of each chain from the corresponding PDB file,
3) save the output to a folder.
I used the following script to extract chains, but it only extracts chain A from each PDB file.
for i in 1B68 1BZ4 4FUT
do
wget -c "http://www.pdb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId="$i -O $i.pdb
grep ATOM $i.pdb | grep 'A' > $i\_A.pdb
done
The following BioPython code should suit your needs well.
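A minimal sketch of such a script, assuming Biopython is installed and the PDB file is already on disk. The names `ChainSelect` and `extract_chain` are illustrative and not part of Biopython; `PDBParser`, `PDBIO` and `Select` are the library's own classes.

```python
from Bio.PDB import PDBIO, PDBParser, Select


class ChainSelect(Select):
    """Accept only the chain with the requested ID (illustrative helper)."""

    def __init__(self, chain_id):
        self.chain_id = chain_id

    def accept_chain(self, chain):
        # PDBIO calls this for every chain; True keeps it, False drops it.
        return chain.get_id() == self.chain_id


def extract_chain(pdb_path, chain_id, out_path):
    """Write only `chain_id` from the structure in `pdb_path` to `out_path`."""
    parser = PDBParser(QUIET=True)  # QUIET suppresses parser warnings
    structure = parser.get_structure("structure", pdb_path)

    io = PDBIO()
    io.set_structure(structure)
    io.save(out_path, ChainSelect(chain_id))
```

For example, `extract_chain("1b68.pdb", "A", "1B68_A.pdb")` would write chain A of that file to `1B68_A.pdb`. Chain IDs are compared case-sensitively, and if the requested chain does not occur in the file the output contains no atoms.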
It uses PDB.Select to select only the desired chains (in your case, one chain) and PDBIO() to create a structure containing just that chain.
One final note: don't write your own parser for PDB files. The format specification is ugly (really ugly), and the number of faulty PDB files out there is staggering. Use a tool like BioPython that will handle parsing for you!
Furthermore, instead of using wget, you should use tools that interact with the PDB database for you. They take into account FTP connection limitations, the changing nature of the PDB database, and more. I should know – I updated Bio.PDB.PDBList to account for changes in the database. =)
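As a rough sketch of the download step, assuming pdb.txt holds one five-character entry per line as in the question. The helper names `parse_entry`, `read_entries` and `download_entries` and the `pdb_files` directory are hypothetical; `PDBList.retrieve_pdb_file` is Biopython's own downloader and needs network access.

```python
def parse_entry(line):
    """Split an entry such as '1B68A' into ('1B68', 'A')."""
    entry = line.strip()
    if len(entry) != 5:
        raise ValueError(f"expected a 4-character ID plus chain ID, got {entry!r}")
    # The chain ID is left untouched because chain IDs are case-sensitive.
    return entry[:4].upper(), entry[4]


def read_entries(list_path):
    """Return (pdb_id, chain_id) pairs from a file with one entry per line."""
    with open(list_path) as handle:
        return [parse_entry(line) for line in handle if line.strip()]


def download_entries(list_path, out_dir="pdb_files"):
    """Download each listed structure; return {(pdb_id, chain_id): local_path}."""
    # Imported here so the ID parsing above works without Biopython installed.
    from Bio.PDB import PDBList

    pdbl = PDBList()
    paths = {}
    for pdb_id, chain_id in read_entries(list_path):
        # file_format="pdb" requests the legacy PDB format rather than mmCIF.
        paths[(pdb_id, chain_id)] = pdbl.retrieve_pdb_file(
            pdb_id, pdir=out_dir, file_format="pdb"
        )
    return paths
```

Each returned path can then be passed, together with its chain ID, to whatever routine writes out the single chain.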