Looking for help on writing a Perl program that takes an input file and

Question

0

Asked: May 18, 20262026-05-18T11:29:47+00:00 2026-05-18T11:29:47+00:00

Looking for help on writing a Perl program that takes an input file and

0

Looking for help on writing a Perl program that takes an input file and performs manipulations based on follow-up commands. I’m a beginning Perl student so please don’t get too advance in suggestions. The structure that I have so far is a main program and 4 subs.

I’m having trouble with two parts:

Writing the portion of the main segment that creates a unique record for each line from the input file (which is fixed width format). I think this should be done with substr but I don’t know much more of how this should be structured. Unpack is beyond the scope of my learning so far.

One of the functions called in the main program is a “distance” sub which will calculate distance between atoms. I’m thinking this should be a For Loop inside a For loop. Any thoughts on what approach I should take?

The records should store an array of atom records (one record/atom per newline):

• The atom’s serial number, 5 digits. (cols 7 – 11)

• The three-letter name of the amino acid to which it belongs (cols 18 – 20)

• The atom’s three coordinates real number as decimal & Orthogonal Coordinates (x,y,z) (cols 31 – 54 )
For X in Angstroms cols. 31-38
For Y in Angstroms cols. 39-46
For Z in Angstroms cols. 47-54

• The atom’s one- or two-letter element name (e.g. C, O, N, Na) (cols 77-78 )

sub Distance
# take an array of atom records and return the max distance
# between all pairs of atoms in that array. (cols 31-54)

Here is sample text from an input file.

# truncating for testing purposes. Actual data is aprox. 100 columns     
# and starts with ATOM or HETATM    
__DATA__   
ATOM   4743  CG  GLN A 704      19.896  32.017  54.717  1.00 66.44           C    
ATOM   4744  CD  GLN A 704      19.589  30.757  55.525  1.00 73.28           C    
ATOM   4745  OE1 GLN A 704      18.801  29.892  55.098  1.00 75.91           O

Here is what I have so far for the main and sub for make records. I hate to be lame but I don’t have anything to show for the Distance sub yet so don’t worry about giving code, any suggestions on how to approach would be very appreciated.

use warnings;
use strict; 

my @fields;
my @recs;

while ( <DATA> ) {
chomp;
@fields = split(/\s+/);
push @recs, makeRecord(@fields);
}

for (my $i = 0; $i < @recs; $i++) {
printRec( $recs[$i] );
}
my %command_table = (
  freq => \&freq,
  length => \&length,
  density => \&density,
  help => \&help, 
  quit => \&quit
);

print "Enter a command: ";
  while ( <STDIN> ) {
  chomp; 
  my @line = split( /\s+/);
  my $command = shift @line;
  if ($command !~ /^freq$|^density$|length|^help$|^quit$/ ) {
    print "Command must be: freq, length, density or quit\n";
  }
    else {
    $command_table{$command}->();
  }
print "Enter a command: ";
}

sub makeRecord 
# Read the entire line and make records from the lines that contain the 
# word ATOM or HETATM in the first column. Not sure how to do this:
{
 my %record = 
 (
 serialnumber => shift,
 aminoacid => shift,
 coordinates => shift,
 element  => [ @_ ]
 );
 return\%record;
 }

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-18T11:29:47+00:00

There’s Perl code available online for working with PDB files (which obviously you are doing). I’m not suggesting just using a module you downloaded and be done with it, as surely your instructor wouldn’t approve, and you wouldn’t learn that much 😉 But you could take a look at some of the code that’s offered and try to see whether some bits there address your problem.

I did a quick bit of googling, I saw that there’s ParsePDB.pm (for example). You can find the web page here. I didn’t have a look at the code or the functionality though, I’m just hoping there will be something in there that you may find helpful.

EDIT 1

Okay, it’s 14 hours later now, and I felt like doing some coding, so as you have not yet accepted an answer I thought I could just ignore my own advice and draw up something (as you will notice I have copied Zaid’s data structure)…

#!/usr/bin/perl

use warnings;
use strict;

sub makeRecord {
   my ($ser_num, $aa, $x, $y, $z, $element) = @_;
   # copying Zaid now as her/his structure looks very sensible!
   my $record = {
                  serial  => $ser_num,
                  aa      => $aa,
                  element => $element,
                  xyz     => [$x, $y, $z],
                };
   return $record;
}


my $file = shift @ARGV;
my @records; # will be an array of hash references

open FILE, "<$file" or die "$!";
while (<FILE>) {
   if (/^ATOM|^HETATM/) { # only get the structure data lines
      chomp; # not necessary here, but good practice I'd say

      my @fields = split; # by default 'split' splits on whitespace

      # now use an array slice to only pass the array elements
      # you're interested in (using the positional indices from @fields):
      push @records, makeRecord(@fields[1,3,6,7,8,11]);
   }
}
close FILE;

EDIT 2

Concerning the distance subroutine: the for loop inside the for loop should do the job, but this is the brute force way which might take quite a while (as you’d have to do (number_of_atoms)^2 calculations), depending on the size of your input molecule. For the purpose of your assignment the brute force approach is probably acceptable; in other cases you’d have to decide whether to favour ease of coding, or computational speed. If your instructor also wants you to keep the latter in mind, you could take a look at this page (I know you actually want the largest distance, and you’re in 3D, not 2D…)

Ok, now I just hope that you managed to find some helpful bits and pieces in here 🙂

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

Looking for help on writing a Perl program that takes an input file and

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply