I have written the following script to search for a motif(substring) in a protein

Question

0

Asked: May 11, 20262026-05-11T16:30:55+00:00 2026-05-11T16:30:55+00:00

I have written the following script to search for a motif(substring) in a protein

0

I have written the following script to search for a motif(substring) in a protein sequences(strings). I am beginner and writing this has been tough for me. I have two questions regarding the same:
1. Errors: The following script has few errors. I have been at it for quite sometime now but have not figured out what and why?
2. The following script has been written to search for one motif(substring) in protein sequences(strings). My next task involves searching for multiple motifs in a specific order (ex: motif1 motif2 motif 3 motif4 this order cannot be changed) in the same protein sequences(strings)

        use strict;
        use warnings;

        my @file_data=();
        my $motif ='';
        my $protein_seq='';
        my $h= '[VLIM]';   
        my $s= '[AG]';
        my $x= '[ARNDCEQGHILKMFPSTWYV]';
        my $regexp = "($h){4}D($x){4}D"; #motif to be searched is hhhhDxxxxD
        my @locations=();

        @file_data= get_file_data("seq.txt");

        $protein_seq= extract_sequence(@file_data); 

    #searching for a motif hhhhDxxxxD in each protein sequence in the give file

        foreach my $line(@file_data){
        if ($motif=~ /$regexp/){
        print "found motif \n\n";
        }
        else {
        print "not found \n\n";
        }
        }
#recording the location/position of motif to be outputed

        @locations= match_position($regexp,$seq);
        if (@locations){ 
        print "Searching for motifs $regexp \n";
        print "Catalytic site is at location:\n";
        }
        else{
        print "motif not found \n\n";
        }
        exit;

        sub get_file_data{
        my ($filename)=@_;
        use strict;
        use warnings;
        my $sequence='';

        foreach my $line(@file_data){

        if ($line=~ /^\s*$/){
        next;
                }
        elsif ($line=~ /^\s*#/){
        next;
        }
        elsif ($line=~ /^>/){
        next;
        }
        else {
        $sequence.=$line;
        }
        }
        $sequence=~ s/\s//g;
        return $sequence;
        }

        sub(match_positions) {
        my ($regexp, $sequence)=@_;
        use strict;
        my @position=();
        while ($sequence=~ /$regexp/ig){
        push (@position, $-[0]);
        }
        return @position;
        }

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-11T16:30:55+00:00

First of all, the keyword is elsif, second of all you don’t need it. You can compress the code in the get_file_data loop to:
```
next if $line =~ /^\s*$|^>/; 
$sequence .= $line;
```
As long as you’re going to use regular expressions — unless too unwieldy — you might as well search for all the cases that you want to ignore. If you find that actual second case, you can add it as an another alternation. Say you wanted to exclude lines that begin with #-. Then you would just add it in like so: /^\s*$|^>|^#-/
Another thing is that my position=(); needs to have the @ sigil, before position, or otherwise, perl thinks you’re trying to something tricky with a call to position().
You need the following changes:
```
 my $h= '[VLIM]';   
 my $s= '[AG]';
 my $x= '[ARNDCEQGHILKMFPSTWYV]';
```
Otherwise, you’re just assigning to $h to an array reference with a single slot populated by whatever would be returned from the sub VLIM.

Third, don’t use $&. Replace pos($sequence)-length($&)+1

push @positions, $-[0];

or better yet, use English:

use English qw<-no_match_vars>;
...
push @positions, $LAST_MATCH_START[0];

I would suggest the following for the file reading:

use IO::File;
...
# Use real file handles
my $fh = IO::File->new( "<seq.txt" );
get_file_data( $fh ); # They can be passed
...
sub get_file_data{
    my $file_handle = shift; 
    ...
    # while loop conserves resources
    while ( my $line = <$file_handle> ) { 
        next if $line =~ /^\s*$|^>/;
        $sequence .= $line;
    }

A suggestion for going forward — it helps me immensely:

A. Install Smart::Comments

B. Put this at the top of your script:
```
 use Smart::Comments;
```
C. Every time you’re not sure what you’ve got so far, like if you wanted to see the current contents of $sequence, place the following in the code:
```
### $sequence
exit 0;
```
just show it and exit. When you get too many printouts, delete them.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I have written the following script to search for a motif(substring) in a protein

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply