Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 7627181
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 31, 20262026-05-31T05:25:31+00:00 2026-05-31T05:25:31+00:00

I’m a perl beginner, please help me out with my query… I’m trying to

  • 0

I’m a perl beginner, please help me out with my query… I’m trying to extract information from a blast table (a snippet of what it looks like is below):
It’s a standard blast table input… I basically want to extract any information on a list of reads (Look at my second script below , to get an idea of what I want to do)…. Anyhow this is precisely what I’ve done in the second script:

INPUTS:

1) the blast table:

38.1    0.53    59544   GH8NFLV01A02ED  GH8NFLV01A02ED rank=0113471 x=305.0 y=211.5 length=345  1   YP_003242370    Dynamin family protein [Paenibacillus sp. Y412MC10] -1  0   48.936170212766 40.4255319148936    47  345 1213    13.6231884057971    3.87469084913438    31  171 544 590
34.3    7.5 123828  GH8NFLV01A03QJ  GH8NFLV01A03QJ rank=0239249 x=305.0 y=1945.5 length=452 1   XP_002639994    Hypothetical protein CBG10824 [Caenorhabditis briggsae] 3   0   52.1739130434783    32.6086956521739    46  452 367 10.1769911504425    12.5340599455041    111 248 79  124
37.7    0.70    62716   GH8NFLV01A09B8  GH8NFLV01A09B8 rank=0119267 x=307.0 y=1014.0 length=512 1   XP_002756773    PREDICTED: probable G-protein coupled receptor 123-like, partial [Callithrix jacchus]   1   0   73.5294117647059    52.9411764705882    34  512 703 6.640625    4.83641536273115    43  144 273 306
37.7    0.98    33114   GH8NFLV01A0H5C  GH8NFLV01A0H5C rank=0066011 x=298.0 y=2638.5 length=573 1   XP_002756773    PREDICTED: probable G-protein coupled receptor 123-like, partial [Callithrix jacchus]   -3  0   73.5294117647059    52.9411764705882    34  573 703 5.93368237347295    4.83641536273115    131 232 273 306
103 1e-020  65742   GH8NFLV01A0MXI  GH8NFLV01A0MXI rank=0124865 x=300.5 y=644.0 length=475  1   ABZ08973    hypothetical protein ALOHA_HF4000APKG6B14ctg1g18 [uncultured marine crenarchaeote HF4000_APKG6B14]  2   0   77.9411764705882    77.9411764705882    68  475 151 14.3157894736842    45.0331125827815    2   205 1   68
41.6    0.053   36083   GH8NFLV01A0QKX  GH8NFLV01A0QKX rank=0071366 x=301.0 y=1279.0 length=526 1   XP_766153   hypothetical protein [Theileria parva strain Muguga]    -1  0   66.6666666666667    56.6666666666667    30  526 304 5.70342205323194    9.86842105263158    392 481 31  60
45.4    0.003   78246   GH8NFLV01A0Z29  GH8NFLV01A0Z29 rank=0148293 x=304.0 y=1315.0 length=432 1   ZP_04111769 hypothetical protein bthur0007_56280 [Bacillus thuringiensis serovar monterrey BGSC 4AJ1]   3   0   51.8518518518518    38.8888888888889    54  432 193 12.5    27.979274611399 48  209 97  150
71.6    4e-011  97250   GH8NFLV01A14MR  GH8NFLV01A14MR rank=0184885 x=317.5 y=609.5 length=314  1   ZP_03823721 DNA replication protein [Acinetobacter sp. ATCC 27244]  1   0   92.5    92.5    40  314 311 12.7388535031847    12.8617363344051    193 312 13  52
58.2    5e-007  154555  GH8NFLV01A1KCH  GH8NFLV01A1KCH rank=0309994 x=310.0 y=2991.0 length=267 1   ZP_03823721 DNA replication protein [Acinetobacter sp. ATCC 27244]  1   0   82.051282051282 82.051282051282 39  267 311 14.6067415730337    12.540192926045 142 258 1   39

2) The reads list:

GH8NFLV01A09B8
GH8NFLV01A02ED
etc
etc

3) the output I want:

37.7    0.70    62716   GH8NFLV01A09B8  GH8NFLV01A09B8 rank=0119267 x=307.0 y=1014.0 length=512 1   XP_002756773    PREDICTED: probable G-protein coupled receptor 123-like, partial [Callithrix jacchus]   1   0   73.5294117647059    52.9411764705882    34  512 703 6.640625    4.83641536273115    43  144 273 306
38.1    0.53    59544   GH8NFLV01A02ED  GH8NFLV01A02ED rank=0113471 x=305.0 y=211.5 length=345  1   YP_003242370    Dynamin family protein [Paenibacillus sp. Y412MC10] -1  0   48.936170212766 40.4255319148936    47  345 1213    13.6231884057971    3.87469084913438    31  171 544 590

I want a subset of the information in the first list, given a list of read names I want to extract (that is found in the 4th column)
Instead of hashing the reads list (only?) I want to hash the blast table itself, and use the information in Column 4 (of the blast table)as the keys to extract the values of each key, even when that key may have more than one value(i.e: each read name might actually have more than one hit , or associated blast result in the table), keeping in mind, that the value includes the WHOLE row with that key(readname) in it.

My greplist.pl script does this, but is very very slow, I think , ( and correct me if i’m wrong) that by loading the whole table in a hash, that this should speed things up tremendously …

Thank you for your help.

My scripts:
The Broken one (mambo5.pl)

#!/usr/bin/perl -w
# purpose:  extract blastX data from a list of readnames
use strict;
open (DATA,$ARGV[0]) or die ("Usage: ./mambo5.pl BlastXTable readslist");
open (LIST,$ARGV[1]) or die ("Usage: ./mambo5.pl BlastXTable readslist");
my %hash = <DATA>;
close (DATA);
my $filename=$ARGV[0];
open(OUT, "> $filename.bololom");

my $readName;

while ( <LIST> )
{
    #########;
    if(/^(.*?)$/)#
    {
        $readName=$1;#
        chomp $readName;
        if (exists $hash{$readName})
        {
            print "bingo!";
            my $output =$hash{$readName};
            print OUT "$output\n";
        }
        else 
        {
            print "it aint workin\n";
            #print %hash;
        }           
    }
}
close (LIST);

The Slow and quick cheat (that works) and is very slow (my blast tables can be about 400MB to 2GB large, I’m sure you can see why it’s so slow)

#!/usr/bin/perl -w
## 
# This script finds a list of names in a blast table and outputs the result in a new file
# name must exist and list must be correctly formatted
# will not output anything using a "normal" blast file, must be a table blast
# if you have the standard blast output use blast2table script

use strict;
my $filein=$ARGV[0] or die ("usage: ./listgrep.pl readslist blast_table\n");
my $db=$ARGV[1] or die ("usage: ./listgrep.pl readslist blast_table\n");
#open the reads you want to grep
my $read;
my $line;
open(READSLIST,$filein);
while($line=<READSLIST>)
{
    if ($line=~/^(.*)$/) 
    {
        $read = $1;
        print "$read\n";
        system("grep \"$read\" $db >$read\_.out\n");
    }


    #system("grep $read $db >$read\_.out\n");
}
system("cat *\_.out >$filein\_greps.txt\n");
system("rm *.out\n");

I don’t know how to define that 4th column as the key : maybe I could use the split function, but I’ve tried to find a way that does this for a table of more than 2 columns to no avail… Please help!
If there is an easy way out of this please let me know

Thanks !

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-31T05:25:32+00:00Added an answer on May 31, 2026 at 5:25 am

    Voila, 2 ways of doing this, one with nothing to do with perl :

    awk 'BEGIN {while ( i = getline < "reads_list") ar[$i] = $1;} {if ($4 in ar) print $0;}' blast_table > new_blast_table
    

    Mambo6.pl

    #!/usr/bin/perl -w
    # purpose:  extract blastX data from a list of readnames. HINT: Make sure your list file only has unique names , that way you save time. 
    use strict;
    open (DATA,$ARGV[0]) or die ("Usage: ./mambo5.pl BlastXTable readslist");
    open (LIST,$ARGV[1]) or die ("Usage: ./mambo5.pl BlastXTable readslist");
    my %hash;
    my $val;
    my $key;
    while (<DATA>)
    {
        #chomp;
        if(/((.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?))$/)
        {
            #print "$1\n";
            $key= $5;#read
            $val= $1;#whole row; notice the brackets around the whole match.
            $hash{$key} .= exists $hash{$key} ? "$val\n" : $val;
        }
        else {
            print "something wrong with format";
        }
    }
    close (DATA);
    open(OUT, "> $ARGV[1]\_out\.txt");
    
    my $readName;
    
    while ( <LIST> )
    {
        #########;
        if(/^(.*?)$/)#
        {
            $readName=$1;#
            chomp $readName;
            if (exists $hash{$readName})
            {
                print "$readName\n";
                my $output =$hash{$readName};
                print OUT "$output";
            }
            else 
            {
                #print "it aint workin\n";
            }           
        }
    }
    close (LIST);
    close (OUT);
    

    The oneliner is faster, and probably better than my script, I’m sure some people can find easier ways to do it… I just thought I’d put this up since it does what I want.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I'm trying to decode HTML entries from here NYTimes.com and I cannot figure out
I'm new to using the Perl treebuilder module for HTML parsing and can't figure
link Im having trouble converting the html entites into html characters, (&# 8217;) i
For some reason, after submitting a string like this Jack’s Spindle from a text
I am trying to understand how to use SyndicationItem to display feed which is
I have a string like this: La Torre Eiffel paragonata all&#8217;Everest What PHP function
I am trying to render a haml file in a javascript response like so:
I'm parsing an RSS feed that has an &#8217; in it. SimpleXML turns this
I have a text area in my form which accepts all possible characters from
Does anyone know how can I replace this 2 symbol below from the string

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.