to Perl Masters in the world! I have a file like this to parse


Editorial Team

Asked: June 16, 2026


To the Perl masters of the world!

I have a file like the one below to parse, and I want to reformat it.

Starting from the first column, the fields are: ID, exon information, start position, end position, and strand direction. The ID increases by 1 at every line that has a number in the second column.

1   9239    712 8571    +
1   start_codon 712 714 +
1   stop_codon  8569    8571    +
2   3882    24137   24264   +
2   start_codon 24137   24139   +
3   3882    24322   24391   +
4   3882    24490   26064   +
4   stop_codon  26062   26064   +
5   4972    26704   26740   +
5   start_codon 26704   26706   +
6   4972    26814   27170   +
7   4972    27257   27978   +
7   stop_codon  27976   27978   +
8   10048   40161   41114   -
8   start_codon 41112   41114   -
8   stop_codon  40161   40163   -
9   272 43167   43629   -
9   stop_codon  43167   43169   -
10  272 43755   44059   -
10  start_codon 44057   44059   -
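Each line of the input above splits into the five columns just described. Here is a minimal sketch of that step, assuming the fields are separated by any whitespace (tabs or spaces). The sub name `parse_record` and its hash keys are hypothetical, chosen for illustration; a line is treated as an exon line when its second column is all digits.

```perl
use strict;
use warnings;

# Hypothetical helper: split one input line into the five named columns.
# Returns nothing (undef in scalar context) for blank or short lines.
sub parse_record {
    my ($line) = @_;
    my @f = split ' ', $line;          # ' ' splits on any run of whitespace
    return if @f < 5;
    return {
        id      => $f[0],
        exon    => $f[1],              # exon number, or e.g. "start_codon"
        start   => $f[2],
        end     => $f[3],
        strand  => $f[4],
        is_exon => ($f[1] =~ /^\d+$/ ? 1 : 0),   # digits only => exon line
    };
}

my $r = parse_record("2   3882    24137   24264   +");
print "$r->{exon} $r->{start}-$r->{end} ($r->{strand})\n";   # prints "3882 24137-24264 (+)"
```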

I want the output to look like this:

1   9239    *712*   *8571*  +
1   start_codon 712 714 +
1   stop_codon  8569    8571    +
*X  9239    712 8571    +*
2   3882    *24137* 24264   +
2   start_codon 24137   24139   +
3   3882    24322   24391   +
4   3882    24490   *26064* +
4   stop_codon  26062   26064   +
*X  3882    24137   26064   +*
5   4972    *26704* 26740   +
5   start_codon 26704   26706   +
6   4972    26814   27170   +
7   4972    27257   *27978* +
7   stop_codon  27976   27978   +
*X  4972    26704   27978 +*
8   10048   *40161* *41114* -
8   start_codon 41112   41114   -
8   stop_codon  40161   40163   -
*X  10048   40161   41114   -*
9   272 *43167* 43629   -
9   stop_codon  43167   43169   -
10  272 43755   *44059* -
10  start_codon 44057   44059   -
*X  272 43167   44059   -*

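Each line that begins with X has to be added, but I can't work out how to do it with my current skills. 🙁

For every exon number in the second column (ignoring the "start_codon" and "stop_codon" lines), I need the minimum and maximum exon positions, which are the values marked with asterisks (*). The X line repeats the exon number and the strand, and goes right after the last line belonging to that exon number.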
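A minimal sketch of that requirement, assuming (as in the sample data) that all lines for one exon number are adjacent. The sub name `exon_spans` is hypothetical; it ignores the codon lines and returns one `[exon, min, max, strand]` entry per run of the same exon number.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper: one [exon, min, max, strand] entry per run of
# consecutive exon lines sharing the same column-2 number.
sub exon_spans {
    my @spans;
    for my $line (@_) {
        my @f = split ' ', $line;
        next unless @f >= 5 && $f[1] =~ /^\d+$/;   # keep exon lines only
        if (!@spans || $spans[-1][0] != $f[1]) {     # new exon number => new run
            push @spans, [ $f[1], $f[2], $f[3], $f[4] ];
        }
        my $s = $spans[-1];
        $s->[1] = min($s->[1], $f[2], $f[3]);        # smallest position so far
        $s->[2] = max($s->[2], $f[2], $f[3]);        # largest position so far
    }
    return @spans;
}

my @input = (
    "2   3882    24137   24264   +",
    "2   start_codon 24137   24139   +",
    "3   3882    24322   24391   +",
    "4   3882    24490   26064   +",
    "4   stop_codon  26062   26064   +",
);
# prints one tab-separated line: X 3882 24137 26064 +
print join("\t", "X", @$_), "\n" for exon_spans(@input);
```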

This was my basic code to parse the data, but I guess I have to rewrite it from scratch (I have no idea how to insert the X line).

(Sorry, I deleted the code because it wasn't good enough and might cause confusion.)

Perl masters of the world, could you please help me?

Thank you!!

As TLP asked, I have put my code back. It's embarrassing code, though:

use strict;
use warnings;

if (@ARGV != 1) {
    print "Invalid arguments\n";
    print "Usage: perl min_max.pl [exon_output_file]\n";
    exit(1);    # non-zero exit status signals a usage error
}

my $FILENAME = $ARGV[0];
my $exonid   = 0;
my $exon     = "";
my $startpos = 0;
my $endpos   = 0;
my $strand   = "";
my $min_pos  = 0;
my $max_pos  = 0;

# Three-argument open with a lexical filehandle; stop if the file can't be read
open(my $fh, '<', $FILENAME) or die "Cannot open $FILENAME: $!";

while (my $line = <$fh>) {
    chomp $line;

    if ($line ne "") {
        # Five tab-separated columns: ID, exon, start, end, strand
        if ($line =~ /^(.+)\t(.+)\t(.+)\t(.+)\t(.+)/) {
            $exonid   = $1;
            $exon     = $2;
            $startpos = $3;
            $endpos   = $4;
            $strand   = $5;
        }
        if ($exon =~ /\d+/) {
            # Exon line: this is where $min_pos/$max_pos would need updating
            print $exonid,"\t",$exon,"\t",$startpos,"\t",$endpos,"\t",$strand,"\n";
        } else {
            # start_codon / stop_codon line
            print $exonid,"\t",$exon,"\t",$startpos,"\t",$endpos,"\t",$strand,"\n";
        }
    }
}

close($fh);
exit;

How can I find the biggest and the smallest values?
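Finding the biggest and smallest values comes down to comparing each new number against the extremes seen so far. A minimal sketch; the sub name `min_max` is hypothetical. `List::Util`'s `min` and `max` (used in the answer) do the same comparison over a whole list at once.

```perl
use strict;
use warnings;

# Hypothetical helper: track the smallest and largest value seen so far by
# comparing each new number against the current extremes.
sub min_max {
    my ($lo, $hi);
    for my $n (@_) {
        $lo = $n if !defined $lo || $n < $lo;   # new minimum?
        $hi = $n if !defined $hi || $n > $hi;   # new maximum?
    }
    return ($lo, $hi);                          # both undef for an empty list
}

my ($lo, $hi) = min_max(26704, 26740, 26814, 27170, 27257, 27978);
print "$lo $hi\n";    # prints "26704 27978"
```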


1 Answer

Editorial Team · Answer 1 · 2026-06-16T09:01:01+00:00

Basically, you go through the lines, ignore the ones without a number in column 2 when computing min/max, and keep track of the min/max for the current set. When the column 2 number changes, you print the X line for the finished set and start over. With this approach you also have to print the last set manually at the end.
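The approach just described relies on all lines for one exon number being adjacent, as they are in the question's data. If that might not hold, a hypothetical variant is to collect min/max per exon number in a hash and remember the order of first appearance. This is a sketch under that assumption, not part of this answer's script; `spans_by_exon` is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical variant: lines for one exon number need not be adjacent.
# Returns [exon, min, max, strand] per exon, in order of first appearance.
sub spans_by_exon {
    my (%span, @order);
    for my $line (@_) {
        my @f = split ' ', $line;
        next unless @f >= 5 && $f[1] =~ /^\d+$/;       # exon lines only
        my ($exon, $s, $e, $strand) = @f[1 .. 4];
        my ($lo, $hi) = $s < $e ? ($s, $e) : ($e, $s); # order the two positions
        if (exists $span{$exon}) {
            my $r = $span{$exon};
            $r->[0] = $lo if $lo < $r->[0];            # widen the minimum
            $r->[1] = $hi if $hi > $r->[1];            # widen the maximum
        } else {
            $span{$exon} = [ $lo, $hi, $strand ];
            push @order, $exon;                        # remember first appearance
        }
    }
    return map { my $exon = $_; [ $exon, @{ $span{$exon} } ] } @order;
}

# Lines from the question, reordered so the two 272 lines are not adjacent
my @rows = spans_by_exon(
    "9   272 43167   43629   -",
    "8   10048   40161   41114   -",
    "10  272 43755   44059   -",
);
print join("\t", 'X', @$_), "\n" for @rows;
```

With the run-based loop, these three lines would instead produce a separate X line for each of the two 272 runs; the hash version merges them into one.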

The script below uses Perl's built-in DATA filehandle for the demonstration data. To run it on a real input file, change <DATA> to <> and call it like this: perl script.pl inputfile

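use strict;
use warnings;
use feature 'say';                     # needed for say()
use List::Util qw(min max);

my ($print, $min, $max, $id);
while (<DATA>) {                       ###### change to <> to run on input file
    my @line = split;
    next unless @line;                 # skip blank lines
    if ($line[1] =~ /^\d+$/) {                    # exon line: number in col 2
        if (!defined($id) or $id != $line[1]) {   # New dataset!
            say $print if defined $print;         # print previous set's X line
            $id = $line[1];
            $min = $max = undef;                  # reset for the new set
        }
        $min = min($min // (), @line[2,3]);       # find min/max, skip undef
        $max = max($max // (), @line[2,3]);
        $print = join "\t", "X", $line[1], $min, $max, $line[4];  # buffer the X line
    }
    print;                             # echo every input line unchanged
}
say $print if defined $print;          # print the last set manually

__DATA__
1   9239    712 8571    +
1   start_codon 712 714 +
1   stop_codon  8569    8571    +
2   3882    24137   24264   +
2   start_codon 24137   24139   +
3   3882    24322   24391   +
4   3882    24490   26064   +
4   stop_codon  26062   26064   +
5   4972    26704   26740   +
5   start_codon 26704   26706   +
6   4972    26814   27170   +
7   4972    27257   27978   +
7   stop_codon  27976   27978   +
8   10048   40161   41114   -
8   start_codon 41112   41114   -
8   stop_codon  40161   40163   -
9   272 43167   43629   -
9   stop_codon  43167   43169   -
10  272 43755   44059   -
10  start_codon 44057   44059   -

Output (the X lines are tab-separated):

1   9239    712 8571    +
1   start_codon 712 714 +
1   stop_codon  8569    8571    +
X   9239    712 8571    +
2   3882    24137   24264   +
2   start_codon 24137   24139   +
3   3882    24322   24391   +
4   3882    24490   26064   +
4   stop_codon  26062   26064   +
X   3882    24137   26064   +
5   4972    26704   26740   +
5   start_codon 26704   26706   +
6   4972    26814   27170   +
7   4972    27257   27978   +
7   stop_codon  27976   27978   +
X   4972    26704   27978   +
8   10048   40161   41114   -
8   start_codon 41112   41114   -
8   stop_codon  40161   40163   -
X   10048   40161   41114   -
9   272 43167   43629   -
9   stop_codon  43167   43169   -
10  272 43755   44059   -
10  start_codon 44057   44059   -
X   272 43167   44059   -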
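To reuse this on different inputs, or to test it, one option is to wrap the same loop in a sub that takes input and output filehandles. A sketch: `add_x_lines` is a hypothetical name, and the in-memory filehandles (`open` on a scalar reference) are just a convenient way to feed it a string. Passing `\*STDIN` and `\*STDOUT` instead would process a file redirected into the script.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical wrapper around the approach above: read from $in, echo every
# line to $out, and write an X summary line after each run of the same exon.
sub add_x_lines {
    my ($in, $out) = @_;
    my ($x_line, $id, $min, $max);
    while (my $line = <$in>) {
        my @f = split ' ', $line;
        if (@f >= 5 && $f[1] =~ /^\d+$/) {                 # exon line
            if (!defined $id || $id != $f[1]) {             # new exon number
                print {$out} "$x_line\n" if defined $x_line;
                ($id, $min, $max) = ($f[1], undef, undef);
            }
            $min = min(grep { defined } $min, @f[2, 3]);    # ignore undef
            $max = max(grep { defined } $max, @f[2, 3]);
            $x_line = join "\t", 'X', $f[1], $min, $max, $f[4];
        }
        print {$out} $line;                                 # echo input unchanged
    }
    print {$out} "$x_line\n" if defined $x_line;            # flush the last run
}

# Usage with in-memory filehandles
my $input = "5   4972    26704   26740   +\n"
          . "6   4972    26814   27170   +\n";
open my $in,  '<', \$input     or die "cannot open input: $!";
open my $out, '>', \my $result or die "cannot open output: $!";
add_x_lines($in, $out);
close $out;
print $result;
```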
