You should not convert the bytes into characters, that is…

Question


Asked: May 11, 2026 (2026-05-11T23:38:04+00:00)

Sample Data:

603       Some garbage data not related to me, 55, 113 ->

1-ENST0000        This is sample data blh blah blah blahhhh
2-ENSBTAP0        This is also some other sample data
21-ENADT)$        DO NOT WANT TO READ THIS LINE. 
3-ENSGALP0        This is third sample data
node #4           This is 4th sample data
node #5           This is 5th sample data

This is also part of the input file but i dont wish to read this. 
Branch -> 05 13, 
      44, 1,1,4,1

17, 1150

637                   YYYYYY: 2 : %

EDIT: In the data above, the column width is fixed within the sections, but there may be some sections I do not wish to read. The sample data above has been edited to reflect that.

So in this input file I want to read the contents of the first section, ‘1-ENST0000’, into an array, the contents of ‘2-ENSBTAP0’ into a separate array, and so on.

I am having trouble coming up with a regex that defines the pattern: the first three lines have the form <someNumber>-ENS<someOtherStuff>, and there can also be lines of the form node #<someNumber>.


1 Answer

Editorial Team · Answer 1 · 2026-05-11T23:38:04+00:00

OK, based on your later comment, this is a little different than the previous question. Also, I now realize that node #54 is a valid entry in the first column.

Update: I now also realize you do not need the first column.

Update: In general, you neither want to nor need to deal with character arrays in Perl.
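To illustrate that point, here is a minimal sketch showing that a Perl string can be inspected directly with built-ins such as length, substr, index, and tr, without first splitting it into characters. The sample string and the variable names are illustrative assumptions, not part of the original input.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Illustrative sample string; not taken from the real input file.
my $text = 'This is sample data';

my $len   = length $text;           # number of characters
my $first = substr $text, 0, 4;     # first four characters
my $pos   = index $text, 'sample';  # offset of a substring, -1 if absent
my $count = ( $text =~ tr/a// );    # number of 'a' characters

print "length=$len first=$first pos=$pos a_count=$count\n";
```

If individual characters really are needed later, split // can still be applied at that point.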

Update: Now that you have clarified what should and should not be skipped, here is a version that deals with that. Add patterns to taste in the if condition.

#!/usr/bin/perl

use strict;
use warnings;

my @data;

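while ( <DATA> ) {
    chomp;

    # Keep only rows whose first column looks like <number>-ENS<5 chars>
    # or node #<number>; $1 captures the text after the first column.
    if ( /^[0-9]+-ENS.{5} +(.+)$/
            or /^node #[0-9]+ +(.+)$/
    ) {
        # Store the captured text as an array of single characters.
        push @data, [ split //, $1 ];
    }
}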

use Data::Dumper;
print Dumper \@data;

__DATA__
603       Some garbage data not related to me, 55, 113 ->

1-ENST0000        This is sample data blh blah blah blahhhh
2-ENSBTAP0        This is also some other sample data
21-ENADT)$        DO NOT WANT TO READ THIS LINE. 
3-ENSGALP0        This is third sample data
node #4           This is 4th sample data
node #5           This is 5th sample data

This is also part of the input file but i dont wish to read this. 
Branch -> 05 13, 
      44, 1,1,4,1

17, 1150

637                   YYYYYY: 2 : %

As for learning how to fish, I recommend you read everything related in perldoc perltoc.
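If the list of accepted row formats keeps growing, one possible way to "add patterns to taste" is to keep the patterns in an array of precompiled regexes rather than extending the if condition. The sketch below is only an illustration under assumptions: the helper name extract_text, the array name @wanted, and the third "leaf #" pattern with its sample row are hypothetical and do not come from the original data. It also stores each match as a plain string instead of a character array.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Each pattern captures the text after the first column in $1.
# The third pattern is a hypothetical addition for illustration only.
my @wanted = (
    qr/^[0-9]+-ENS.{5} +(.+)$/,
    qr/^node #[0-9]+ +(.+)$/,
    qr/^leaf #[0-9]+ +(.+)$/,
);

# Return the captured text if any pattern matches, otherwise undef.
sub extract_text {
    my ($line) = @_;
    for my $re (@wanted) {
        return $1 if $line =~ $re;
    }
    return;
}

# Small inline sample; the last row is hypothetical.
my @lines = (
    '1-ENST0000        This is sample data',
    '21-ENADT)$        DO NOT WANT TO READ THIS LINE.',
    'node #4           This is 4th sample data',
    'leaf #7           Hypothetical extra row',
);

my @data;
for my $line (@lines) {
    my $text = extract_text($line);
    push @data, $text if defined $text;
}

print "$_\n" for @data;
```

Adding a new row format then means adding one qr// entry to @wanted, and the loop itself stays unchanged.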
