I should explain as background to this question that I don’t know any Perl,


Asked: May 21, 2026 (2026-05-21T19:57:42+00:00)

I should explain as background to this question that I don’t know any Perl, and have a violent allergy to regular expressions (we all have our weaknesses). I’m trying to figure out why a Perl program won’t accept the data I’m feeding it. I don’t need to understand this program in any depth – I’m just doing a timing comparison.

Consider this assignment statement:

($sample_ls_id) = $sample_ls_id =~ /:\w\w(\d+):/;

If I understand this correctly, it is checking if sample_ls_id matches some regex, and if so, assigning the entire string, or something like that.

However, I don’t understand how this works.
According to the documentation, namely perldoc perlretut, which I looked at briefly

$sample_ls_id =~ /:\w\w(\d+):/

just returns true or false if there is a match.

The strings I’m trying to match look like

1000    10      0       0       1        urn:lsid:dcc.hapmap.org:Individual:CEPH1000.10:1        urn:lsid:dcc.hapmap.org:Sample:SAMPLE1:1

This fails with the error

Use of uninitialized value $sample_ls_id in concatenation (.) or string
at database/populate/family.pl line 38, <INPUT> line 1.

Line 38 is

print OUTPUT "$sample_ls_id\t$family_ped_id\t$individual_ped_id\t$father_ped_id\t$mother_ped_id\t$sex\t$created_by\t$population_code\n";

See the complete script below. However, the apparently very similar string

1420    9       0       0       1       urn:lsid:dcc.hapmap.org:Individual:CEPH1420.09:1  urn:lsid:dcc.hapmap.org:Sample:NA12003:1

seems to pass.

For context, the entire piece of code is:

use strict;
use warnings;
use Getopt::Long;

my $input_file = "data/family_ceu.txt";
my $output_file = "sql/family_ceu.sql";
my $population_code = "CEU";

GetOptions ('i=s' => \$input_file,
            'o=s' => \$output_file,
            'p=s' => \$population_code
            );

usagecheck();

my $created_by = 'gwas_analyzer';

print "Creating SQL file for inserting family data from $input_file\n";

open (INPUT, "< $input_file");
open (OUTPUT, "> $output_file");

print OUTPUT "INSERT INTO population (population_code, private) VALUES ('$population_code', 'f');\n";
print OUTPUT "COPY family (ls_id, family_ped_id, individual_ped_id, father_ped_id, mother_ped_id, sex, created_by, population_code) FROM stdin;                      
";

while (my $line = <INPUT>)
{
    chomp $line;

    #Skip any comment lines 
    next if($line =~ /^#/);

    my ($family_ped_id, $individual_ped_id, $father_ped_id, $mother_ped_id, $sex, $individual_ls_id, $sample_ls_id) = split (/\t/, $line);

    ($sample_ls_id) = $sample_ls_id =~ /:\w\w(\d+):/;

    print OUTPUT "$sample_ls_id\t$family_ped_id\t$individual_ped_id\t$father_ped_id\t$mother_ped_id\t$sex\t$created_by\t$population_code\n";
}

print OUTPUT "\\.\n";
close OUTPUT;

sub usagecheck
{
    if (!$input_file || !$output_file || !$population_code)
    {
        print "Missing argument (see required arguments below):\n";
        usage();
        exit;
    }
}

sub usage
{
    print "perl family.pl -i <input file> -o <output file> -p <population code>\n";
}

I’m sure this is a very simple question if you know regexes and Perl.


1 Answer

Editorial Team · Answer 1 · 2026-05-21T19:57:43+00:00

When $sample_ls_id = 'urn:lsid:dcc.hapmap.org:Sample:SAMPLE1:1';

The regular expression ‘/:\w\w(\d+):/’ fails. It matches only when the string contains a colon ‘:’, followed by exactly two “word” characters ‘\w\w’ (letters, digits, or underscore), then one or more digits ‘\d+’, and then another colon ‘:’. In ‘:SAMPLE1:’ the two word characters are ‘SA’, but the next character is ‘M’, which is not a digit, and no other colon in the string starts a match either.

When $sample_ls_id = 'urn:lsid:dcc.hapmap.org:Sample:NA12003:1';

The regular expression ‘/:\w\w(\d+):/’ finds its match in ‘:NA12003:’ (a colon, two word characters, digits, and a colon).

my $sample_ls_id = 'urn:lsid:dcc.hapmap.org:Sample:NA12003:1';
($sample_ls_id) = $sample_ls_id =~ /:\w\w(\d+):/;

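The parentheses in ‘($sample_ls_id)’ on the left-hand side put the match in list context. In list context a successful match does not return true or false; it returns the list of captured groups, so $sample_ls_id receives the ‘(\d+)’ portion of the match (also stored in $1), which in this case is 12003.

You were getting the warning with the earlier example because the regular expression fails there. A failed match in list context returns an empty list, which leaves $sample_ls_id undefined, and Perl then complains when the undefined value is interpolated into the string printed at line 38.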
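
If it helps to see the difference between the two strings side by side, here is a minimal sketch. The helper name extract_sample_number is made up for illustration and is not part of the original script; it applies the same pattern and checks for a failed match before using the result.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): returns the digits
# captured by the pattern, or undef when the string does not match.
sub extract_sample_number {
    my ($ls_id) = @_;
    # The parentheses around $digits force list context, so a successful
    # match returns the captured group; a failed match returns an empty
    # list and $digits stays undef.
    my ($digits) = $ls_id =~ /:\w\w(\d+):/;
    return $digits;
}

for my $ls_id ('urn:lsid:dcc.hapmap.org:Sample:NA12003:1',
               'urn:lsid:dcc.hapmap.org:Sample:SAMPLE1:1') {
    my $digits = extract_sample_number($ls_id);
    if (defined $digits) {
        print "$ls_id -> $digits\n";
    } else {
        print "$ls_id -> no match\n";
    }
}
```

In the original loop, a similar defined check after the assignment would let you skip or report lines such as the SAMPLE1 one instead of printing an undefined value. Whether such lines should be skipped or the pattern loosened depends on what the script's author intended, which the question does not say.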
