We’ve got this little regexp in a module to parse URLs like the following:

Asked: May 22, 2026

We’ve got this little regexp in a module:

if( my ($conn, $driver, $user, $pass, $host, $port, $dbname, $table_name, $tparam_name, $tparam_value, $conn_param_string) =
    $url =~ m{^((\w*)://(?:(\w+)(?:\:([^/\@]*))?\@)?(?:([\w\-\.]+)(?:\:(\d+))?)?/(\w*))(?:/(\w+)(?:\?(\w+)=(\w+))?)?((?:;(\w+)=(\w+))*)$} ) {

It parses URLs like the following:

mysql://anonymous@my.self.com:1234/dbname

Now we want to add parsing of SQLite URLs, which can look like this:

sqlite:///dbname_which_is_a_file

But the regexp does not work with absolute paths such as sqlite:///tmp/dbname_which_is_a_file.

What is the proper way of doing this?

1 Answer

Editorial Team · Answer 1 · 2026-05-22T14:56:29+00:00

The problem with the regular expression is that it does not work with paths of more than one element: a two-element path is split into dbname and table_name, and a path with three or more elements does not match at all. This regular expression also does not work with SQLite special filenames such as ':memory:' (which are very useful for tests).

To keep a regular-expression approach maintainable, the best option is a dispatch table keyed on the protocols that need different parsing, with a subroutine for each approach. It also helps to write the RE with the /x modifier, so it can contain comments:

 sub test_re {
     my ($url) = @_;
     my $x = {};
     # A failed match returns an empty list, so every key is then set to undef.
     # The pattern has 13 groups; the last two (inside conn_param_string) are discarded.
     @$x{qw(conn driver user pass host port dbname table_name tparam_name tparam_value conn_param_string)} =
         $url =~ m{
                ^(
                  (\w*) # driver
                  ://
                  (?:
                    (\w+) # user
                    (?:
                      \:
                      ([^/\@]*) # password
                    )?
                    \@
                  )? # user and password are optional
                  (?:
                    ([\w\-\.]+) # host
                    (?:
                      \:
                      (\d+) # port
                    )? # port is optional
                  )? # host and port are optional
                  / # this is the third '/' when there is no user, password, host or port
                  (\w*) # db name (only up to the next '/', if any); will not work with full paths for sqlite
                )
                (?:
                  / # if there is a table
                  (\w+) # table name
                  (?:
                    \? # parameter
                    (\w+)
                    =
                    (\w+)
                  )? # the parameter is optional, but can only follow a table name
                )? # table and parameter are optional
                (
                  (?:
                    ;
                    (\w+)
                    =
                    (\w+)
                  )* # remaining parameters, if any
                )
                $
             }x;
     return $x;
 }
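The dispatch-table idea described above could look roughly like the following. This is a minimal sketch, not a drop-in replacement: `%parser_for` and `parse_db_url` are hypothetical names, the mysql pattern is a simplified version of the one above (no table parameters), and the sqlite handler assumes the database file is the whole URL path with its leading '/', which is the same value URI::Split reports as the path.

```perl
use strict;
use warnings;

# Hypothetical dispatch table: one parser per protocol that needs its own rules.
# Each parser receives the part of the URL after "driver://" and returns a
# hash ref of fields, or nothing if the URL does not match.
my %parser_for = (
    # Network servers: [user[:password]@][host][:port]/dbname[/table]
    mysql => sub {
        my ($rest) = @_;
        my %p;
        @p{qw(user pass host port dbname table_name)} = $rest =~ m{
            ^
            (?: (\w+) (?: : ([^/\@]*) )? \@ )?  # optional user and password
            ([\w\-.]+)?                         # optional host
            (?: : (\d+) )?                      # optional port
            / (\w*)                             # database name
            (?: / (\w+) )?                      # optional table name
            $
        }x or return;
        return \%p;
    },
    # SQLite (assumption): the whole path is the database file, so absolute
    # paths with several segments and names like "/:memory:" are kept as-is.
    sqlite => sub {
        my ($rest) = @_;
        my ($file) = $rest =~ m{^(/.+)$} or return;
        return { dbname => $file };
    },
);

# Look up the driver, then hand the rest of the URL to its parser.
sub parse_db_url {
    my ($url) = @_;
    my ($driver, $rest) = $url =~ m{^(\w+)://(.*)$}s or return;
    my $parser = $parser_for{$driver} or return;    # unknown driver
    my $fields = $parser->($rest)     or return;    # driver-specific mismatch
    return { driver => $driver, %$fields };
}
```

With this layout, supporting another protocol means adding one more entry to `%parser_for` without touching the patterns of the others.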

But I would recommend using URI::Split (less verbose than URI) and then splitting the path as needed.
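A minimal sketch of that approach, assuming the URL layouts from the question. `parse_with_uri_split` is a hypothetical helper; the authority is assumed to be `[user[:password]@]host[:port]`, and for non-SQLite drivers the first path segment is taken as the database and the last one as the table. Only `uri_split` comes from URI::Split itself.

```perl
use strict;
use warnings;
use URI::Split qw(uri_split);

# Hypothetical helper: let URI::Split do the generic splitting, then
# interpret the path according to the driver.
sub parse_with_uri_split {
    my ($url) = @_;
    my ($scheme, $auth, $path) = uri_split($url);
    return unless defined $scheme;
    my %info = (driver => $scheme);

    if ($scheme eq 'sqlite') {
        # Assumption: the whole path is the database file,
        # e.g. "/tmp/db/dbname_which_is_a_file".
        $info{dbname} = $path;
        return \%info;
    }

    # Assumption: the authority looks like [user[:password]@]host[:port].
    if (defined $auth
        && $auth =~ m{^(?:([^:\@]*)(?::([^\@]*))?\@)?([^:\@]*)(?::(\d+))?$})
    {
        @info{qw(user pass host port)} = ($1, $2, $3, $4);
    }

    # Assumption: first path segment is the database, last one the table;
    # any segments in between are ignored here.
    my @segments = grep { length } split m{/}, $path;
    $info{dbname}     = shift @segments;
    $info{table_name} = pop @segments;    # undef when there is no table
    return \%info;
}
```

How to treat the middle segments (such as `pathextra` in the test URLs) is a policy choice; the point is that it becomes a plain list operation instead of another branch of the regexp.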

You can see the difference between the RE and URI::Split here:

#!/usr/bin/env perl

use feature ':5.10';
use strict;
use warnings;
use URI::Split qw(uri_split);
use Data::Dumper;

my $urls = [qw(
             mysql://anonymous@my.self.com:1234/dbname
             mysql://anonymous@my.self.com:1234/dbname/tablename
             mysql://anonymous@my.self.com:1234/dbname/pathextra/tablename
             sqlite:///dbname_which_is_a_file
             sqlite:///tmp/dbname_which_is_a_file
             sqlite:///tmp/db/dbname_which_is_a_file
             sqlite:///:dbname_which_is_a_file
             sqlite:///:memory
             )];

# test_re() is the subroutine shown above; paste it into this script.

foreach my $url (@$urls) {
    say "== testing $url ==";
    say "- RE";
    say Dumper(test_re($url));       # say adds the blank line after the dump
    say "- URI::Split";
    say Dumper(uri_split($url));     # ($scheme, $auth, $path, $query, $frag)
}

Results:

 [...]
 == testing sqlite:///dbname_which_is_a_file ==
 - RE
 $VAR1 = {
           'pass' => undef,
           'port' => undef,
           'dbname' => 'dbname_which_is_a_file',
           'host' => undef,
           'conn_param_string' => '',
           'conn' => 'sqlite:///dbname_which_is_a_file',
           'tparam_name' => undef,
           'tparam_value' => undef,
           'user' => undef,
           'table_name' => undef,
           'driver' => 'sqlite'
         };

 - URI::Split
 $VAR1 = 'sqlite';
 $VAR2 = '';
 $VAR3 = '/dbname_which_is_a_file';
 $VAR4 = undef;
 $VAR5 = undef;

 == testing sqlite:///tmp/dbname_which_is_a_file ==
 - RE
 $VAR1 = {
           'pass' => undef,
           'port' => undef,
           'dbname' => 'tmp',
           'host' => undef,
           'conn_param_string' => '',
           'conn' => 'sqlite:///tmp',
           'tparam_name' => undef,
           'tparam_value' => undef,
           'user' => undef,
           'table_name' => 'dbname_which_is_a_file',
           'driver' => 'sqlite'
         };

 - URI::Split
 $VAR1 = 'sqlite';
 $VAR2 = '';
 $VAR3 = '/tmp/dbname_which_is_a_file';
 $VAR4 = undef;
 $VAR5 = undef;

 == testing sqlite:///tmp/db/dbname_which_is_a_file ==
 - RE
 $VAR1 = {
           'pass' => undef,
           'port' => undef,
           'dbname' => undef,
           'host' => undef,
           'conn_param_string' => undef,
           'conn' => undef,
           'tparam_name' => undef,
           'tparam_value' => undef,
           'user' => undef,
           'table_name' => undef,
           'driver' => undef
         };

 - URI::Split
 $VAR1 = 'sqlite';
 $VAR2 = '';
 $VAR3 = '/tmp/db/dbname_which_is_a_file';
 $VAR4 = undef;
 $VAR5 = undef;

 == testing sqlite:///:memory ==
 - RE
 $VAR1 = {
           'pass' => undef,
           'port' => undef,
           'dbname' => undef,
           'host' => undef,
           'conn_param_string' => undef,
           'conn' => undef,
           'tparam_name' => undef,
           'tparam_value' => undef,
           'user' => undef,
           'table_name' => undef,
           'driver' => undef
         };

 - URI::Split
 $VAR1 = 'sqlite';
 $VAR2 = '';
 $VAR3 = '/:memory';
 $VAR4 = undef;
 $VAR5 = undef;
