We’ve got this little regexp in a module to parse URLs like the following:

mysql://anonymous@my.self.com:1234/dbname

```perl
if ( my ($conn, $driver, $user, $pass, $host, $port, $dbname, $table_name, $tparam_name, $tparam_value, $conn_param_string) =
     $url =~ m{^((\w*)://(?:(\w+)(?:\:([^/\@]*))?\@)?(?:([\w\-\.]+)(?:\:(\d+))?)?/(\w*))(?:/(\w+)(?:\?(\w+)=(\w+))?)?((?:;(\w+)=(\w+))*)$} ) {
    # ... use the captured parts ...
}
```
Now we want to add parsing of SQLite URLs, which can look like this:
sqlite:///dbname_which_is_a_file
But it doesn’t work with absolute paths like sqlite:///tmp/dbname_which_is_a_file.
What is the proper way of doing this?
The problem with the regular expression is that it only treats one path element as the database name: a second element is captured as table_name (so sqlite:///tmp/dbname_which_is_a_file yields tmp as the dbname), and paths with more than two elements do not match at all. The regular expression also does not accept SQLite special filenames like ‘:memory:’ (which are very useful for tests).
To keep a regex-based approach maintainable, the best option is a dispatch table with an entry for each protocol that needs different parsing, pointing to a subroutine for that protocol. Writing each RE with the /x modifier also helps, since it allows whitespace and comments inside the pattern.
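A minimal sketch of that dispatch-table layout, under stated assumptions: the names %parser_for, parse_db_url, parse_mysql and parse_sqlite are hypothetical. The mysql pattern is the question's regex rewritten with /x (the ?name=value and ;key=value parts are left out for brevity), and the SQLite parser assumes that everything after sqlite:// is the database file path, with /:memory: mapped to SQLite's ‘:memory:’ database.

```perl
use strict;
use warnings;

# mysql://[user[:pass]@]host[:port]/dbname[/table]
# The question's regex rewritten with /x; the ?name=value and
# ;key=value parts are omitted here for brevity.
sub parse_mysql {
    my ($rest) = @_;                  # text after "mysql://"
    $rest =~ m{
        \A
        (?: (\w+)                     # group 1: user
            (?: : ([^/\@]*) )?        # group 2: optional password
            \@
        )?
        (?: ([\w\-\.]+)               # group 3: host
            (?: : (\d+) )?            # group 4: optional port
        )?
        / (\w*)                       # group 5: database name
        (?: / (\w+) )?                # group 6: optional table name
        \z
    }x or return;
    return +{ driver => 'mysql', user => $1, pass => $2,
              host => $3, port => $4, dbname => $5, table => $6 };
}

# Assumption: the whole path after "sqlite://" is the database file,
# and "/:memory:" selects SQLite's in-memory database.
sub parse_sqlite {
    my ($rest) = @_;                  # text after "sqlite://"
    $rest =~ m{
        \A
        (?: / (:memory:)              # group 1: special in-memory name
          | ( /.+ )                   # group 2: any other absolute path
        )
        \z
    }x or return;
    return +{ driver => 'sqlite', dbname => $1 // $2 };
}

# Dispatch table: one parser per protocol that needs its own rules.
my %parser_for = (
    mysql  => \&parse_mysql,
    sqlite => \&parse_sqlite,
);

# Returns a hashref, or nothing for malformed URLs / unknown protocols.
sub parse_db_url {
    my ($url) = @_;
    my ($scheme, $rest) = $url =~ m{ \A (\w+) :// (.*) \z }x or return;
    my $parser = $parser_for{$scheme} or return;   # unknown protocol
    return $parser->($rest);
}

for my $url ('mysql://anonymous@my.self.com:1234/dbname',
             'sqlite:///tmp/dbname_which_is_a_file',
             'sqlite:///:memory:') {
    my $p = parse_db_url($url) or next;
    print "$p->{driver}: dbname=$p->{dbname}\n";
}
```

With this layout, supporting another protocol means adding one entry to %parser_for and one subroutine, without touching the existing patterns.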
However, I would recommend using URI::Split (less verbose than URI) and then splitting the path as needed.
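A minimal sketch of that approach, assuming the URI distribution (which provides URI::Split) is installed. uri_split returns the scheme, authority, path, query and fragment; the helper parse_with_uri_split is hypothetical, and the rules for reading the path (the whole path as the SQLite file, /dbname[/table] for other drivers) are assumptions.

```perl
use strict;
use warnings;
use URI::Split qw(uri_split);

# Hypothetical helper: URI::Split does the generic split, and the
# driver-specific reading of the path is done afterwards.
sub parse_with_uri_split {
    my ($url) = @_;
    my ($scheme, $auth, $path, $query) = uri_split($url);
    return unless defined $scheme;

    my %r = (driver => $scheme, query => $query);

    # Authority has the form [user[:pass]@]host[:port]; it is empty
    # for URLs such as sqlite:///tmp/file.
    if (defined $auth && length $auth) {
        my ($userinfo, $hostport) = $auth =~ /^(?:(.*)\@)?(.*)$/;
        @r{qw(user pass)} = split /:/, $userinfo, 2 if defined $userinfo;
        @r{qw(host port)} = split /:/, $hostport, 2;
    }

    if ($scheme eq 'sqlite') {
        # Assumption: the whole path is the database file, and
        # "/:memory:" selects SQLite's in-memory database.
        $r{dbname} = $path eq '/:memory:' ? ':memory:' : $path;
    }
    else {
        # Assumption: other drivers use /dbname[/table].
        (undef, @r{qw(dbname table)}) = split m{/}, $path;
    }
    return \%r;
}

my $info = parse_with_uri_split('sqlite:///tmp/dbname_which_is_a_file');
print "$info->{driver}: $info->{dbname}\n";
```

Because uri_split never guesses what the path means, paths of any depth and names like ‘:memory:’ reach the driver-specific code intact.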
You can see the difference between using the RE and URI::Split here:
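One way to run such a comparison, as a sketch: apply the question's regex and uri_split to the same URLs and print what each one extracts. The helpers via_regex and via_uri_split are hypothetical, and URI::Split must be installed.

```perl
use strict;
use warnings;
use URI::Split qw(uri_split);

# The regex from the question, unchanged. Group 7 is the dbname and
# group 8 the table_name.
my $re = qr{^((\w*)://(?:(\w+)(?:\:([^/\@]*))?\@)?(?:([\w\-\.]+)(?:\:(\d+))?)?/(\w*))(?:/(\w+)(?:\?(\w+)=(\w+))?)?((?:;(\w+)=(\w+))*)$};

# What the regex extracts: dbname and table_name, or "no match".
sub via_regex {
    my ($url) = @_;
    my @m = $url =~ $re or return 'no match';
    return sprintf 'dbname=%s table=%s', $m[6] // '', $m[7] // '';
}

# What URI::Split extracts: the authority and the untouched path.
sub via_uri_split {
    my ($url) = @_;
    my (undef, $auth, $path) = uri_split($url);
    return sprintf 'auth=%s path=%s', $auth // '', $path // '';
}

for my $url ('mysql://anonymous@my.self.com:1234/dbname',
             'sqlite:///tmp/dbname_which_is_a_file',
             'sqlite:///tmp/dir/dbname_which_is_a_file',
             'sqlite:///:memory:') {
    print "$url\n";
    print '  RE:         ', via_regex($url), "\n";
    print '  URI::Split: ', via_uri_split($url), "\n";
}
```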
Results: