I have a Python regex that takes a string (database connection URI) and splits it using named groups into username, password etc.
import re
uri = 'username:password@host/database'
m = re.compile(r'^(?P<user>[^:@]+)(\:(?P<password>[^@]*))?@(?P<host>[^\:@/]+)(\:(?P<port>[0-9]+))?/(?P<db>[^\?]+)?$').match(uri)
print(m.groupdict())
{'host': 'host', 'password': 'password', 'db': 'database', 'user': 'username', 'port': None}
This works fine. The problem arises when the password contains an @ symbol, since @ is also what separates the password from the host. For example,
uri = 'username:p@ssword@host/database'
will not match, which is expected. However, I’d like to be able to escape the special character, e.g.:
uri = 'username:p\@ssword@host/database'
and have it match. My regex experience is pretty limited – I guess what I’d like to do is modify the
(?P<password>[^@]*)
group so that it will match any character that’s not a @, unless it’s preceded by a \ character. Of course, some (most) connection strings will not contain a \@ at all.
Any help much appreciated.
My take is that you want greedy matching: the password runs up to the last @, and the hostname sits between that last @ and the first / after it.
A simple way could be like this:
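The snippet that originally followed is not available here, so what follows is only a minimal sketch of the greedy idea, not necessarily the pattern the answer had in mind. It reuses the group names from the question; the name URI_RE is made up for illustration, and it assumes the username and password are both present.

```python
import re

# Sketch only: URI_RE is an illustrative name, not from the original post.
# The password is '.*', which is greedy, so it extends to the LAST '@'
# that can still be followed by a valid host. Any earlier '@' therefore
# ends up inside the password without needing an escape character.
URI_RE = re.compile(
    r'^(?P<user>[^:@]+)'        # username: anything up to the first ':'
    r':(?P<password>.*)'        # password: greedy, may contain '@'
    r'@(?P<host>[^:@/]+)'       # host: no ':', '@' or '/'
    r'(?::(?P<port>[0-9]+))?'   # optional numeric port
    r'/(?P<db>[^?]+)?$'         # optional database name
)

m = URI_RE.match('username:p@ssword@host/database')
print(m.groupdict())
```

With this approach the unescaped 'username:p@ssword@host/database' matches directly, so no backslash escaping should be needed.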
You might want to make some groups optional, that is (stuff)?, if e.g. the username and password can be omitted.
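As a rough illustration of the optional groups, here is one possible variant. It assumes that the whole "user:password@" prefix may be missing and that the password may be missing on its own; OPTIONAL_RE is a hypothetical name, and the group names are again taken from the question.

```python
import re

# Sketch only: OPTIONAL_RE is an illustrative name.
# The credentials are wrapped in an optional non-capturing group (?: ... )?
# and the password part is optional inside it.
OPTIONAL_RE = re.compile(
    r'^(?:(?P<user>[^:@]+)(?::(?P<password>.*))?@)?'  # optional user[:password]@
    r'(?P<host>[^:@/]+)'                              # host: no ':', '@' or '/'
    r'(?::(?P<port>[0-9]+))?'                         # optional numeric port
    r'/(?P<db>[^?]+)?$'                               # optional database name
)

for uri in ('host/database',
            'username@host/database',
            'username:p@ssword@host:5432/database'):
    print(OPTIONAL_RE.match(uri).groupdict())
```

Groups that do not take part in the match come back as None in groupdict(), the same way port does in the question's example.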