I have a set of strings. I would like to extract a regular expression

Question

Asked: June 1, 2026 (2026-06-01T02:05:49+00:00)

I have a set of strings. I would like to extract a regular expression that matches all of these strings. Ideally, it should match only these strings and as few others as possible.

Is there an existing Python module that does this? For example, given these strings:

www.google.com
www.googlemail.com/hello/hey
www.google.com/hello/hey

Then, the extracted regex could be www\.google(mail)?\.com(/hello/hey)?
(This also matches www.googlemail.com, but I guess I need to live with it.)
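
To make the trade-off concrete, here is a minimal check of the candidate pattern above using Python's standard `re` module. The variable names are only illustrative; the pattern and the sample strings come from the question.

```python
import re

# Candidate pattern from the question (dots escaped so they match literally).
pattern = re.compile(r"www\.google(mail)?\.com(/hello/hey)?")

samples = [
    "www.google.com",
    "www.googlemail.com/hello/hey",
    "www.google.com/hello/hey",
]

# Every sample string should be matched in full.
all_match = all(pattern.fullmatch(s) is not None for s in samples)

# The pattern also accepts one string that is NOT in the original set.
extra_match = pattern.fullmatch("www.googlemail.com") is not None

print(all_match, extra_match)
```

Using `fullmatch` rather than `search` matters here: `search` would also accept any longer string that merely contains one of the URLs.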

My motivation for this is in a machine learning setting. I would like to extract a regular expression that “best” represents all these strings.

I understand that regexes like
(www\.google\.com)|(www\.googlemail\.com/hello/hey)|(www\.google\.com/hello/hey) or
www\.google((mail\.com/hello/hey)|(\.com)|(\.com/hello/hey)) would be right given my specification, because they match no URLs other than the given ones. But such a regex will become very large if there is a large number of strings in the set.
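
As a rough illustration of that size problem, the exact-match alternation can be generated mechanically with `re.escape`. This is only a sketch; `exact_alternation` is a hypothetical helper name, not a function from any library.

```python
import re

def exact_alternation(strings):
    """Return a pattern that matches exactly the given strings and nothing else."""
    # Deduplicate and sort so the output does not depend on input order.
    unique = sorted(set(strings))
    # re.escape makes characters such as '.' match literally.
    return "(?:" + "|".join(re.escape(s) for s in unique) + ")"

samples = [
    "www.google.com",
    "www.googlemail.com/hello/hey",
    "www.google.com/hello/hey",
]

exact = exact_alternation(samples)
exact_re = re.compile(exact)

# The pattern is at least as long as all the inputs combined,
# so it grows linearly with the size of the set.
print(len(exact), sum(len(s) for s in samples))
```

The result is precise but does no generalisation or compression at all, which is the concern raised above.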


1 Answer

Editorial Team · Answer 1 · 2026-06-01T02:05:50+00:00

There’s a little Perl library that was designed to do this. I know you’re using Python, but if it’s a very large list of strings, you can fork off a Perl subprocess now and then (or copy the algorithm if you’re sufficiently motivated).
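
For anyone who wants to stay in Python, one common way to approach the "copy the algorithm" route is to put the strings in a prefix trie and turn the trie into a regex. The sketch below is an assumption about what such an algorithm might look like, not the Perl library's actual implementation; `build_trie` and `trie_to_regex` are hypothetical names.

```python
import re

def build_trie(strings):
    """Build a character trie; the empty-string key marks the end of a word."""
    trie = {}
    for s in strings:
        node = trie
        for ch in s:
            node = node.setdefault(ch, {})
        node[""] = {}  # end-of-string marker (never collides with a real character)
    return trie

def trie_to_regex(node):
    """Convert a trie node into a regex fragment that shares common prefixes."""
    ends_here = "" in node
    branches = [
        re.escape(ch) + trie_to_regex(node[ch])
        for ch in sorted(k for k in node if k != "")
    ]
    if not branches:
        return ""
    if len(branches) == 1 and not ends_here:
        # A single continuation needs no grouping.
        return branches[0]
    body = "(?:" + "|".join(branches) + ")"
    # If a string may end at this node, everything after it is optional.
    return body + "?" if ends_here else body

samples = [
    "www.google.com",
    "www.googlemail.com/hello/hey",
    "www.google.com/hello/hey",
]

compact = trie_to_regex(build_trie(samples))
compact_re = re.compile(compact)
print(compact)
```

This still matches exactly the input set, so it does not generalise, but shared prefixes are written only once and the pattern is usually much shorter than a flat alternation. The conversion is recursive, so very long strings could hit Python's recursion limit.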