The Archive Base

Editorial Team
Asked: June 1, 2026


I have been tweaking a regular expression over several days to try to capture, with a single definition, several cases of inconsistent format in the address field of a database.

I am new to Python and regular expressions, and have gotten great feedback here on Stack Overflow. With my new knowledge, I built a regex that is getting close to the final result, but I still can’t spot the problem.

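import re

# One pattern meant to cover 'city, country', 'city (country)'
# and 'city, (country) (state)'.
r1 = r"([\w\s+]+),?\s*\(?([\w\s+\\/]+)\)?\s*\(?([\w\s+\\/]+)\)?"

match1 = re.match(r1, 'caracas, venezuela')
match2 = re.match(r1, 'caracas (venezuela)')
match3 = re.match(r1, 'caracas, (venezuela) (df)')

group1 = match1.groups()
group2 = match2.groups()
group3 = match3.groups()

# print() with a single argument behaves the same on Python 2 and Python 3.
print(group1)
print(group2)
print(group3)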

This should return ‘caracas, venezuela’ for groups 1 and 2, and ‘caracas, venezuela, df’ for group 3. Instead, it returns:

('caracas', 'venezuel', 'a')
('caracas ', 'venezuel', 'a')
('caracas', 'venezuela', 'df')

The only perfect match is group 3. The other two isolate the ‘a’ at the end, and the second one also has an extra space at the end of ‘caracas ’.
Thanks in advance for any insight.

Cheers!

1 Answer

  1. Editorial Team
    Added an answer on June 1, 2026 at 3:44 pm

    Regular expressions might be overkill… what exactly is your problem statement? What do you need to capture?

    Some things I caught (in order of appearance in your regex; sometimes it helps to read it out, left-to-right, English-style):

    ([\w\s+]+)
    

    This says, “capture one or more characters, each of which is a word character (letter, digit, or underscore), a whitespace character, or a literal plus sign”.

    Do you really want to capture the spaces at the end of the city name? Also, you don’t need (indeed, shouldn’t have) the + inside your brackets [ ]: inside a character class it is not the one-or-more quantifier but a literal plus sign, and the outer + already handles the repetition. I’d rewrite this part like this:

    ([\w\s]*\w)
    

    This matches greedily up to the last word character (“zero or more (word character or space) followed by a word character”). It does assume the name contains at least one word character, which is safer than the original pattern, where a single space would also have matched.
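The difference between the two first-group patterns can be checked directly. This is a minimal sketch; the variable names and the second sample string are made up for illustration and are not part of the question.

```python
import re

# Original first group: '+' inside the brackets is a literal plus sign,
# and trailing whitespace ends up in the capture.
original = re.compile(r"([\w\s+]+)")
# Suggested rewrite: the capture has to end on a word character.
trimmed = re.compile(r"([\w\s]*\w)")

sample = "caracas (venezuela)"
original_city = original.match(sample).group(1)
trimmed_city = trimmed.match(sample).group(1)
print(repr(original_city))  # 'caracas '
print(repr(trimmed_city))   # 'caracas'

# The literal plus sign is accepted by the original class only.
original_plus = original.match("a+b").group(1)
trimmed_plus = trimmed.match("a+b").group(1)
print(repr(original_plus))  # 'a+b'
print(repr(trimmed_plus))   # 'a'
```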

    Next you have:

    ,?\s*\(?
    

    which looks okay to me, except that every piece of it is optional, so it no longer guarantees that you’ll see either a comma or an open paren. What about:

    (?:,\s*\(|,\s*|\s*\()
    

    which says, “non-capturingly match either (a comma with maybe some spaces and then an open paren) OR (a comma with maybe some spaces) OR (maybe some spaces and then an open paren)”. This enforces that you must have either a comma or a paren or both.
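To see what each separator accepts, both versions can be run against a few candidate strings. This is only a sketch: the sample list and the names loose and strict are assumptions chosen for the illustration, and re.fullmatch requires Python 3.4 or later.

```python
import re

# Original separator: every piece is optional, so it can match nothing at all.
loose = re.compile(r",?\s*\(?")
# Suggested separator: requires a comma, an open paren, or both.
strict = re.compile(r"(?:,\s*\(|,\s*|\s*\()")

samples = [", (", ", ", " (", "(", " ", ""]
loose_ok = {s: bool(loose.fullmatch(s)) for s in samples}
strict_ok = {s: bool(strict.fullmatch(s)) for s in samples}

for s in samples:
    print(repr(s), loose_ok[s], strict_ok[s])
```

Only the last two samples (a lone space and the empty string) differ: the original separator accepts them, the stricter one does not.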

    Next you have the capturing expression, very similar to the first:

    ([\w\s+\\/]+)
    

    Again, you don’t want spaces (or slashes, in this case) at the end of the captured name, and you don’t want the + inside the [ ]:

    ([\w\s\\/]*\w)
    

    The next expression is probably where your ‘venezuel’, ‘a’ problem comes from; let’s take a look:

    \)?\s*\(?([\w\s+\\/]+)\)?
    

    This is a rather long one, so let’s break it down:

    \)?\s*\(?
    

    says to “maybe match a close paren, and then maybe some spaces, and then maybe an open paren”. This is okay I guess, let’s move on to the real problem:

    ([\w\s+\\/]+)
    

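    This capturing group MUST match at least one character. When the matcher sees “venezuela” at the end of your address, the second group first greedily takes the whole word. The match then fails because nothing is left for this final group, so the engine backtracks: the second group gives back one character and keeps venezuel, and the final group is satisfied with the leftover a. Try instead: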

    \)?\s*
    

    Followed by making your entire final expression optional, and the outer expression non-capturing:

    (?:\(?([\w\s+\\/]+)\)?)?
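The effect of making the last group optional can be isolated by running only the tail of the pattern, with the asker's original second group left unchanged. This is a sketch under that assumption; the names mandatory and optional are hypothetical.

```python
import re

# Tail of the original pattern: the last group must match at least one character.
mandatory = re.compile(r"([\w\s+\\/]+)\)?\s*\(?([\w\s+\\/]+)\)?")
# Same tail with the last group wrapped in an optional non-capturing group.
optional = re.compile(r"([\w\s+\\/]+)\)?\s*(?:\(?([\w\s+\\/]+)\)?)?")

mandatory_groups = mandatory.match("venezuela").groups()
optional_groups = optional.match("venezuela").groups()
with_state = optional.match("venezuela) (df)").groups()

print(mandatory_groups)  # ('venezuel', 'a')
print(optional_groups)   # ('venezuela', None)
print(with_state)        # ('venezuela', 'df')
```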
    

    The final expression would be:

    ([\w\s]*\w)(?:,\s*\(|,\s*|\s*\()([\w\s\\/]*\w)\)?\s*(?:\(?([\w\s+\\/]+)\)?)?
    

    Edit: fixed a problem that made the final group capture twice, once with the parens, once without. Now it should only capture the text inside the parens.

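    Testing it on your examples:

    >>> import re
    >>> r = r"([\w\s]*\w)(?:,\s*\(|,\s*|\s*\()([\w\s\\/]*\w)\)?\s*(?:\(?([\w\s+\\/]+)\)?)?"
    >>> re.match(r, 'caracas, venezuela').groups()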
    ('caracas', 'venezuela', None)
    >>> re.match(r, 'caracas (venezuela)').groups()
    ('caracas', 'venezuela', None)
    >>> re.match(r, 'caracas, (venezuela) (df)').groups()
    ('caracas', 'venezuela', 'df')
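For completeness, here is one way the final expression might be wrapped to produce the comma-separated form asked for in the question. The helper name normalize_address and the choice to return None for unmatched input are assumptions for this sketch, not part of the original answer.

```python
import re

# The final expression from the answer above, compiled once.
ADDRESS_RE = re.compile(
    r"([\w\s]*\w)(?:,\s*\(|,\s*|\s*\()([\w\s\\/]*\w)\)?\s*(?:\(?([\w\s+\\/]+)\)?)?"
)

def normalize_address(raw):
    """Join the captured parts with ', '; return None if nothing matches."""
    match = ADDRESS_RE.match(raw)
    if match is None:
        return None
    # The third group is None when the input has no trailing part.
    return ", ".join(part for part in match.groups() if part is not None)

for raw in ["caracas, venezuela", "caracas (venezuela)", "caracas, (venezuela) (df)"]:
    print(normalize_address(raw))
```

An input with no comma or paren at all, such as a bare 'caracas', does not match, because the separator is now mandatory.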
    

© 2021 The Archive Base. All Rights Reserved