The Archive Base

Asked by Editorial Team on May 24, 2026


I have the following string:

string = "Will Ferrell (Nick Halsey), Rebecca Hall (Samantha), Michael Pena (Frank Garcia)"

I would like to create a list of tuples in the form of [(actor_name, character_name),...] like so:

[('Will Ferrell', 'Nick Halsey'), ('Rebecca Hall', 'Samantha'), ('Michael Pena', 'Frank Garcia')]

I am currently using a hack-ish way to do this: splitting on commas, stripping the trailing ) with .rstrip(')'), and then splitting on the ( mark, like so:

pairs = []
for item in string.split(','):
    actor, character = item.rstrip(')').split('(')
    pairs.append((actor.strip(), character))

Is there a better, more robust way to do this? Thank you.
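For comparison, here is what the split-based expression from the question produces on the sample string when nothing is stripped. The second input, "Rebecca Hall (Samantha, the neighbor)", is a made-up example (not from the question) showing how a comma inside the parentheses breaks the split.

```python
# The split-based expression from the question, with its results kept.
string = "Will Ferrell (Nick Halsey), Rebecca Hall (Samantha), Michael Pena (Frank Garcia)"

raw = [item.rstrip(')').split('(') for item in string.split(',')]
print(raw)
# [['Will Ferrell ', 'Nick Halsey'], [' Rebecca Hall ', 'Samantha'], [' Michael Pena ', 'Frank Garcia']]

# Hypothetical input: a comma inside the parentheses splits one entry in two.
tricky = "Rebecca Hall (Samantha, the neighbor)"
broken = [item.rstrip(')').split('(') for item in tricky.split(',')]
print(broken)
# [['Rebecca Hall ', 'Samantha'], [' the neighbor']]
```

By contrast, the first regex in the answer only consumes a comma that follows a closing parenthesis, so it keeps "Samantha, the neighbor" together as one character name.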

Answer by Editorial Team (May 24, 2026 at 3:42 pm):
    string = "Will Ferrell (Nick Halsey), Rebecca Hall (Samantha), Michael Pena (Frank Garcia)"
    
    import re
    pat = re.compile(r'([^(]+)\s*\(([^)]+)\)\s*(?:,\s*|$)')
    
    lst = [(t[0].strip(), t[1].strip()) for t in pat.findall(string)]
    

    The compiled pattern is a bit tricky. It's a raw string, to make the backslashes less insane. Piece by piece, it means:

    • ([^(]+) starts a match group, matches one or more characters that are not '(', and closes the group. This captures the actor name.
    • \s* matches any amount of white space (including none).
    • \( matches a literal '(' character.
    • ([^)]+) starts another match group, matches one or more characters that are not ')', and closes the group. This captures the character name.
    • \) matches a literal ')' character.
    • \s* again matches any amount of white space.
    • (?:,\s*|$) is the really tricky part: a grouping that doesn't form a match group. Instead of starting with '(' it starts with "(?:", and it still ends with ')'. I used this grouping so I could put a vertical bar in it to allow two alternate patterns: either a comma followed by any amount of white space, or the end of the string (the '$' character).
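To see why findall() hands back exactly two items per match, this sketch inspects the compiled pattern on a short made-up input ("A B (C D), E (F)" is our own). The (?:...) group is not counted in pat.groups; turning it into an ordinary capturing group (the variant named capturing here is for illustration only) adds a third item to every tuple.

```python
import re

# The compact pattern from the answer.
pat = re.compile(r'([^(]+)\s*\(([^)]+)\)\s*(?:,\s*|$)')
# Illustrative variant: the same pattern with the last group made capturing.
capturing = re.compile(r'([^(]+)\s*\(([^)]+)\)\s*(,\s*|$)')

sample = "A B (C D), E (F)"
print(pat.groups)                 # 2
print(pat.findall(sample))        # [('A B ', 'C D'), ('E ', 'F')]
print(capturing.findall(sample))  # [('A B ', 'C D', ', '), ('E ', 'F', '')]
```

The trailing space in 'A B ' also shows why the answer calls .strip(): the greedy ([^(]+) swallows the space before '(', leaving the \s* after it with nothing to match.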

    Then I used pat.findall() to find all the places within string where the pattern matches. Because the pattern has two match groups, findall() returns a list of 2-tuples, one per match. I put that in a list comprehension and called .strip() on each item to clean off the surrounding white space.
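If you prefer not to depend on tuple positions, a variant (our suggestion, not part of the original answer) gives the two groups names with (?P<name>...) and walks the matches with pat.finditer(), which yields match objects instead of tuples. The group names actor and character are arbitrary choices.

```python
import re

string = "Will Ferrell (Nick Halsey), Rebecca Hall (Samantha), Michael Pena (Frank Garcia)"

# Same pattern as the answer's, but with named groups instead of positional ones.
pat = re.compile(r'(?P<actor>[^(]+)\s*\((?P<character>[^)]+)\)\s*(?:,\s*|$)')

# Each m is a match object; look the groups up by name and strip them.
lst = [(m.group('actor').strip(), m.group('character').strip())
       for m in pat.finditer(string)]
print(lst)
# [('Will Ferrell', 'Nick Halsey'), ('Rebecca Hall', 'Samantha'), ('Michael Pena', 'Frank Garcia')]
```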

    Of course, we could make the regular expression even more complicated and have it return names with the white space already stripped off. The regular expression gets really hairy, though, so we will use one of the coolest features of Python regular expressions: "verbose" mode (re.VERBOSE), where you can sprawl a pattern across many lines and put in comments as you like. We use a raw triple-quoted string so that both the backslashes and the multiple lines are convenient. Here you go:

    import re
    s_pat = r'''
    \s*  # any amount of white space
    ([^( \t]  # start match group; match one char that is not a '(' or space or tab
    [^(]*  # match any number of non '(' characters
    [^( \t])  # match one char that is not a '(' or space or tab; close match group
    \s*  # any amount of white space
    \(  # match an actual required '(' char (not in any match group)
    \s*  # any amount of white space
    ([^) \t]  # start match group; match one char that is not a ')' or space or tab
    [^)]*  # match any number of non ')' characters
    [^) \t])  # match one char that is not a ')' or space or tab; close match group
    \s*  # any amount of white space
    \) # match an actual required ')' char (not in any match group)
    \s*  # any amount of white space
    (?:,|$)  # non-capturing group: either a comma or the end of the string
    '''
    pat = re.compile(s_pat, re.VERBOSE)
    
    lst = pat.findall(string)
    

    Man, that really wasn’t worth the effort.
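To see what re.VERBOSE actually does, the sketch below collapses the verbose pattern onto one line by hand (the compact string is our own rewrite, for comparison only) and checks that both give the same result. In verbose mode, white space and # comments are ignored, except inside a character class such as [^( \t], where the space is still a literal character. Without the flag, the newlines and comments become part of the pattern.

```python
import re

string = "Will Ferrell (Nick Halsey), Rebecca Hall (Samantha), Michael Pena (Frank Garcia)"

# The verbose pattern from the answer, with shorter comments.
s_pat = r'''
\s*          # any amount of white space
([^( \t]     # actor name: first char is not '(', space or tab
[^(]*        # any number of non '(' characters
[^( \t])     # last char is not '(', space or tab
\s*          # any amount of white space
\(           # literal '('
\s*          # any amount of white space
([^) \t]     # character name: first char is not ')', space or tab
[^)]*        # any number of non ')' characters
[^) \t])     # last char is not ')', space or tab
\s*          # any amount of white space
\)           # literal ')'
\s*          # any amount of white space
(?:,|$)      # non-capturing: a comma or the end of the string
'''

# Hand-written one-line equivalent (our own rewrite, for comparison only).
compact = r'\s*([^( \t][^(]*[^( \t])\s*\(\s*([^) \t][^)]*[^) \t])\s*\)\s*(?:,|$)'

verbose_result = re.findall(s_pat, string, re.VERBOSE)
compact_result = re.findall(compact, string)
# Without re.VERBOSE the pattern starts with a literal newline and contains
# the comment text, so nothing in the input matches.
plain_result = re.findall(s_pat, string)

print(verbose_result == compact_result)  # True
print(plain_result)                      # []
```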

    Also, the above preserves the white space inside the names. You could easily normalize the white space, to make sure it is 100% consistent, by splitting on white space and rejoining with spaces.

    string = '  Will   Ferrell  ( Nick\tHalsey ) , Rebecca Hall (Samantha), Michael\fPena (Frank Garcia)'
    
    import re
    pat = re.compile(r'([^(]+)\s*\(([^)]+)\)\s*(?:,\s*|$)')
    
    def nws(s):
        """normalize white space.  Replaces all runs of white space by a single space."""
        return " ".join(s.split())
    
    lst = [tuple(nws(item) for item in t) for t in pat.findall(string)]
    
    print(lst)  # prints: [('Will Ferrell', 'Nick Halsey'), ('Rebecca Hall', 'Samantha'), ('Michael Pena', 'Frank Garcia')]
    

    Now the string has silly white space: multiple spaces, a tab, and even a form feed (“\f”) in it. The above cleans it up so that names are separated by a single space.
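If you would rather stay with regular expressions for the clean-up too, a roughly equivalent helper (nws_re is a hypothetical name, not from the answer) collapses each run of white space with re.sub and then trims the ends. For ordinary white space such as spaces, tabs and form feeds it gives the same result as the split/join version.

```python
import re

def nws_re(s):
    """Hypothetical regex variant: collapse runs of white space, then trim both ends."""
    return re.sub(r'\s+', ' ', s).strip()

def nws(s):
    """The split/join normalizer from the answer, for comparison."""
    return " ".join(s.split())

samples = ['  Will   Ferrell  ', ' Nick\tHalsey ', 'Michael\fPena ']
print([nws_re(x) for x in samples])  # ['Will Ferrell', 'Nick Halsey', 'Michael Pena']
```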

Related Questions

I have the following string which will probably contain ~100 entries: String foo =
I have the following connection string, and you will notice Provider's.Tests, notice the single
I have the following String First Last <first.last@email.com> I would like to extract first.last
I have the following string and I would like to remove <bpt *>*</bpt> and
I have the following code that takes a String of milliseconds (will be from
I have the following string an-ca an-ca If you will look it closely you
I have following string: Test, User < test@test.com >, Another, Test < another@test.com >,
I have following situation: String a = A Web crawler is a computer program
I have the following string in a variable. Stack Overflow is as frictionless and
I have the following string: index 0 1 2 3 4 5 6 7

© 2021 The Archive Base. All Rights Reserved