The Archive Base

Editorial Team
Asked: May 20, 2026

I am trying to capture sub-strings from a string that looks similar to

'some string, another string, '

I want the result match group to be

('some string', 'another string')

My current solution

>>> from re import match
>>> match(2 * '(.*?), ', 'some string, another string, ').groups()
('some string', 'another string')

works, but is not practical: what I am showing here is, of course, massively simplified compared to what I’m doing in the real project, and I want to use a single ‘straight’ (non-computed) regex pattern. Unfortunately, my attempts have failed so far.

This doesn’t match (the result is None), because {2} applies only to the space, not to the whole preceding pattern:

>>> match('.*?, {2}', 'some string, another string, ')
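To illustrate why this returns None, here is a minimal sketch (the second input string is made up for the demonstration): a quantifier such as {2} binds only to the single item immediately before it, so the pattern asks for a comma followed by two spaces.

```python
import re

pattern = '.*?, {2}'  # {2} repeats only the space that precedes it

# The original input never contains a comma followed by two spaces.
no_match = re.match(pattern, 'some string, another string, ')

# A made-up input with two spaces after the comma does match.
hit = re.match(pattern, 'some string,  another')

print(no_match)      # None
print(hit.group(0))  # the matched prefix, ending in two spaces
```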

Adding parentheses around the repeated part leaves the comma and space in the result:

>>> match('(.*?, ){2}', 'some string, another string, ').groups()
('another string, ',)

Adding another set of parentheses fixes that, but captures too much:

>>> match('((.*?), ){2}', 'some string, another string, ').groups()
('another string, ', 'another string')

Adding a non-capturing modifier improves the result, but still misses the first string:

>>> match('(?:(.*?), ){2}', 'some string, another string, ').groups()
('another string',)

I feel like I’m close, but I can’t really seem to find the proper way.

Can anyone help me? Are there any other approaches I’m not seeing?


Update after the first few responses:

First up, thank you very much everyone, your help is greatly appreciated! 🙂

As I said in the original post, I have omitted a lot of complexity in my question for the sake of depicting the actual core problem. For starters, in the project I am working on, I am parsing large numbers of files (currently tens of thousands per day) in a number (currently 5, soon ~25, possibly in the hundreds later) of different line-based formats. There is also XML, JSON, binary and some other data file formats, but let’s stay focussed.

In order to cope with the multitude of file formats and to exploit the fact that many of them are line-based, I have created a somewhat generic Python module that loads one file after the other, applies a regex to every line and returns a large data structure with the matches. This module is a prototype; the production version will require a C++ version for performance reasons, which will be connected via Boost.Python and will probably add the subject of regex dialects to the list of complexities.

Also, there are not 2 repetitions, but a number that currently varies between zero and 70 (or so), and the comma is not always a comma. Despite what I said originally, some parts of the regex pattern will have to be computed at runtime; let’s just say I have reason to reduce the ‘dynamic’ part and keep as much of the pattern ‘fixed’ as possible.

So, in a word: I must use regular expressions.
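The varying repetition count and separator described above could be handled by computing only that part of the pattern. The following is a minimal sketch, not the module from the project: the names build_pattern and split_fields are hypothetical, and it assumes every field is followed by the separator, as in the example string.

```python
import re

def build_pattern(count, separator=', '):
    """Return a pattern with `count` capturing groups, each followed by the separator."""
    # re.escape keeps separators such as '|' or '.' from acting as regex syntax.
    return count * ('(.*?)' + re.escape(separator))

def split_fields(line, count, separator=', '):
    """Return the captured fields as a tuple, or None if the line does not match."""
    m = re.match(build_pattern(count, separator), line)
    return m.groups() if m else None

print(split_fields('some string, another string, ', 2))
print(split_fields('a|b|', 2, separator='|'))
```

Only the repetition count and the separator are computed here; the per-field pattern '(.*?)' stays fixed.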


Attempt to rephrase: I think the core of the problem boils down to this: is there a Python regex notation, e.g. one involving curly-brace repetition, that allows me to capture

'some string, another string, '

into

('some string', 'another string')

?

Hmmm, that probably narrows it down too far – but then, any way you do it is wrong 😀


Second attempt to rephrase: Why do I not see the first string (‘some string’) in the result? Why does the regex produce a match (indicating there’s gotta be 2 of something), but return only one string (the second one)?

The problem remains the same even if I use non-numeric repetition, i.e. using + instead of {2}:

>>> match('(?:(.*?), )+', 'some string, another string, ').groups()
('another string',)

Also, it’s not the second string that’s returned, it is the last one:

>>> match('(?:(.*?), )+', 'some string, another string, third string, ').groups()
('third string',)
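A short sketch of what the match object holds in this case may help: the overall match spans all three repetitions, while group 1 keeps only the text from the final repetition. The variable names are chosen for illustration.

```python
import re

text = 'some string, another string, third string, '
m = re.match(r'(?:(.*?), )+', text)

whole = m.group(0)                        # everything the repeated group consumed
last = m.group(1)                         # only the last repetition's capture
last_slice = text[m.start(1):m.end(1)]    # same text, located via the group's span

print(repr(whole))
print(repr(last))
```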

Again, thanks for your help; it never ceases to amaze me how helpful peer review is while trying to find out what I actually want to know…

1 Answer

  1. Editorial Team
    Added an answer on May 20, 2026 at 2:32 pm

    To sum this up, it seems I am already using the best solution by constructing the regex pattern in a ‘dynamic’ manner:

    >>> from re import match
    >>> match(2 * '(.*?), ', 'some string, another string, ').groups()
    ('some string', 'another string')
    

    The

    2 * '(.*?), '

    is what I mean by dynamic. The alternative approach

    >>> match('(?:(.*?), ){2}', 'some string, another string, ').groups()
    ('another string',)
    

    fails to return the desired result because (as Glenn and Alan kindly explained)

    with match, the captured content gets overwritten with each repetition of the capturing group
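Since a repeated group keeps only its last capture, one alternative that still uses a fixed pattern is to let re.findall apply the single-item pattern repeatedly. This is a sketch of a different technique from the one in the answer above; the helper name capture_all is hypothetical, and the optional re.fullmatch check is an added assumption to reject lines that are not made up entirely of items.

```python
import re

ITEM = r'(.*?), '  # one field followed by the separator; fixed, not computed

def capture_all(line, item=ITEM):
    """Return every field in `line` as a tuple, or None if the line is malformed."""
    # Validate that the line consists of zero or more items and nothing else.
    if re.fullmatch(r'(?:%s)*' % item, line) is None:
        return None
    # findall returns one entry per non-overlapping match of the item pattern.
    return tuple(re.findall(item, line))

print(capture_all('some string, another string, '))
print(capture_all('some string, another string, third string, '))
```

The number of repetitions does not appear in the pattern at all, so the same pattern covers anything from zero items upward.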

    Thanks for your help everyone! 🙂

© 2021 The Archive Base. All Rights Reserved